Use dump_things_pyclient to implement triple-tools #3

Merged
cmo merged 20 commits from use-pyclient into main 2025-12-11 19:40:24 +00:00
12 changed files with 610 additions and 280 deletions

README.md

@ -19,27 +19,73 @@ Perform the following operations, preferably in a Python-virtual environment.
## The commands

This project provides the following CLI commands:

- auto-curate: automatically move records from inboxes to the curated area of a collection
- clean-incoming: delete all records from an inbox of a collection
- list-incoming: list records in inboxes of a collection
- post-records: read records from stdin and post them to inbox or curated area of a collection
- read-pages: read records from collection, curated area of a collection, or specific inboxes
- read-paginated-url: read records from any paginated service endpoints
- build-local-triple-store: read all records from a collection and emit N-Triples

The following sections show the help messages for those commands.

#### read-pages

Read all pages from a paginated endpoint.

```
usage: read-pages [-h] [-c CLASS_NAME] [-f FORMAT] [-p PID] [-i LABEL] [-C] [-m MATCHING] [-s PAGE_SIZE] [-F FIRST_PAGE] [-l LAST_PAGE] [--stats] [-P] service_url collection
Get records from a collection on a dump-things-service
This command lists records that are stored in a dump-things-service. By
default all records that are readable with the given token, or the default
token, will be displayed. The output format is JSONL (JSON lines), where
every line contains a record or a record with paging information. If `ttl`
is chosen as format of the output records, the record content will be a string
that contains a TTL document.

The command supports reading from the curated area only, reading from incoming
areas, or reading records with a given PID.

Pagination information is returned for paginated results when requested with
`-P/--pagination`. All results are paginated except "get a record with a given PID"
and "get the list of incoming zone labels".

If the environment variable "DUMPTHINGS_TOKEN" is set, its content will be used
as token to authenticate against the dump-things-service.
positional arguments:
  service_url
  collection

options:
  -h, --help            show this help message and exit
  -c, --class CLASS_NAME
                        only read records of this class, ignored if "--pid" is provided
  -f, --format FORMAT   format of the output records ("json" or "ttl")
  -p, --pid PID         the pid of the record that should be read
  -i, --incoming LABEL  read from incoming area with the given label in the collection, if LABEL is "-", return the labels
  -C, --curated         read from the curated area of the collection
  -m, --matching MATCHING
                        return only records that have a matching value (use % as wildcard). Ignored if "--pid" is provided. (NOTE: not all endpoints and backends support matching.)
  -s, --page-size PAGE_SIZE
                        set the page size (1 - 100) (default: 100), ignored if "--pid" is provided
  -F, --first-page FIRST_PAGE
                        the first page to return (default: 1), ignored if "--pid" is provided
  -l, --last-page LAST_PAGE
                        the last page to return (default: None (return all pages)), ignored if "--pid" is provided
  --stats               show the number of records and pages and exit, ignored if "--pid" is provided
  -P, --pagination      show pagination information (each record from a paginated endpoint is returned as [<record>, <current page number>, <total number of pages>, <page size>, <total number of items>])
```

For a given `<base_url>` and `<collection>` the tool will read all pages
returned by `<base_url>/<collection>/records/p/`, or the respective inbox or the curated area.

The tool reads a token from the environment variable `DUMPTHINGS_TOKEN` if set.
@ -73,10 +119,15 @@ The tool reads a token from the environment variable `DUMPTHINGS_TOKEN`.
Move records from inboxes into the curated part of a collection.

```
usage: auto-curate [-h] [--destination-service-url DEST_SERVICE_URL] [--destination-collection DEST_COLLECTION] [--destination-token DEST_TOKEN] [-e EXCLUDE] [-l] [-r] [-o] [-p PID] SOURCE_SERVICE_URL SOURCE_COLLECTION

Automatically move records from the incoming areas of a
collection to the curated area of the same collection, or to
the curated area of another collection.

The environment variable "DUMPTHINGS_TOKEN" must contain a token
which is used to authenticate the requests. The token must have
curator-rights.

positional arguments:
  SOURCE_SERVICE_URL
@ -84,21 +135,21 @@ positional arguments:
options:
  -h, --help            show this help message and exit
  --destination-service-url DEST_SERVICE_URL
                        select a different dump-thing-service, i.e. not SOURCE_SERVICE_URL, as destination for auto-curated records
  --destination-collection DEST_COLLECTION
                        select a different collection, i.e. not the SOURCE_COLLECTION of SOURCE_SERVICE_URL, as destination for auto-curated records
  --destination-token DEST_TOKEN
                        if provided, this token will be used for the destination service, otherwise $DUMPTHINGS_TOKEN will be used
  -e, --exclude EXCLUDE
                        exclude an inbox on the source collection (repeatable)
  -l, --list-labels     list the inbox labels of the given source collection, do not perform any curation
  -r, --list-records    list records in the inboxes of the given source collection, do not perform any curation
  -o, --list-only       [DEPRECATED: use "--list-records"] list records in the inboxes of the given source collection, do not perform any curation
  -p, --pid PID         if provided, process only records that match the given PIDs
```

`auto-curate` requires that the environment variable `DUMPTHINGS_TOKEN` is set, and contains a valid curator-token.
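The curation loop that `auto-curate` performs can be sketched with in-memory stand-ins for the service calls. The stub data and inline substitutes below are illustrative only; the real tool talks to a dump-things-service via `dump_things_pyclient`.

```python
import re

# Hypothetical in-memory stand-ins for the service: two inboxes and a curated area.
inboxes = {
    'alice': [{'pid': 'p1', 'schema_type': 'dlco:Person'}],
    'bob': [{'pid': 'p2', 'schema_type': 'dlco:Dataset'}],
}
curated = []

def auto_curate(exclude=()):
    # incoming_read_labels -> iterate inbox labels
    for label in list(inboxes):
        if label in exclude:
            continue
        # incoming_read_records -> iterate records of one inbox
        for record in list(inboxes[label]):
            # class name is the trailing part of `schema_type`
            class_name = re.search('([_A-Za-z0-9]*$)', record['schema_type']).group(0)
            curated.append((class_name, record))   # curated_write_record
            inboxes[label].remove(record)          # incoming_delete_record

auto_curate(exclude=['bob'])
print([name for name, _ in curated])  # → ['Person']
```

Records from excluded inboxes stay untouched; every curated record is removed from its inbox only after it was written to the destination.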
#### build-local-triple-store
@ -149,7 +200,7 @@ options:
List the labels of all inboxes of a given collection

```
usage: list-incoming [-h] [-s] base_url collection

positional arguments:
  base_url
@ -157,10 +208,10 @@ positional arguments:
options:
  -h, --help          show this help message and exit
  -s, --show-records  show the records in the inboxes as well
```

`list-incoming` requires that the environment variable `CURATOR_TOKEN` is set, and contains a valid curator-token.
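The output shape can be sketched as follows; the stub `inboxes` dict is hypothetical and stands in for the labels and records the service would return.

```python
import json

# Hypothetical in-memory stand-in for the inboxes of a collection.
inboxes = {'alice': [{'pid': 'p1'}], 'bob': []}

def list_incoming(show_records=False):
    # With -s/--show-records: a mapping of label -> records;
    # without it: just the list of labels.
    result = {label: list(records) for label, records in inboxes.items()}
    return result if show_records else list(result)

print(json.dumps(list_incoming(), indent=2))
print(json.dumps(list_incoming(show_records=True), indent=2))
```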
#### json2ttl
@ -171,8 +222,14 @@ contain TTL-documents with one string per line.
```
usage: json2ttl [-h] schema

Read JSON records from stdin and convert them to TTL

This command reads one record per line from stdin, either in JSON format or as
a JSON string containing a TTL document, converts it to TTL or JSON, and prints
the result to stdout.

positional arguments:
  schema      URL of the schema that should be used

options:
  -h, --help  show this help message and exit
@ -187,6 +244,44 @@ records in a collection to TTL:
...
```
#### read-paginated-url
General tool to read from any paginated endpoint of a dump-things-service
```
usage: read-paginated-url [-h] [-s PAGE_SIZE] [-F FIRST_PAGE] [-l LAST_PAGE] [--stats] [-f FORMAT] [-m MATCHING] [-p] url
Read paginated endpoint
This command lists all records that are available via paginated endpoints from
a dump-things-service, e.g., from:
https://<service-location>/<collection>/records/p/
If the environment variable "DUMPTHINGS_TOKEN" is set, its content will be used
as token to authenticate against the dump-things-service.
positional arguments:
  url                   url of the paginated endpoint of the dump-things-service

options:
  -h, --help            show this help message and exit
  -s, --page-size PAGE_SIZE
                        set the page size (1 - 100) (default: 100)
  -F, --first-page FIRST_PAGE
                        the first page to return (default: 1)
  -l, --last-page LAST_PAGE
                        the last page to return (default: None (return all pages))
  --stats               show information about the number of records and pages and exit; the output is returned as [<total number of pages>, <page size>, <total number of items>]
  -f, --format FORMAT   format of the output records ("json" or "ttl"). (NOTE: not all endpoints support the format parameter.)
  -m, --matching MATCHING
                        return only records that have a matching value (use % as wildcard). (NOTE: not all endpoints and backends support matching.)
  -p, --pagination      show pagination information (each record from a paginated endpoint is returned as [<record>, <current page number>, <total number of pages>, <page size>, <total number of items>])
```
`read-paginated-url` reads a token from the environment variable `DUMPTHINGS_TOKEN` if it is set.
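The `-p/--pagination` output shape described above can be illustrated locally; this sketch only demonstrates the wrapping of records and does not contact a service.

```python
import json

def with_pagination(items, page_size):
    # Wrap each record as [<record>, <current page>, <total pages>,
    # <page size>, <total items>], mirroring the -p/--pagination output.
    pages = [items[i:i + page_size] for i in range(0, len(items), page_size)]
    for page_no, page in enumerate(pages, start=1):
        for record in page:
            yield [record, page_no, len(pages), page_size, len(items)]

rows = list(with_pagination([{'pid': 'p1'}, {'pid': 'p2'}, {'pid': 'p3'}], page_size=2))
for row in rows:
    print(json.dumps(row))
```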
## SPARQL search over a collection with qlever

To provide SPARQL search for a collection, the following steps are necessary:
@ -194,7 +289,7 @@ The provide SPARQL search for a collection the following steps are necessary:
1. Create an N-Triple representation of the records of the store
2. Build a qlever index
3. Start the qlever server
4. Use qlever query to send SPARQL queries to the server

----


@ -24,6 +24,7 @@ classifiers = [
    "Programming Language :: Python :: Implementation :: PyPy",
]
dependencies = [
    "dump-things-pyclient",
    "dump-things-service",
    "progress",
    "qlever",
@ -44,6 +45,7 @@ list-incoming = "triple_tools.list_incoming:main"
post-records = "triple_tools.post_records:main"
read-pages = "triple_tools.read_pages:main"
json2ttl = "triple_tools.json2ttl:main"
read-paginated-url = "triple_tools.read_paginated_url:main"

[tool.hatch.build.targets.wheel]
exclude = [


@ -1 +1 @@
__version__ = '0.2.3'


@ -1,33 +1,47 @@
from __future__ import annotations

import argparse
import json
import logging
import os
import re
import sys

from dump_things_pyclient.communicate import (
    HTTPError,
    curated_write_record,
    incoming_delete_record,
    incoming_read_labels,
    incoming_read_records,
)

logger = logging.getLogger('auto_curate')

token_name = 'DUMPTHINGS_TOKEN'

stl_info = False

description = f"""
Automatically move records from the incoming areas of a
collection to the curated area of the same collection, or to
the curated area of another collection.

The environment variable "{token_name}" must contain a token
which is used to authenticate the requests. The token must have
curator-rights.
"""


def _main():
    argument_parser = argparse.ArgumentParser(
        description=description,
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    argument_parser.add_argument('service_url', metavar='SOURCE_SERVICE_URL')
    argument_parser.add_argument('collection', metavar='SOURCE_COLLECTION')
    argument_parser.add_argument(
        '--destination-service-url',
        default=None,
        metavar='DEST_SERVICE_URL',
        help='select a different dump-thing-service, i.e. not SOURCE_SERVICE_URL, as destination for auto-curated records',
@ -42,71 +56,144 @@ def main():
        '--destination-token',
        default=None,
        metavar='DEST_TOKEN',
        help=f'if provided, this token will be used for the destination service, otherwise ${token_name} will be used',
    )
    argument_parser.add_argument(
        '-e', '--exclude',
        action='append',
        default=[],
        help='exclude an inbox on the source collection (repeatable)',
    )
    argument_parser.add_argument(
        '-l', '--list-labels',
        action='store_true',
        help='list the inbox labels of the given source collection, do not perform any curation',
    )
    argument_parser.add_argument(
        '-r', '--list-records',
        action='store_true',
        help='list records in the inboxes of the given source collection, do not perform any curation',
    )
    argument_parser.add_argument(
        '-o', '--list-only',
        action='store_true',
        help='[DEPRECATED: use "--list-records"] list records in the inboxes of the given source collection, do not perform any curation',
    )
    argument_parser.add_argument(
        '-p', '--pid',
        action='append',
        help='if provided, process only records that match the given PIDs',
    )
    arguments = argument_parser.parse_args()

    curator_token = os.environ.get(token_name)
    if curator_token is None:
        print(f'ERROR: environment variable "{token_name}" not set', file=sys.stderr, flush=True)
        return 1

    destination_url = arguments.destination_service_url or arguments.service_url
    destination_collection = arguments.destination_collection or arguments.collection
    destination_token = arguments.destination_token or curator_token

    output = None

    # If --list-labels and --list-records are provided, keep only the latter,
    # because it includes listing of labels
    if arguments.list_records:
        if arguments.list_labels:
            print('WARNING: `-l/--list-labels` and `-r/--list-records` defined, ignoring `-l/--list-labels`', file=sys.stderr, flush=True)
            arguments.list_labels = False
        output = {}
    if arguments.list_labels:
        output = []

    for label in incoming_read_labels(
            service_url=arguments.service_url,
            collection=arguments.collection,
            token=curator_token):
        if label in arguments.exclude:
            logger.debug('ignoring excluded incoming label: %s', label)
            continue
        if arguments.list_labels:
            output.append(label)
            continue
        if arguments.list_records:
            output[label] = []
        for record, _, _, _, _ in incoming_read_records(
                service_url=arguments.service_url,
                collection=arguments.collection,
                label=label,
                token=curator_token):
            if arguments.pid:
                if record['pid'] not in arguments.pid:
                    logger.debug(
                        'ignoring record with non-matching pid: %s',
                        record['pid'])
                    continue
            if arguments.list_records or arguments.list_only:
                output[label].append(record)
                continue
            # Get the class name from the `schema_type` attribute. This requires
            # that the schema type is either stored in the record or that the
            # store has a "Schema Type Layer", i.e., the store type is
            # `record_dir+stl`, or `sqlite+stl`.
            try:
                class_name = re.search('([_A-Za-z0-9]*$)', record['schema_type']).group(0)
            except KeyError:
                global stl_info
                if not stl_info:
                    print(
                        f"""Could not find `schema_type` attribute in record with
pid {record['pid']}. Please ensure that `schema_type` is stored in
the records or that the associated incoming area store has a backend
with a "Schema Type Layer", i.e., "record_dir+stl" or
"sqlite+stl".""",
                        file=sys.stderr,
                        flush=True)
                    stl_info = True
                print(
                    f'WARNING: ignoring record with pid {record["pid"]}, `schema_type` attribute is missing.',
                    file=sys.stderr,
                    flush=True)
                continue

            # Store record in destination collection
            curated_write_record(
                service_url=destination_url,
                collection=destination_collection,
                class_name=class_name,
                record=record,
                token=destination_token)

            # Delete record from incoming area
            incoming_delete_record(
                service_url=arguments.service_url,
                collection=arguments.collection,
                label=label,
                pid=record['pid'],
                token=curator_token,
            )

    if output is not None:
        print(json.dumps(output, ensure_ascii=False))
    return 0


def main():
    try:
        return _main()
    except HTTPError as e:
        print(f'ERROR: {e}: {e.response.text}', file=sys.stderr, flush=True)
        return 1


if __name__ == '__main__':
    sys.exit(main())


@ -9,10 +9,13 @@ import sys
from dump_things_service.converter import Format, FormatConverter
from rdflib import Graph

from dump_things_pyclient.communicate import (
    HTTPError,
    get_paginated,
)


def _main():
    argument_parser = argparse.ArgumentParser()
    argument_parser.add_argument('schema')
    argument_parser.add_argument('base_url')
@ -22,8 +25,7 @@ def main():
    token = os.environ.get('DUMPTHINGS_TOKEN')
    if token is None:
        print('WARNING: environment variable DUMPTHINGS_TOKEN not set', file=sys.stderr, flush=True)

    print(f'Creating converter for schema {arguments.schema} ...', file=sys.stderr, end='', flush=True)
    converter = FormatConverter(
@ -41,7 +43,7 @@ def main():
    )

    g = Graph()
    for json_object in get_paginated(url_base, page_size=100, token=os.environ.get('DUMPTHINGS_TOKEN')):
        object_class = json_object.get('schema_type')
        if object_class is None:
            raise ValueError(f'No schema_type in {json_object}')
@ -51,7 +53,7 @@ def main():
        try:
            ttl = converter.convert(json_object, class_name)
        except ValueError as ve:
            print(f'WARNING: could not convert record {json_object["pid"]}: {ve}', file=sys.stderr, flush=True)
            continue
        g.parse(io.StringIO(ttl), format='n3')
@ -59,5 +61,13 @@ def main():
    return 0


def main():
    try:
        return _main()
    except HTTPError as e:
        print(f'ERROR: {e}: {e.response.text}', file=sys.stderr, flush=True)
        return 1


if __name__ == '__main__':
    sys.exit(main())


@ -4,28 +4,29 @@ import argparse
import os
import sys

from dump_things_pyclient.communicate import (
    HTTPError,
    incoming_delete_record,
    incoming_read_records,
)


def _main():
    argument_parser = argparse.ArgumentParser()
    argument_parser.add_argument('base_url')
    argument_parser.add_argument('collection')
    argument_parser.add_argument('label')
    argument_parser.add_argument('--list-only', '-l', action='store_true', help="list records in the inbox, don't remove them")
    arguments = argument_parser.parse_args()

    curator_token = os.environ.get('CURATOR_TOKEN')
    if curator_token is None:
        print('ERROR: environment variable CURATOR_TOKEN not set', file=sys.stderr, flush=True)
        return 1

    for record, _, _, _, _ in incoming_read_records(
        service_url=arguments.base_url,
        collection=arguments.collection,
        label=arguments.label,
        token=curator_token,
@ -35,13 +36,24 @@ def main():
            continue

        # Delete record from incoming area
        incoming_delete_record(
            service_url=arguments.base_url,
            collection=arguments.collection,
            label=arguments.label,
            pid=record['pid'],
            token=curator_token,
        )
    return 0


def main():
    try:
        return _main()
    except HTTPError as e:
        print(f'ERROR: {e}: {e.response.text}', file=sys.stderr, flush=True)
        return 1


if __name__ == '__main__':
    sys.exit(main())


@ -1,130 +0,0 @@
from __future__ import annotations

from collections.abc import Iterable
from urllib.parse import quote_plus

import requests
from progress.bar import Bar


def _create_url(
    url_base: str,
    parameters: dict[str, str] | None = None,
    page_number: int | None = None,
):
    parameters = parameters or {}
    parameters.update({'page': str(page_number)})
    all_parameters = [f'{k}={quote_plus(v)}' for k, v in parameters.items()]
    return url_base + '?' + '&'.join(all_parameters)


def _get_page(
    url_base: str,
    token: str | None = None,
    parameters: Iterable[str] | None = None,
    page_number: int | None = None,
):
    return get_from_url(_create_url(url_base, parameters, page_number), token)


def get_all(
    url_base: str,
    token: str | None = None,
    parameters: dict[str, str] | None = None,
    show_progress: bool = False,
):
    # Get the first result and the number of pages
    result = _get_page(url_base, token, parameters, page_number=1)
    total_pages = result['pages']
    if total_pages == 0:
        return
    if show_progress:
        bar = Bar('Pages', max=total_pages, suffix='%(index)d/%(max)d - %(eta_td)s')
        yield from result['items']
        bar.next()
    else:
        yield from result['items']
    # Get remaining results
    for page in range(2, total_pages + 1):
        result = _get_page(url_base, token, parameters, page_number=page)
        yield from result['items']
        if show_progress:
            bar.next()
    if show_progress:
        bar.finish()


def check_result(
    result: requests.Response,
    method: str,
    url: str
):
    if not 200 <= result.status_code < 300:
        msg = f'HTTP {method} {url} failed: {result.status_code}: {result.text}'
        raise RuntimeError(msg)


def get_from_url(
    url: str,
    token: str,
):
    r = requests.get(
        url,
        headers=({
            'x-dumpthings-token': token,
        } if token else {}),
    )
    check_result(r, 'GET', url)
    return r.json()


def post_to_url(
    url: str,
    token: str | None,
    content: list | dict
):
    r = requests.post(
        url,
        headers=({
            'x-dumpthings-token': token,
        } if token else {}),
        json=content,
    )
    check_result(r, 'POST', url)
    return r.json()


def delete_url(
    url: str,
    token: str | None,
):
    r = requests.delete(
        url,
        headers=({
            'x-dumpthings-token': token,
        } if token else {}),
    )
    check_result(r, 'DELETE', url)
    return r.json()


def get_labels(
    url_base: str,
    collection: str,
    token: str | None = None,
):
    yield from get_from_url(f'{url_base}/{collection}/incoming/', token)


def get_records_from_label(
    url_base: str,
    collection,
    label: str,
    token: str | None = None,
    parameters: dict[str, str] | None = None,
):
    label_url = f'{url_base}/{collection}/incoming/{label}/records/p/'
    yield from get_all(label_url, token=token, parameters=parameters)


@ -11,9 +11,21 @@ from dump_things_service.converter import (
)

description = """Read JSON records from stdin and convert them to TTL

This command reads one record per line from stdin, either in JSON format or as
a JSON string containing a TTL document, converts it to TTL or JSON, and prints
the result to stdout.
"""


def main():
    argument_parser = argparse.ArgumentParser(
        description=description,
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    argument_parser.add_argument('schema', help='URL of the schema that should be used')
    arguments = argument_parser.parse_args()
@ -26,16 +38,16 @@ def main():
    print(' done', file=sys.stderr, flush=True)

    error = False
    for line in sys.stdin:
        json_object = json.loads(line)
        object_class = json_object.get('schema_type')
        if object_class is None:
            error = True
            print(f'ERROR: No schema_type in {json_object}', file=sys.stderr, flush=True)
            continue
        class_name = re.search('([_A-Za-z0-9]*$)', object_class).group(0)
        try:
            ttl = converter.convert(json_object, class_name)
        except ValueError as ve:


@ -1,45 +1,60 @@
from __future__ import annotations

import argparse
import json
import os
import sys
from collections import defaultdict

from dump_things_pyclient.communicate import (
    HTTPError,
    incoming_read_labels,
    incoming_read_records,
)


def _main():
    argument_parser = argparse.ArgumentParser()
    argument_parser.add_argument('base_url')
    argument_parser.add_argument('collection')
    argument_parser.add_argument('-s', '--show-records', action='store_true', help='show the records in the inboxes as well')
    arguments = argument_parser.parse_args()

    curator_token = os.environ.get('CURATOR_TOKEN')
    if curator_token is None:
        print('ERROR: environment variable CURATOR_TOKEN not set', file=sys.stderr, flush=True)
        return 1

    result = {}
    for label in incoming_read_labels(
        service_url=arguments.base_url,
        collection=arguments.collection,
        token=curator_token,
    ):
        result[label] = []
        if arguments.show_records:
            for record, _, _, _, _ in incoming_read_records(
                service_url=arguments.base_url,
                collection=arguments.collection,
                label=label,
                token=curator_token,
            ):
                result[label].append(record)

    if arguments.show_records is False:
        result = list(result)
    print(json.dumps(result, indent=2, ensure_ascii=False))
    return 0


def main():
    try:
        return _main()
    except HTTPError as e:
        print(f'ERROR: {e}: {e.response.text}', file=sys.stderr, flush=True)
        return 1


if __name__ == '__main__':
    sys.exit(main())


@@ -5,42 +5,51 @@

```python
import json
import os
import sys

from dump_things_pyclient.communicate import (
    collection_write_record,
    curated_write_record,
)


def main():
    argument_parser = argparse.ArgumentParser()
    argument_parser.add_argument('base_url')
    argument_parser.add_argument('collection')
    argument_parser.add_argument('cls', metavar='class')
    argument_parser.add_argument('--curated', action='store_true', help='bypass inbox, requires curator token')
    arguments = argument_parser.parse_args()

    token = os.environ.get('DUMPTHINGS_TOKEN')
    if token is None:
        print(
            'WARNING: environment variable DUMPTHINGS_TOKEN not set',
            file=sys.stderr,
            flush=True,
        )

    if arguments.curated:
        write_record = curated_write_record
    else:
        write_record = collection_write_record

    posted = False
    for line in sys.stdin:
        record = json.loads(line)
        try:
            write_record(
                service_url=arguments.base_url,
                collection=arguments.collection,
                class_name=arguments.cls,
                record=record,
                token=token,
            )
        except Exception as e:
            print(f'Error: {e}', file=sys.stderr, flush=True)
        else:
            posted = True
            print('.', end='', flush=True)
    if posted:
        # final newline
        print('')
```
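post-records consumes one JSON record per stdin line, i.e. JSONL input. A small sketch of producing such input (the record fields are illustrative, not a real collection schema):

```python
import json

# Two example records; real records must match the target class's schema.
records = [
    {'pid': 'example:1', 'name': 'first'},
    {'pid': 'example:2', 'name': 'second'},
]

# One compact JSON document per line, exactly what json.loads(line)
# reads back in the stdin loop above.
jsonl = '\n'.join(json.dumps(r, ensure_ascii=False) for r in records)
print(jsonl)
```

Piping this output into the command posts each line as a separate record.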
@@ -4,41 +4,172 @@

```python
import argparse
import json
import os
import sys
from functools import partial

from dump_things_pyclient.communicate import (
    HTTPError,
    collection_read_records,
    collection_read_records_of_class,
    collection_read_record_with_pid,
    curated_read_records,
    curated_read_records_of_class,
    curated_read_record_with_pid,
    incoming_read_labels,
    incoming_read_records,
    incoming_read_records_of_class,
    incoming_read_record_with_pid,
)

token_name = 'DUMPTHINGS_TOKEN'

description = f"""Get records from a collection on a dump-things-service

This command lists records that are stored in a dump-things-service. By
default, all records that are readable with the given token, or the default
token, are displayed. The output format is JSONL (JSON Lines), where every
line contains a record, or a record with paging information. If `ttl` is
chosen as the output format, the record content is a string that contains a
TTL document.

The command supports reading from the curated area only, reading from
incoming areas, and reading records with a given PID.

Pagination information is returned for paginated results when requested with
`-P/--pagination`. All results are paginated except "get a record with a
given PID" and "get the list of incoming zone labels".

If the environment variable "{token_name}" is set, its content is used as
the token to authenticate against the dump-things-service.
"""


def _main():
    argument_parser = argparse.ArgumentParser(
        description=description,
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    argument_parser.add_argument('service_url')
    argument_parser.add_argument('collection')
    argument_parser.add_argument('-c', '--class', dest='class_name', help='only read records of this class, ignored if "--pid" is provided')
    argument_parser.add_argument('-f', '--format', help='format of the output records ("json" or "ttl")')
    argument_parser.add_argument('-p', '--pid', help='the PID of the record that should be read')
    argument_parser.add_argument('-i', '--incoming', metavar='LABEL', help='read from the incoming area with the given label in the collection; if LABEL is "-", return the labels')
    argument_parser.add_argument('-C', '--curated', action='store_true', help='read from the curated area of the collection')
    argument_parser.add_argument('-m', '--matching', help='return only records that have a matching value (use %% as wildcard), ignored if "--pid" is provided (NOTE: not all endpoints and backends support matching)')
    argument_parser.add_argument('-s', '--page-size', type=int, help='set the page size (1 - 100) (default: 100), ignored if "--pid" is provided')
    argument_parser.add_argument('-F', '--first-page', type=int, help='the first page to return (default: 1), ignored if "--pid" is provided')
    argument_parser.add_argument('-l', '--last-page', type=int, default=None, help='the last page to return (default: None, i.e. return all pages), ignored if "--pid" is provided')
    argument_parser.add_argument('--stats', action='store_true', help='show the number of records and pages and exit, ignored if "--pid" is provided')
    argument_parser.add_argument('-P', '--pagination', action='store_true', help='show pagination information (each record from a paginated endpoint is returned as [<record>, <current page number>, <total number of pages>, <page size>, <total number of items>])')
    arguments = argument_parser.parse_args()

    token = os.environ.get(token_name)
    if token is None:
        print(f'WARNING: {token_name} not set', file=sys.stderr, flush=True)

    if arguments.incoming and arguments.curated:
        print(
            'ERROR: -i/--incoming and -C/--curated are mutually exclusive',
            file=sys.stderr,
            flush=True)
        return 1

    kwargs = dict(
        service_url=arguments.service_url,
        collection=arguments.collection,
        token=token,
    )
    if arguments.incoming == '-':
        result = incoming_read_labels(**kwargs)
        print('\n'.join(
            map(
                partial(json.dumps, ensure_ascii=False),
                result)))
        return 0
    elif arguments.pid:
        for argument_value, argument_name in (
            (arguments.matching, '-m/--matching'),
            (arguments.page_size, '-s/--page-size'),
            (arguments.first_page, '-F/--first-page'),
            (arguments.last_page, '-l/--last-page'),
            (arguments.stats, '--stats'),
            (arguments.class_name, '-c/--class'),
        ):
            if argument_value:
                print(
                    f'WARNING: {argument_name} ignored because "-p/--pid" is provided',
                    file=sys.stderr,
                    flush=True)
        kwargs['pid'] = arguments.pid
        if arguments.curated:
            result = curated_read_record_with_pid(**kwargs)
        elif arguments.incoming:
            kwargs['label'] = arguments.incoming
            result = incoming_read_record_with_pid(**kwargs)
        else:
            kwargs['format'] = arguments.format
            result = collection_read_record_with_pid(**kwargs)
        print(json.dumps(result, ensure_ascii=False))
        return 0
    elif arguments.class_name:
        kwargs.update(dict(
            class_name=arguments.class_name,
            matching=arguments.matching,
            page=arguments.first_page or 1,
            size=arguments.page_size or 100,
            last_page=arguments.last_page,
        ))
        if arguments.curated:
            result = curated_read_records_of_class(**kwargs)
        elif arguments.incoming:
            kwargs['label'] = arguments.incoming
            result = incoming_read_records_of_class(**kwargs)
        else:
            kwargs['format'] = arguments.format
            result = collection_read_records_of_class(**kwargs)
    else:
        kwargs.update(dict(
            matching=arguments.matching,
            page=arguments.first_page or 1,
            size=arguments.page_size or 100,
            last_page=arguments.last_page,
        ))
        if arguments.curated:
            result = curated_read_records(**kwargs)
        elif arguments.incoming:
            kwargs['label'] = arguments.incoming
            result = incoming_read_records(**kwargs)
        else:
            kwargs['format'] = arguments.format
            result = collection_read_records(**kwargs)

    if arguments.pagination:
        for record in result:
            print(json.dumps(record, ensure_ascii=False))
    else:
        for record in result:
            print(json.dumps(record[0], ensure_ascii=False))
    return 0


def main():
    try:
        return _main()
    except HTTPError as e:
        print(f'ERROR: {e}: {e.response.text}', file=sys.stderr, flush=True)
        return 1


if __name__ == '__main__':
```
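Each paginated read yields `[<record>, <current page number>, <total number of pages>, <page size>, <total number of items>]`; with `-P/--pagination` the whole tuple is printed as one JSONL line, otherwise only the record. A sketch of that emission step over stand-in tuples (the sample data is invented, the tuple shape matches the client's output):

```python
import json

# Stand-in for a paginated result iterator: two records, one per page.
result = [
    ({'pid': 'p1'}, 1, 2, 1, 2),
    ({'pid': 'p2'}, 2, 2, 1, 2),
]

def emit(result, pagination):
    # Mirrors the final output loop: full tuple with -P, bare record without.
    lines = []
    for record in result:
        payload = record if pagination else record[0]
        lines.append(json.dumps(payload, ensure_ascii=False))
    return '\n'.join(lines)

print(emit(result, pagination=False))
```

Downstream tools can therefore consume the default output with any JSONL reader and opt into paging metadata only when they need it.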
@@ -0,0 +1,87 @@

```python
from __future__ import annotations

import argparse
import json
import os
import sys

from dump_things_pyclient.communicate import (
    HTTPError,
    get_paginated,
)

token_name = 'DUMPTHINGS_TOKEN'

description = f"""Read a paginated endpoint

This command lists all records that are available via paginated endpoints of
a dump-things-service, e.g., from:

    https://<service-location>/<collection>/records/p/

If the environment variable "{token_name}" is set, its content is used as
the token to authenticate against the dump-things-service.
"""


def _main():
    argument_parser = argparse.ArgumentParser(
        description=description,
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    argument_parser.add_argument('url', help='URL of the paginated endpoint of the dump-things-service')
    argument_parser.add_argument('-s', '--page-size', type=int, default=100, help='set the page size (1 - 100) (default: 100)')
    argument_parser.add_argument('-F', '--first-page', type=int, default=1, help='the first page to return (default: 1)')
    argument_parser.add_argument('-l', '--last-page', type=int, default=None, help='the last page to return (default: None, i.e. return all pages)')
    argument_parser.add_argument('--stats', action='store_true', help='show the number of records and pages and exit, returned as [<total number of pages>, <page size>, <total number of items>]')
    argument_parser.add_argument('-f', '--format', help='format of the output records ("json" or "ttl") (NOTE: not all endpoints support the format parameter)')
    argument_parser.add_argument('-m', '--matching', help='return only records that have a matching value (use %% as wildcard) (NOTE: not all endpoints and backends support matching)')
    argument_parser.add_argument('-p', '--pagination', action='store_true', help='show pagination information (each record from a paginated endpoint is returned as [<record>, <current page number>, <total number of pages>, <page size>, <total number of items>])')
    arguments = argument_parser.parse_args()

    token = os.environ.get(token_name)
    if token is None:
        print(f'WARNING: {token_name} not set', file=sys.stderr, flush=True)

    result = get_paginated(
        url=arguments.url,
        token=token,
        first_page=arguments.first_page,
        page_size=arguments.page_size,
        last_page=arguments.last_page,
        parameters={
            'format': arguments.format,
            **({'matching': arguments.matching}
               if arguments.matching is not None
               else {}
               ),
        }
    )
    if arguments.stats:
        record = next(result)
        print(json.dumps(record[2:], ensure_ascii=False))
        return 0
    if arguments.pagination:
        for record in result:
            print(json.dumps(record, ensure_ascii=False))
    else:
        for record in result:
            print(json.dumps(record[0], ensure_ascii=False))
    return 0


def main():
    try:
        return _main()
    except HTTPError as e:
        print(f'ERROR: {e}: {e.response.text}', file=sys.stderr, flush=True)
        return 1


if __name__ == '__main__':
    sys.exit(main())
```
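With `--stats`, the command takes the first yielded tuple and prints its `[2:]` slice, i.e. `[<total number of pages>, <page size>, <total number of items>]`, dropping the record and the current page number. A sketch with a stand-in tuple (the numbers are invented):

```python
import json

# (record, current_page, total_pages, page_size, total_items), the tuple
# shape yielded by the paginated reader.
first = ({'pid': 'p1'}, 1, 3, 100, 250)

# --stats keeps only the collection-wide counters.
stats = first[2:]
print(json.dumps(stats, ensure_ascii=False))
```

This lets a caller size a full download (pages times page size) before fetching any further pages.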