Use dump_things_pyclient to implement triple-tools #3
12 changed files with 610 additions and 280 deletions
README.md

@@ -19,27 +19,73 @@ Perform the following operations, preferably in a Python-virtual environment.
## The commands

This project provides the following CLI commands:

- auto-curate: automatically move records from inboxes to the curated area of a collection
- clean-incoming: delete all records from an inbox of a collection
- list-incoming: list records in inboxes of a collection
- post-records: read records from stdin and post them to the inbox or curated area of a collection
- read-pages: read records from a collection, the curated area of a collection, or specific inboxes
- read-paginated-url: read records from any paginated service endpoint
- build-local-triple-store: read all records from a collection and emit N-Triples

The following sections show the help messages for these commands.

#### read-pages

Read all pages from a paginated endpoint.

```
usage: read-pages [-h] [-c CLASS_NAME] [-f FORMAT] [-p PID] [-i LABEL] [-C] [-m MATCHING] [-s PAGE_SIZE] [-F FIRST_PAGE] [-l LAST_PAGE] [--stats] [-P] service_url collection

Get records from a collection on a dump-things-service

This command lists records that are stored in a dump-things-service. By
default all records that are readable with the given token, or the default
token, will be displayed. The output format is JSONL (JSON lines), where
every line contains a record or a record with paging information. If `ttl`
is chosen as the format of the output records, the record content will be a
string that contains a TTL document.

The command supports reading from the curated area only, reading from incoming
areas, or reading records with a given PID.

Pagination information is returned for paginated results, when requested with
`-P/--pagination`. All results are paginated except "get a record with a given PID"
and "get the list of incoming zone labels".

If the environment variable "DUMPTHINGS_TOKEN" is set, its content will be used
as a token to authenticate against the dump-things-service.

positional arguments:
  service_url
  collection

options:
  -h, --help            show this help message and exit
  -c, --class CLASS_NAME
                        only read records of this class, ignored if "--pid" is provided
  -f, --format FORMAT   format of the output records ("json" or "ttl")
  -p, --pid PID         the pid of the record that should be read
  -i, --incoming LABEL  read from the incoming area with the given label in the collection, if LABEL is "-", return the labels
  -C, --curated         read from the curated area of the collection
  -m, --matching MATCHING
                        return only records that have a matching value (use % as wildcard). Ignored if "--pid" is provided. (NOTE: not all endpoints and backends support matching.)
  -s, --page-size PAGE_SIZE
                        set the page size (1 - 100) (default: 100), ignored if "--pid" is provided
  -F, --first-page FIRST_PAGE
                        the first page to return (default: 1), ignored if "--pid" is provided
  -l, --last-page LAST_PAGE
                        the last page to return (default: None (return all pages)), ignored if "--pid" is provided
  --stats               show the number of records and pages and exit, ignored if "--pid" is provided
  -P, --pagination      show pagination information (each record from a paginated endpoint is returned as [<record>, <current page number>, <total number of pages>, <page size>, <total number of items>])
```

For a given `<service_url>` and `<collection>` the tool will read all pages
returned by `<service_url>/<collection>/records/p/`, or the respective inbox or
the curated area.

The tool reads a token from the environment variable `DUMPTHINGS_TOKEN` if set.
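With `-P/--pagination`, every output line is a five-element JSON array. A consumer can unpack such a line as sketched below; the record content and the numbers are made up for illustration:

```python
import json

# one illustrative JSONL line, as emitted by read-pages with -P/--pagination:
# [<record>, <current page>, <total pages>, <page size>, <total items>]
line = '[{"pid": "ex:thing-1"}, 1, 3, 100, 250]'
record, page, total_pages, page_size, total_items = json.loads(line)
print(record["pid"], page, total_pages)  # ex:thing-1 1 3
```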
@@ -73,10 +119,15 @@ The tool reads a token from the environment variable `DUMPTHINGS_TOKEN`.

Move records from inboxes into the curated part of a collection.

```
usage: auto-curate [-h] [--destination-service-url DEST_SERVICE_URL] [--destination-collection DEST_COLLECTION] [--destination-token DEST_TOKEN] [-e EXCLUDE] [-l] [-r] [-o] [-p PID] SOURCE_SERVICE_URL SOURCE_COLLECTION

Automatically move records from the incoming areas of a
collection to the curated area of the same collection, or to
the curated area of another collection.

The environment variable "DUMPTHINGS_TOKEN" must contain a token
which is used to authenticate the requests. The token must have
curator-rights.

positional arguments:
  SOURCE_SERVICE_URL
@@ -84,21 +135,21 @@ positional arguments:

options:
  -h, --help            show this help message and exit
  --destination-service-url DEST_SERVICE_URL
                        select a different dump-things-service, i.e. not SOURCE_SERVICE_URL, as destination for auto-curated records
  --destination-collection DEST_COLLECTION
                        select a different collection, i.e. not the SOURCE_COLLECTION of SOURCE_SERVICE_URL, as destination for auto-curated records
  --destination-token DEST_TOKEN
                        if provided, this token will be used for the destination service, otherwise $DUMPTHINGS_TOKEN will be used
  -e, --exclude EXCLUDE
                        exclude an inbox on the source collection (repeatable)
  -l, --list-labels     list the inbox labels of the given source collection, do not perform any curation
  -r, --list-records    list records in the inboxes of the given source collection, do not perform any curation
  -o, --list-only       [DEPRECATED: use "--list-records"] list records in the inboxes of the given source collection, do not perform any curation
  -p, --pid PID         if provided, process only records that match the given PIDs
```

`auto-curate` requires that the environment variable `DUMPTHINGS_TOKEN` is set, and contains a valid curator-token.


#### build-local-triple-store

@@ -149,7 +200,7 @@ options:

List the labels of all inboxes of a given collection

```
usage: list-incoming [-h] [-s] base_url collection

positional arguments:
  base_url
@@ -157,10 +208,10 @@ positional arguments:

options:
  -h, --help            show this help message and exit
  -s, --show-records    show the records in the inboxes as well
```

`list-incoming` requires that the environment variable `CURATOR_TOKEN` is set, and contains a valid curator-token.
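`list-incoming` prints JSON: a plain array of labels, or, with `-s/--show-records`, an object mapping each inbox label to its records. A consumer can post-process that object as sketched here; the labels and records are made up for illustration:

```python
import json

# illustrative shape of list-incoming output with -s/--show-records
payload = '{"inbox-alice": [{"pid": "ex:1"}], "inbox-bob": []}'
inboxes = json.loads(payload)
# keep only the labels of inboxes that actually contain records
non_empty = [label for label, records in inboxes.items() if records]
print(non_empty)  # ['inbox-alice']
```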
#### json2ttl
@@ -171,8 +222,14 @@ contain TTL-documents with one string per line.

```
usage: json2ttl [-h] schema

Read JSON records from stdin and convert them to TTL

This command reads one record per line, either in JSON format or as a JSON
string with a TTL document, from stdin, converts them to TTL or JSON, and
prints them to stdout.

positional arguments:
  schema                URL of the schema that should be used

options:
  -h, --help            show this help message and exit
@@ -187,6 +244,44 @@ records in a collection to TTL:

...
```
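Each input line for `json2ttl` must be a complete JSON record carrying a `schema_type` attribute; records without it are rejected. A minimal input line (with made-up values) can be produced like this:

```python
import json

# hypothetical record; json2ttl rejects records that lack `schema_type`
record = {"pid": "ex:thing-1", "schema_type": "dlthings:Thing"}
line = json.dumps(record)
```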
#### read-paginated-url

General tool to read from any paginated endpoint of a dump-things-service.

```
usage: read-paginated-url [-h] [-s PAGE_SIZE] [-F FIRST_PAGE] [-l LAST_PAGE] [--stats] [-f FORMAT] [-m MATCHING] [-p] url

Read paginated endpoint

This command lists all records that are available via paginated endpoints from
a dump-things-service, e.g., from:

https://<service-location>/<collection>/records/p/

If the environment variable "DUMPTHINGS_TOKEN" is set, its content will be used
as a token to authenticate against the dump-things-service.

positional arguments:
  url                   url of the paginated endpoint of the dump-things-service

options:
  -h, --help            show this help message and exit
  -s, --page-size PAGE_SIZE
                        set the page size (1 - 100) (default: 100)
  -F, --first-page FIRST_PAGE
                        the first page to return (default: 1)
  -l, --last-page LAST_PAGE
                        the last page to return (default: None (return all pages))
  --stats               show information about the number of records and pages and exit, the format is [<total number of pages>, <page size>, <total number of items>]
  -f, --format FORMAT   format of the output records ("json" or "ttl"). (NOTE: not all endpoints support the format parameter.)
  -m, --matching MATCHING
                        return only records that have a matching value (use % as wildcard). (NOTE: not all endpoints and backends support matching.)
  -p, --pagination      show pagination information (each record from a paginated endpoint is returned as [<record>, <current page number>, <total number of pages>, <page size>, <total number of items>])
```

`read-paginated-url` reads a token from the environment variable `DUMPTHINGS_TOKEN` if it is set.

## SPARQL search over a collection with qlever

To provide SPARQL search for a collection the following steps are necessary:

@@ -194,7 +289,7 @@ To provide SPARQL search for a collection the following steps are necessary:

1. Create an N-Triples representation of the records of the store
2. Build a qlever index
3. Start the qlever server
4. Use qlever query to send SPARQL queries to the server
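Step 1 produces one statement per line in N-Triples syntax; the file that qlever indexes in step 2 is just a sequence of such lines. The shape can be sketched as follows (the IRIs are illustrative, not from the project):

```python
# assemble two illustrative N-Triples statements, matching the line
# format that build-local-triple-store emits in step 1
triples = [
    ('<urn:ex:a>', '<urn:ex:p>', '"x"'),
    ('<urn:ex:b>', '<urn:ex:p>', '"y"'),
]
nt_document = '\n'.join(f'{s} {p} {o} .' for s, p, o in triples)
print(nt_document)
```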

----

pyproject.toml

@@ -24,6 +24,7 @@ classifiers = [
    "Programming Language :: Python :: Implementation :: PyPy",
]
dependencies = [
    "dump-things-pyclient",
    "dump-things-service",
    "progress",
    "qlever",

@@ -44,6 +45,7 @@ list-incoming = "triple_tools.list_incoming:main"
post-records = "triple_tools.post_records:main"
read-pages = "triple_tools.read_pages:main"
json2ttl = "triple_tools.json2ttl:main"
read-paginated-url = "triple_tools.read_paginated_url:main"

[tool.hatch.build.targets.wheel]
exclude = [

@@ -1 +1 @@
__version__ = '0.2.2'
__version__ = '0.2.3'

triple_tools/auto_curate.py

@@ -1,33 +1,47 @@
from __future__ import annotations

import argparse
import json
import logging
import os
import re
import sys

from dump_things_pyclient.communicate import (
    HTTPError,
    curated_write_record,
    incoming_delete_record,
    incoming_read_labels,
    incoming_read_records,
)

logger = logging.getLogger('auto_curate')

token_name = 'DUMPTHINGS_TOKEN'

stl_info = False

description = f"""
Automatically move records from the incoming areas of a
collection to the curated area of the same collection, or to
the curated area of another collection.

The environment variable "{token_name}" must contain a token
which is used to authenticate the requests. The token must have
curator-rights.
"""


def _main():
    argument_parser = argparse.ArgumentParser(
        description=description,
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    argument_parser.add_argument('service_url', metavar='SOURCE_SERVICE_URL')
    argument_parser.add_argument('collection', metavar='SOURCE_COLLECTION')
    argument_parser.add_argument(
        '--destination-service-url',
        default=None,
        metavar='DEST_SERVICE_URL',
        help='select a different dump-things-service, i.e. not SOURCE_SERVICE_URL, as destination for auto-curated records',
    )
@@ -42,71 +56,144 @@ def main():
        '--destination-token',
        default=None,
        metavar='DEST_TOKEN',
        help=f'if provided, this token will be used for the destination service, otherwise ${token_name} will be used',
    )
    argument_parser.add_argument(
        '-e', '--exclude',
        action='append',
        default=[],
        help='exclude an inbox on the source collection (repeatable)',
    )
    argument_parser.add_argument(
        '-l', '--list-labels',
        action='store_true',
        help='list the inbox labels of the given source collection, do not perform any curation',
    )
    argument_parser.add_argument(
        '-r', '--list-records',
        action='store_true',
        help='list records in the inboxes of the given source collection, do not perform any curation',
    )
    argument_parser.add_argument(
        '-o', '--list-only',
        action='store_true',
        help='[DEPRECATED: use "--list-records"] list records in the inboxes of the given source collection, do not perform any curation',
    )
    argument_parser.add_argument(
        '-p', '--pid',
        action='append',
        help='if provided, process only records that match the given PIDs',
    )

    arguments = argument_parser.parse_args()

    # `--list-only` is a deprecated alias for `--list-records`
    if arguments.list_only:
        arguments.list_records = True

    curator_token = os.environ.get(token_name)
    if curator_token is None:
        print(f'ERROR: environment variable "{token_name}" not set', file=sys.stderr, flush=True)
        return 1

    destination_url = arguments.destination_service_url or arguments.service_url
    destination_collection = arguments.destination_collection or arguments.collection
    destination_token = arguments.destination_token or curator_token

    output = None

    # If --list-labels and --list-records are provided, keep only the latter,
    # because it includes listing of labels
    if arguments.list_records:
        if arguments.list_labels:
            print('WARNING: `-l/--list-labels` and `-r/--list-records` defined, ignoring `-l/--list-labels`', file=sys.stderr, flush=True)
            arguments.list_labels = False
        output = {}
    if arguments.list_labels:
        output = []

    for label in incoming_read_labels(
            service_url=arguments.service_url,
            collection=arguments.collection,
            token=curator_token):

        if label in arguments.exclude:
            logger.debug('ignoring excluded incoming label: %s', label)
            continue

        if arguments.list_labels:
            output.append(label)
            continue

        if arguments.list_records:
            output[label] = []

        for record, _, _, _, _ in incoming_read_records(
                service_url=arguments.service_url,
                collection=arguments.collection,
                label=label,
                token=curator_token):

            if arguments.pid:
                if record['pid'] not in arguments.pid:
                    logger.debug(
                        'ignoring record with non-matching pid: %s',
                        record['pid'])
                    continue

            if arguments.list_records:
                output[label].append(record)
                continue

            # Get the class name from the `schema_type` attribute. This requires
            # that the schema type is either stored in the record or that the
            # store has a "Schema Type Layer", i.e., the store type is
            # `record_dir+stl`, or `sqlite+stl`.
            try:
                class_name = re.search('([_A-Za-z0-9]*$)', record['schema_type']).group(0)
            except KeyError:
                global stl_info
                if not stl_info:
                    print(
                        f"""Could not find `schema_type` attribute in record with
pid {record['pid']}. Please ensure that `schema_type` is stored in
the records or that the associated incoming area store has a backend
with a "Schema Type Layer", i.e., "record_dir+stl" or
"sqlite+stl".""",
                        file=sys.stderr,
                        flush=True)
                    stl_info = True
                print(
                    f'WARNING: ignoring record with pid {record["pid"]}, `schema_type` attribute is missing.',
                    file=sys.stderr,
                    flush=True)
                continue

            # Store record in destination collection
            curated_write_record(
                service_url=destination_url,
                collection=destination_collection,
                class_name=class_name,
                record=record,
                token=destination_token)

            # Delete record from incoming area
            incoming_delete_record(
                service_url=arguments.service_url,
                collection=arguments.collection,
                label=label,
                pid=record['pid'],
                token=curator_token,
            )

    if output is not None:
        print(json.dumps(output, ensure_ascii=False))

    return 0


def main():
    try:
        return _main()
    except HTTPError as e:
        print(f'ERROR: {e}: {e.response.text}', file=sys.stderr, flush=True)
        return 1


if __name__ == '__main__':
    sys.exit(main())
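The class-name extraction from `schema_type` used by auto-curate can be tried in isolation; the `schema_type` value below is made up:

```python
import re

# the same regex auto-curate uses: take the trailing identifier of a
# CURIE-style `schema_type`
schema_type = 'dlthings:Thing'
class_name = re.search('([_A-Za-z0-9]*$)', schema_type).group(0)
print(class_name)  # Thing
```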
triple_tools/build_local_triple_store.py

@@ -9,10 +9,13 @@ import sys
from dump_things_service.converter import Format, FormatConverter
from rdflib import Graph

from dump_things_pyclient.communicate import (
    HTTPError,
    get_paginated,
)


def _main():
    argument_parser = argparse.ArgumentParser()
    argument_parser.add_argument('schema')
    argument_parser.add_argument('base_url')

@@ -22,8 +25,7 @@ def main():
    token = os.environ.get('DUMPTHINGS_TOKEN')
    if token is None:
        print('WARNING: environment variable DUMPTHINGS_TOKEN not set', file=sys.stderr, flush=True)

    print(f'Creating converter for schema {arguments.schema} ...', file=sys.stderr, end='', flush=True)
    converter = FormatConverter(

@@ -41,7 +43,7 @@ def main():
    )

    g = Graph()
    for json_object in get_paginated(url_base, page_size=100, token=os.environ.get('DUMPTHINGS_TOKEN')):
        object_class = json_object.get('schema_type')
        if object_class is None:
            raise ValueError(f'No schema_type in {json_object}')

@@ -51,7 +53,7 @@ def main():
        try:
            ttl = converter.convert(json_object, class_name)
        except ValueError as ve:
            print(f'WARNING: could not convert record {json_object["pid"]}: {ve}', file=sys.stderr, flush=True)
            continue
        g.parse(io.StringIO(ttl), format='n3')

@@ -59,5 +61,13 @@ def main():
    return 0


def main():
    try:
        return _main()
    except HTTPError as e:
        print(f'ERROR: {e}: {e.response.text}', file=sys.stderr, flush=True)
        return 1


if __name__ == '__main__':
    sys.exit(main())
triple_tools/clean_incoming.py

@@ -4,28 +4,29 @@ import argparse
import os
import sys

from dump_things_pyclient.communicate import (
    HTTPError,
    incoming_delete_record,
    incoming_read_records,
)


def _main():
    argument_parser = argparse.ArgumentParser()
    argument_parser.add_argument('base_url')
    argument_parser.add_argument('collection')
    argument_parser.add_argument('label')
    argument_parser.add_argument('--list-only', '-l', action='store_true', help="list records in the inbox, don't remove them")

    arguments = argument_parser.parse_args()

    curator_token = os.environ.get('CURATOR_TOKEN')
    if curator_token is None:
        print('ERROR: environment variable CURATOR_TOKEN not set', file=sys.stderr, flush=True)
        return 1

    for record, _, _, _, _ in incoming_read_records(
            service_url=arguments.base_url,
            collection=arguments.collection,
            label=arguments.label,
            token=curator_token,

@@ -35,13 +36,24 @@ def main():
            continue

        # Delete record from incoming area
        incoming_delete_record(
            service_url=arguments.base_url,
            collection=arguments.collection,
            label=arguments.label,
            pid=record['pid'],
            token=curator_token,
        )
    return 0


def main():
    try:
        return _main()
    except HTTPError as e:
        print(f'ERROR: {e}: {e.response.text}', file=sys.stderr, flush=True)
        return 1


if __name__ == '__main__':
    sys.exit(main())
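The `_main`/`main` split repeated in these modules funnels every `HTTPError` into a stderr diagnostic and exit code 1. The pattern in isolation looks like this (with a stand-in exception class, since `dump_things_pyclient` is not imported here):

```python
import sys


class HTTPError(Exception):
    """Stand-in for dump_things_pyclient's HTTPError, for illustration."""


def _main():
    # a real tool would talk to the dump-things-service here
    raise HTTPError('503 Service Unavailable')


def main():
    # wrapper: report the failure on stderr and signal it via the exit code
    try:
        return _main()
    except HTTPError as e:
        print(f'ERROR: {e}', file=sys.stderr, flush=True)
        return 1
```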
triple_tools/communicate.py (deleted)

@@ -1,130 +0,0 @@
from __future__ import annotations

from collections.abc import Iterable
from urllib.parse import quote_plus

import requests
from progress.bar import Bar


def _create_url(
    url_base: str,
    parameters: dict[str, str] | None = None,
    page_number: int | None = None,
):
    parameters = parameters or {}
    parameters.update({'page': str(page_number)})
    all_parameters = [f'{k}={quote_plus(v)}' for k, v in parameters.items()]
    return url_base + '?' + '&'.join(all_parameters)


def _get_page(
    url_base: str,
    token: str | None = None,
    parameters: Iterable[str] | None = None,
    page_number: int | None = None,
):
    return get_from_url(_create_url(url_base, parameters, page_number), token)


def get_all(
    url_base: str,
    token: str | None = None,
    parameters: dict[str, str] | None = None,
    show_progress: bool = False,
):
    # Get the first result and the number of pages
    result = _get_page(url_base, token, parameters, page_number=1)
    total_pages = result['pages']
    if total_pages == 0:
        return

    if show_progress:
        bar = Bar('Pages', max=total_pages, suffix='%(index)d/%(max)d - %(eta_td)s')
        yield from result['items']
        bar.next()
    else:
        yield from result['items']

    # Get remaining results
    for page in range(2, total_pages + 1):
        result = _get_page(url_base, token, parameters, page_number=page)
        yield from result['items']
        if show_progress:
            bar.next()

    if show_progress:
        bar.finish()


def check_result(
    result: requests.Response,
    method: str,
    url: str
):
    if not 200 <= result.status_code < 300:
        msg = f'HTTP {method} {url} failed: {result.status_code}: {result.text}'
        raise RuntimeError(msg)


def get_from_url(
    url: str,
    token: str,
):
    r = requests.get(
        url,
        headers=({
            'x-dumpthings-token': token,
        } if token else {}),
    )
    check_result(r, 'GET', url)
    return r.json()


def post_to_url(
    url: str,
    token: str | None,
    content: list | dict
):
    r = requests.post(
        url,
        headers=({
            'x-dumpthings-token': token,
        } if token else {}),
        json=content,
    )
    check_result(r, 'POST', url)
    return r.json()


def delete_url(
    url: str,
    token: str | None,
):
    r = requests.delete(
        url,
        headers=({
            'x-dumpthings-token': token,
        } if token else {}),
    )
    check_result(r, 'DELETE', url)
    return r.json()


def get_labels(
    url_base: str,
    collection: str,
    token: str | None = None,
):
    yield from get_from_url(f'{url_base}/{collection}/incoming/', token)


def get_records_from_label(
    url_base: str,
    collection,
    label: str,
    token: str | None = None,
    parameters: dict[str, str] | None = None,
):
    label_url = f'{url_base}/{collection}/incoming/{label}/records/p/'
    yield from get_all(label_url, token=token, parameters=parameters)
triple_tools/json2ttl.py

@@ -11,9 +11,21 @@ from dump_things_service.converter (
)


description = f"""Read JSON records from stdin and convert them to TTL

This command reads one record per line, either in JSON format or as a JSON
string with a TTL document, from stdin, converts them to TTL or JSON, and
prints them to stdout.

"""


def main():
    argument_parser = argparse.ArgumentParser(
        description=description,
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    argument_parser.add_argument('schema', help='URL of the schema that should be used')

    arguments = argument_parser.parse_args()

@@ -26,16 +38,16 @@ def main():
    print(' done', file=sys.stderr, flush=True)

    error = False

    for line in sys.stdin:
        json_object = json.loads(line)

        object_class = json_object.get('schema_type')
        if object_class is None:
            error = True
            print(f'ERROR: No schema_type in {json_object}', file=sys.stderr, flush=True)
            continue

        class_name = re.search('([_A-Za-z0-9]*$)', object_class).group(0)

        try:
            ttl = converter.convert(json_object, class_name)
        except ValueError as ve:
@@ -1,45 +1,60 @@
 from __future__ import annotations

 import argparse
+import json
 import os
 import sys
+from collections import defaultdict

-from triple_tools.communicate import (
-    get_labels,
-    get_records_from_label,
+from dump_things_pyclient.communicate import (
+    HTTPError,
+    incoming_read_labels,
+    incoming_read_records,
 )


-def main():
+def _main():
     argument_parser = argparse.ArgumentParser()
     argument_parser.add_argument('base_url')
     argument_parser.add_argument('collection')
-    argument_parser.add_argument('--show-records', '-s', action='store_true')
+    argument_parser.add_argument('-s', '--show-records', action='store_true', help='show the records in the inboxes as well')

     arguments = argument_parser.parse_args()

     curator_token = os.environ.get('CURATOR_TOKEN')
     if curator_token is None:
-        print('ERROR: CURATOR_TOKEN not set', file=sys.stderr, flush=True)
+        print('ERROR: environment variable CURATOR_TOKEN not set', file=sys.stderr, flush=True)
         return 1

-    for label in get_labels(
-        url_base=arguments.base_url,
+    result = {}
+    for label in incoming_read_labels(
+        service_url=arguments.base_url,
         collection=arguments.collection,
         token=curator_token,
     ):
-        print(label)
+        result[label] = []
         if arguments.show_records:
-            for record in get_records_from_label(
-                url_base=arguments.base_url,
+            for record, _, _, _, _ in incoming_read_records(
+                service_url=arguments.base_url,
                 collection=arguments.collection,
                 label=label,
                 token=curator_token,
             ):
-                print('\t', record)
+                result[label].append(record)

+    if arguments.show_records is False:
+        result = list(result)
+    print(json.dumps(result, indent=2, ensure_ascii=False))
     return 0


+def main():
+    try:
+        return _main()
+    except HTTPError as e:
+        print(f'ERROR: {e}: {e.response.text}', file=sys.stderr, flush=True)
+        return 1


 if __name__ == '__main__':
     sys.exit(main())
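The rewritten `list-incoming` aggregates inbox labels (and, with `-s/--show-records`, their records) into a mapping before printing JSON. A small sketch of the two output shapes, with made-up data:

```python
import json

# label -> records, as assembled by the loop over incoming_read_labels()
result = {'inbox-a': [{'pid': 'x:1'}], 'inbox-b': []}

show_records = False
if show_records is False:
    # Without -s/--show-records only the label list is emitted.
    result = list(result)

print(json.dumps(result, indent=2, ensure_ascii=False))
```

With `-s` the full `{label: [record, ...]}` mapping is printed instead of the bare label list.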
@@ -5,42 +5,51 @@ import json
 import os
 import sys

-from triple_tools.communicate import post_to_url
+from dump_things_pyclient.communicate import (
+    collection_write_record,
+    curated_write_record,
+)


 def main():
     argument_parser = argparse.ArgumentParser()
     argument_parser.add_argument('base_url')
     argument_parser.add_argument('collection')
-    argument_parser.add_argument('cls')
-    argument_parser.add_argument('--curated', action='store_true')
+    argument_parser.add_argument('cls', metavar='class')
+    argument_parser.add_argument('--curated', action='store_true', help='bypass inbox, requires curator token')

     arguments = argument_parser.parse_args()

     token = os.environ.get('DUMPTHINGS_TOKEN')
     if token is None:
-        print('WARNING: DUMPTHINGS_TOKEN not set', file=sys.stderr, flush=True)
-
-    url = (
-        arguments.base_url
-        + ('' if arguments.base_url.endswith('/') else '/')
-        + arguments.collection
-        + '/'
-    )
+        print(
+            'WARNING: environment variable DUMPTHINGS_TOKEN not set',
+            file=sys.stderr,
+            flush=True,
+        )

     if arguments.curated:
-        url += f'curated/'
-    url += f'record/{arguments.cls}'
+        write_record = curated_write_record
+    else:
+        write_record = collection_write_record

     posted = False
     for line in sys.stdin:
-        rec = json.loads(line)
+        record = json.loads(line)
         try:
-            post_to_url(url, token, rec)
+            write_record(
+                service_url=arguments.base_url,
+                collection=arguments.collection,
+                class_name=arguments.cls,
+                record=record,
+                token=token,
+            )
         except Exception as e:
-            print(e)
+            print(f'Error: {e}', file=sys.stderr, flush=True)
         else:
             posted = True
             print('.', end='', flush=True)

     if posted:
         # final newline
         print('')
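`post-records` reads one JSON record per stdin line and hands each to the selected write function. A sketch of that loop with a stand-in writer (the stub and its data are hypothetical):

```python
import io
import json

def write_record(*, service_url, collection, class_name, record, token):
    # Stand-in for collection_write_record / curated_write_record.
    return record

stdin = io.StringIO('{"pid": "x:1"}\n{"pid": "x:2"}\n')
posted = False
for line in stdin:
    record = json.loads(line)
    try:
        write_record(service_url='https://example.org', collection='demo',
                     class_name='Thing', record=record, token=None)
    except Exception as e:
        print(f'Error: {e}')
    else:
        posted = True
        print('.', end='')  # one dot per posted record

if posted:
    print('')  # final newline
```

Failures are reported per record, so one malformed line does not abort the whole stream.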
@@ -4,41 +4,172 @@ import argparse
 import json
 import os
 import sys
+from functools import partial

-from triple_tools.communicate import get_all
+from dump_things_pyclient.communicate import (
+    HTTPError,
+    collection_read_records,
+    collection_read_records_of_class,
+    collection_read_record_with_pid,
+    curated_read_records,
+    curated_read_records_of_class,
+    curated_read_record_with_pid,
+    incoming_read_labels,
+    incoming_read_records,
+    incoming_read_records_of_class,
+    incoming_read_record_with_pid,
+)
+
+
+token_name = 'DUMPTHINGS_TOKEN'
+
+description = f"""Get records from a collection on a dump-things-service
+
+This command lists records that are stored in a dump-things-service. By
+default all records that are readable with the given token, or the default
+token, will be displayed. The output format is JSONL (JSON lines), where
+every line contains a record or a record with paging information. If `ttl`
+is chosen as the format of the output records, the record content will be a
+string that contains a TTL document.
+
+The command supports reading from the curated area only, reading from
+incoming areas, and reading records with a given PID.
+
+Pagination information is returned for paginated results, when requested with
+`-P/--pagination`. All results are paginated except "get a record with a given
+PID" and "get the list of incoming zone labels".
+
+If the environment variable "{token_name}" is set, its content will be used
+as token to authenticate against the dump-things-service.
+"""
+
+
+def _main():
+    argument_parser = argparse.ArgumentParser(
+        description=description,
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+    )
+    argument_parser.add_argument('service_url')
+    argument_parser.add_argument('collection')
+    argument_parser.add_argument('-c', '--class', dest='class_name', help='only read records of this class, ignored if "--pid" is provided')
+    argument_parser.add_argument('-f', '--format', help='format of the output records ("json" or "ttl")')
+    argument_parser.add_argument('-p', '--pid', help='the pid of the record that should be read')
+    argument_parser.add_argument('-i', '--incoming', metavar='LABEL', help='read from the incoming area with the given label in the collection, if LABEL is "-", return the labels')
+    argument_parser.add_argument('-C', '--curated', action='store_true', help='read from the curated area of the collection')
+    argument_parser.add_argument('-m', '--matching', help='return only records that have a matching value (use %% as wildcard), ignored if "--pid" is provided (NOTE: not all endpoints and backends support matching)')
+    argument_parser.add_argument('-s', '--page-size', type=int, help='set the page size (1 - 100) (default: 100), ignored if "--pid" is provided')
+    argument_parser.add_argument('-F', '--first-page', type=int, help='the first page to return (default: 1), ignored if "--pid" is provided')
+    argument_parser.add_argument('-l', '--last-page', type=int, default=None, help='the last page to return (default: None, i.e. return all pages), ignored if "--pid" is provided')
+    argument_parser.add_argument('--stats', action='store_true', help='show the number of records and pages and exit, ignored if "--pid" is provided')
+    argument_parser.add_argument('-P', '--pagination', action='store_true', help='show pagination information (each record from a paginated endpoint is returned as [<record>, <current page number>, <total number of pages>, <page size>, <total number of items>])')
+
+    arguments = argument_parser.parse_args()
+
+    token = os.environ.get(token_name)
+    if token is None:
+        print(f'WARNING: {token_name} not set', file=sys.stderr, flush=True)
+
+    if arguments.incoming and arguments.curated:
+        print(
+            'ERROR: -i/--incoming and -C/--curated are mutually exclusive',
+            file=sys.stderr,
+            flush=True)
+        return 1
+
+    kwargs = dict(
+        service_url=arguments.service_url,
+        collection=arguments.collection,
+        token=token,
+    )
+
+    if arguments.incoming == '-':
+        result = incoming_read_labels(**kwargs)
+        print('\n'.join(
+            map(
+                partial(json.dumps, ensure_ascii=False),
+                result)))
+        return 0
+
+    elif arguments.pid:
+        for argument_value, argument_name in (
+            (arguments.matching, '-m/--matching'),
+            (arguments.page_size, '-s/--page-size'),
+            (arguments.first_page, '-F/--first-page'),
+            (arguments.last_page, '-l/--last-page'),
+            (arguments.stats, '--stats'),
+            (arguments.class_name, '-c/--class'),
+        ):
+            if argument_value:
+                print(
+                    f'WARNING: {argument_name} ignored because "-p/--pid" is provided',
+                    file=sys.stderr,
+                    flush=True)
+
+        kwargs['pid'] = arguments.pid
+        if arguments.curated:
+            result = curated_read_record_with_pid(**kwargs)
+        elif arguments.incoming:
+            kwargs['label'] = arguments.incoming
+            result = incoming_read_record_with_pid(**kwargs)
+        else:
+            kwargs['format'] = arguments.format
+            result = collection_read_record_with_pid(**kwargs)
+        print(json.dumps(result, ensure_ascii=False))
+        return 0
+
+    elif arguments.class_name:
+        kwargs.update(dict(
+            class_name=arguments.class_name,
+            matching=arguments.matching,
+            page=arguments.first_page or 1,
+            size=arguments.page_size or 100,
+            last_page=arguments.last_page,
+        ))
+        if arguments.curated:
+            result = curated_read_records_of_class(**kwargs)
+        elif arguments.incoming:
+            kwargs['label'] = arguments.incoming
+            result = incoming_read_records_of_class(**kwargs)
+        else:
+            kwargs['format'] = arguments.format
+            result = collection_read_records_of_class(**kwargs)
+    else:
+        kwargs.update(dict(
+            matching=arguments.matching,
+            page=arguments.first_page or 1,
+            size=arguments.page_size or 100,
+            last_page=arguments.last_page,
+        ))
+        if arguments.curated:
+            result = curated_read_records(**kwargs)
+        elif arguments.incoming:
+            kwargs['label'] = arguments.incoming
+            result = incoming_read_records(**kwargs)
+        else:
+            kwargs['format'] = arguments.format
+            result = collection_read_records(**kwargs)
+
+    if arguments.pagination:
+        for record in result:
+            print(json.dumps(record, ensure_ascii=False))
+    else:
+        for record in result:
+            print(json.dumps(record[0], ensure_ascii=False))
+    return 0
+
+
 def main():
-    argument_parser = argparse.ArgumentParser()
-    argument_parser.add_argument('base_url')
-    argument_parser.add_argument('collection')
-    argument_parser.add_argument('-s', '--size', type=int, default=100)
-    argument_parser.add_argument('-p', '--parameter', action='append', default=[])
-    argument_parser.add_argument('-c', '--class', default=None, dest='cls')
-
-    arguments = argument_parser.parse_args()
-
-    token = os.environ.get('DUMPTHINGS_TOKEN')
-    if token is None:
-        print('WARNING: DUMPTHINGS_TOKEN not set', file=sys.stderr, flush=True)
-
-    url_base = (
-        arguments.base_url
-        + ('' if arguments.base_url.endswith('/') else '/')
-        + arguments.collection
-        + f'/records/p/'
-    )
-    if arguments.cls:
-        url_base += f'{arguments.cls}/'
-
-    parameters = {'size': str(arguments.size)}
-    parameters.update({
-        param.split('=', 1)[0]: param.split('=', 1)[1]
-        for param in (arguments.parameter or [])
-    })
-
-    for json_object in get_all(url_base, token, parameters=parameters):
-        print(json.dumps(json_object))
+    try:
+        return _main()
+    except HTTPError as e:
+        print(f'ERROR: {e}: {e.response.text}', file=sys.stderr, flush=True)
+        return 1


 if __name__ == '__main__':
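The paginated readers used above yield one entry per record, each carrying the record plus its paging details. A minimal illustration of the `-P/--pagination` switch over fabricated rows:

```python
import json

# (record, current page, total pages, page size, total items) - fabricated
rows = [
    ({'pid': 'x:1'}, 1, 2, 1, 2),
    ({'pid': 'x:2'}, 2, 2, 1, 2),
]

pagination = False
for row in rows:
    if pagination:
        # -P/--pagination: emit the full tuple as a JSON array.
        print(json.dumps(row, ensure_ascii=False))
    else:
        # Default: emit the bare record only.
        print(json.dumps(row[0], ensure_ascii=False))
```

Either way the output stays one JSON value per line, i.e. valid JSONL.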
87
triple_tools/read_paginated_url.py
Normal file

@@ -0,0 +1,87 @@
from __future__ import annotations

import argparse
import json
import os
import sys

from dump_things_pyclient.communicate import (
    HTTPError,
    get_paginated,
)


token_name = 'DUMPTHINGS_TOKEN'

description = f"""Read paginated endpoint

This command lists all records that are available via paginated endpoints of
a dump-things-service, e.g., from:

    https://<service-location>/<collection>/records/p/

If the environment variable "{token_name}" is set, its content will be used
as token to authenticate against the dump-things-service.
"""


def _main():
    argument_parser = argparse.ArgumentParser(
        description=description,
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    argument_parser.add_argument('url', help='URL of the paginated endpoint of the dump-things-service')
    argument_parser.add_argument('-s', '--page-size', type=int, default=100, help='set the page size (1 - 100) (default: 100)')
    argument_parser.add_argument('-F', '--first-page', type=int, default=1, help='the first page to return (default: 1)')
    argument_parser.add_argument('-l', '--last-page', type=int, default=None, help='the last page to return (default: None, i.e. return all pages)')
    argument_parser.add_argument('--stats', action='store_true', help='show information about the number of records and pages and exit, the result is returned as [<total number of pages>, <page size>, <total number of items>]')
    argument_parser.add_argument('-f', '--format', help='format of the output records ("json" or "ttl") (NOTE: not all endpoints support the format parameter)')
    argument_parser.add_argument('-m', '--matching', help='return only records that have a matching value (use %% as wildcard) (NOTE: not all endpoints and backends support matching)')
    argument_parser.add_argument('-p', '--pagination', action='store_true', help='show pagination information (each record from a paginated endpoint is returned as [<record>, <current page number>, <total number of pages>, <page size>, <total number of items>])')

    arguments = argument_parser.parse_args()

    token = os.environ.get(token_name)
    if token is None:
        print(f'WARNING: {token_name} not set', file=sys.stderr, flush=True)

    result = get_paginated(
        url=arguments.url,
        token=token,
        first_page=arguments.first_page,
        page_size=arguments.page_size,
        last_page=arguments.last_page,
        parameters={
            'format': arguments.format,
            **({'matching': arguments.matching}
               if arguments.matching is not None
               else {}
            ),
        }
    )

    if arguments.stats:
        record = next(result)
        print(json.dumps(record[2:], ensure_ascii=False))
        return 0

    if arguments.pagination:
        for record in result:
            print(json.dumps(record, ensure_ascii=False))
    else:
        for record in result:
            print(json.dumps(record[0], ensure_ascii=False))
    return 0


def main():
    try:
        return _main()
    except HTTPError as e:
        print(f'ERROR: {e}: {e.response.text}', file=sys.stderr, flush=True)
        return 1


if __name__ == '__main__':
    sys.exit(main())
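`--stats` consumes a single paginated row and prints only its trailing paging fields. A sketch with a fabricated first row:

```python
import json

# Fabricated first paginated row:
# (record, current page, total pages, page size, total items)
first_row = ({'pid': 'x:1'}, 1, 7, 100, 634)

stats = first_row[2:]  # (total pages, page size, total items)
print(json.dumps(stats, ensure_ascii=False))  # [7, 100, 634]
```

Only one page request is needed for the statistics, since every paginated row carries the collection-wide totals.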