Python-client library for dump-things-service

Dump Things Python Client

A simple Python client library and some CLI tools for dump-things-service.

The tools are in an early state and not automatically tested. Do not use them on collections with valuable data unless you have a backup, or are very brave, or quite reckless.

Tech Stack

  • Python >= 3.11

  • uv for dependency management

Installation

The tools are published as the PyPI project dump-things-pyclient. Install it, e.g., via pip (preferably in a virtual environment):

pip install dump-things-pyclient

The commands

This project provides the CLI command dtc. dtc has a number of subcommands:

  • auto-curate: automatically move records from inboxes to the curated area of a collection
  • clean-incoming: delete all records from an inbox of a collection
  • delete-records: delete records from an inbox or the curated area of a collection
  • export: export a collection to the file system
  • get-records: get records from a dump-things collection
  • import: import a collection from a file system dump (created by "export")
  • list-incoming: list records in inboxes of a collection
  • maintenance: activate or deactivate maintenance mode on a collection
  • post-records: post records to an inbox or the curated area of a collection
  • read-pages: read records from a collection, the curated area of a collection, or specific inboxes
  • version: show the version of dtc

Most commands require a token, and all commands accept one. Tokens are provided to dtc with the option --token preceding the subcommand.

This is the help message of dtc, which lists all available subcommands:

 Usage: dtc [OPTIONS] COMMAND [ARGS]...                                                                                                                                                                                             
                                                                                                                                                                                                                                    
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --token  TEXT  provide a token on the command line, NOTE: on multiuser systems you should use the environment variable DTC_TOKEN instead                                                                                         │
│ --debug        show debug output                                                                                                                                                                                                 │
│ --help         Show this message and exit.                                                                                                                                                                                       │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ auto-curate                             Move records from inbox to curate area of a collection                                                                                                                                   │
│ clean-incoming                          Remove records from an inbox of a dump-things collection                                                                                                                                 │
│ delete-records                          Delete records from a dump-things collection                                                                                                                                             │
│ export                                  Export a collection to the file system                                                                                                                                                   │
│ get-records                             Get records from a dump-things collection                                                                                                                                                │
│ import                                  Import a collection from a file system                                                                                                                                                   │
│ list-incoming                           List inboxes of a dump-things collection                                                                                                                                                 │
│ maintenance                             Activate or deactivate maintenance mode on a collection                                                                                                                                  │
│ post-records                            Post records to an inbox or the curated area of a dump-things collection                                                                                                                 │
│ read-pages                              Read records from paginated dump-things endpoints                                                                                                                                        │
│ version                                 Show the version of `dtc`                                                                                                                                                                │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
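
The token handling described above can be sketched from Python. This is a minimal sketch, not part of dump-things-pyclient: the helper names `build_dtc_command` and `build_dtc_environment` are hypothetical. It only relies on what the help message states, namely that `--token` and `--debug` are global options placed before the subcommand and that the environment variable `DTC_TOKEN` can be used instead of `--token`.

```python
import os


def build_dtc_command(subcommand, args, token=None, debug=False):
    """Return an argv list for dtc (hypothetical helper).

    Global options such as --token and --debug are placed before the
    subcommand, as the dtc help message describes.
    """
    argv = ["dtc"]
    if debug:
        argv.append("--debug")
    if token is not None:
        argv += ["--token", token]
    argv.append(subcommand)
    argv += list(args)
    return argv


def build_dtc_environment(token, base_env=None):
    """Return a copy of the environment with DTC_TOKEN set (hypothetical helper).

    This keeps the token off the command line, which the help message
    recommends on multiuser systems.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["DTC_TOKEN"] = token
    return env
```

The resulting list and environment could be passed to `subprocess.run(cmd, env=env, check=True)`, with the service URL and collection name supplied as positional arguments.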

The following sections show the help message for each of these dtc subcommands.

auto-curate

Move records from inboxes to the curated area of a collection

 Usage: dtc auto-curate [OPTIONS] SERVICE_URL COLLECTION                                                                                                                                                                            
                                                                                                                                                                                                                                    
 Automatically move records from the incoming areas of the collection COLLECTION in the service SERVICE_URL to the curated area of the same collection, or to the curated area of another collection, possibly on another service.  
 A token is required and will be used to authenticate the requests. The token must have curator-rights.                                                                                                                             
                                                                                                                                                                                                                                    
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --destination-service-url      DEST_SERVICE_URL  select a different dump-thing-service, i.e. not SERVICE_URL, as destination for auto-curated records (the default is SERVICE_URL)                                               │
│ --destination-collection       DEST_COLLECTION   select a different collection, i.e. not COLLECTION, as destination for auto-curated records                                                                                     │
│ --destination-token            DEST_TOKEN        if provided, this token will be used the authenticate against DEST_SERVICE_URL, which defaults to SERVICE_URL (the default is the token provided via --token)                   │
│ --pid                      -p  PID               if provided, process only records that match the given PIDs. NOTE: matching does not involve CURIE-resolution                                                                   │
│ --exclude                  -e  TEXT              exclude an inbox on the source collection (repeatable)                                                                                                                          │
│ --include                  -i  TEXT              process only the given inbox, all other inboxes are ignored (repeatable, -e/--exclude is applied after inclusion)                                                               │
│ --list-labels              -l                    list the inbox labels of the given source collection, do not perform any curation                                                                                               │
│ --list-records             -r                    list records in the inboxes of the given source collection, do not perform any curation                                                                                         │
│ --dry-run                  -d                    if provided, do not alter any data, instead print what would be done                                                                                                            │
│ --help                                           Show this message and exit.                                                                                                                                                     │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
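
The interplay of `-i/--include` and `-e/--exclude` can be illustrated with a short sketch. It follows the wording of the help text ("-e/--exclude is applied after inclusion") and is an assumption about the resulting behavior, not the actual implementation in dtc. The function name `select_inboxes` and the inbox labels are hypothetical.

```python
def select_inboxes(labels, include=(), exclude=()):
    """Select inbox labels: apply inclusion first, then exclusion.

    An empty `include` means that all inboxes are candidates.
    The order of `labels` is preserved.
    """
    included = [label for label in labels if not include or label in include]
    return [label for label in included if label not in exclude]


# Hypothetical inbox labels of a source collection
labels = ["alice", "bob", "carol"]
print(select_inboxes(labels, include=["alice", "bob"], exclude=["bob"]))
```

Under this reading, an inbox that is both included and excluded is not processed.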

clean-incoming

Usage: dtc clean-incoming [OPTIONS] SERVICE_URL COLLECTION INBOX_LABEL

Remove all records from an incoming areas of a collection on a dump-things-service
This command removes all records from the inbox with label INBOX_LABEL in the collection COLLECTION on the dump-things service given by SERVICE_URL.
A token with curator rights has to be provided.

╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --list-only  -l  only list records in the inbox, do not remove them                                                                                                                                                              │
│ --help           Show this message and exit.                                                                                                                                                                                     │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

delete-records

Delete records from a collection on a dump-things-service

Usage: dtc delete-records [OPTIONS] SERVICE_URL COLLECTION PIDS

Delete records from a collection on a dump-things-service                                                                                                                                                                          
This command delete the records given by PIDS from the collection COLLECTION of the dump-things service SERVICE_URL. If no pids are provided on the command line, the pid that should be deleted are read from stdin (one pid per  
line, lines are stripped).                                                                                                                                                                                                         
By default, the records will be deleted from the inbox associated with the token. If the option `-c/--curated` is given, the records are deleted from the curated area of the collection (this requires a token with curator       
rights). If the option `-i/--incoming LABEL` is given, the records are deleted from the inbox specified by `LABEL` (this requires a token with curator rights).

╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --curated              -c         delete record from the curated area of the collection. (Note: requires a token with curator rights)                                                                                            │
│ --incoming             -i  LABEL  delete from the collection's inbox with label LABEL, if LABEL is "-", return labels of all collection inboxes and exit                                                                         │
│ --ignore-errors                   ignore errors when deleting a pid and continue with remaining pids                                                                                                                             │
│ --class                -C  CLASS  delete ALL records of class CLASS from the collection's incoming area that is associated with the token. Can be combined with `-i/--incoming LABEL` or `-c/--curated` to delete all records of │
│                                   class CLASS from the incoming area `LABEL` or from the curated area. Note: if neither `-c/--curated` nor `-i/--incoming LABEL` is specified, the command cannot reliably determine which       │
│                                   records are stored in the incoming area associated with a token and which records are stored in the curated area of the collection. This can lead to warnings about records that cannot be     │
│                                   deleted. The command will print a list of all PIDs that could not be deleted.                                                                                                                  │
│ --json-error-messages             if this flag is given, output information about failed delete operations to stdout. The format is JSONL (JSON lines), each JSON record contains the detailed error message, the PID of the     │
│                                   record that could not be deleted.                                                                                                                                                              │
│ --help                            Show this message and exit.                                                                                                                                                                    │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
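
Since delete-records reads PIDs from stdin when none are given on the command line (one PID per line), a list of PIDs can be prepared in Python before it is piped to the command. The following is a minimal sketch. The helper name `pids_to_stdin` is hypothetical, and dropping empty entries is a choice of this sketch, not documented behavior of dtc.

```python
def pids_to_stdin(pids):
    """Format PIDs as stdin input: one stripped PID per line.

    Empty entries are dropped so that no blank lines are sent.
    """
    cleaned = [pid.strip() for pid in pids if pid.strip()]
    return "\n".join(cleaned) + "\n" if cleaned else ""
```

The returned string could be passed as `input=` to `subprocess.run([...], text=True)` together with a `dtc delete-records SERVICE_URL COLLECTION` command line.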

export

Export the curated area and the inboxes of a collection to the file system.

 Usage: dtc export [OPTIONS] SERVICE_URL COLLECTION DESTINATION_DIR                                                                                                                                                                 
                                                                                                                                                                                                                                    
 Export a collection to disk                                                                                                                                                                                                        
 This command exports all records that are stored in curated area and in the incoming areas of collection COLLECTION of the dump-things service SERVICE_URL.                                                                        
 Exported records are written to the directory DESTINATION_DIR. DESTINATION_DIR must not exist, `export` will create it.                                                                                                            
 A token with curator rights has to be provided.                                                                                                                                                                                    
                                                                                                                                                                                                                                    
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --format               -f  [json|yaml]  select output format for the exported records (default: json)                                                                                                                            │
│ --ignore-errors                         ignore records with missing `schema_type` instead of raising an error                                                                                                                    │
│ --keep-schema-type     -k               keep `schema_type`-attribute in records on file-system. By default the schema_type-attribute is removed because the class is encoded in the storage path of the records.                 │
│ --json-error-messages                   if this flag is given, output information about failed read or write operations to stdout. The format is JSONL (JSON lines), each JSON record contains the operation type (read, write), │
│                                         a detailed error message, and additional context dependent information, e.g., the PID of the record that could not be written to the file system.                                        │
│ --help                                  Show this message and exit.                                                                                                                                                              │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

get-records

 Usage: dtc get-records [OPTIONS] SERVICE_URL COLLECTION                                                                                                                                                                            
                                                                                                                                                                                                                                    
 Get records from a collection on a dump-things-service                                                                                                                                                                             
 This command lists records that are stored in collection COLLECTION of the dump-things service SERVICE_URL. By default, all records that are readable with the given token, or the default token, will be displayed. The output    
 format is JSONL (JSON lines), where every line contains a record or a record with paging information.  If `ttl` is chosen as format of the output records, the record content will be a string that contains a TTL-documents.      
 The command supports reading from the curated area only, reading from incoming areas, or reading a record with a given PID.                                                                                                        
 Pagination information is returned for paginated results, when requested with `-P/--pagination`. All results are paginated except "get a record with a given PID" and "get the list of incoming zone labels".                      
 For reading from curated or incoming areas, a token with curator rights has to be provided.                                                                                                                                        
                                                                                                                                                                                                                                    
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --class       -C  TEXT                       only read records of this class, ignored if "--pid" is provided                                                                                                                     │
│ --format      -f  [json|ttl]                 request records in a specific format. (NOTE: not all endpoints support the "format"-parameter)                                                                                      │
│ --pid         -p  TEXT                       the pid of the record that should be read                                                                                                                                           │
│ --incoming    -i  LABEL                      read from the collection's inbox with label LABEL, if LABEL is "-", print labels of all collection inboxes and exit                                                                 │
│ --curated     -c                             read from the curated area of the collection. (Note: requires a token with curator rights)                                                                                          │
│ --matching    -m  TEXT                       return only records that have a matching value (use % as wildcard). Ignored if "--pid" is provided. (Note: not all endpoints and backends support matching)                         │
│ --page-size   -s  INTEGER RANGE [1<=x<=100]  set the page size (default: 100). (ignored if "--pid" is provided)                                                                                                                  │
│ --first-page  -F  INTEGER                    the first page to return (default: 1). (ignored if "--pid" is provided)                                                                                                             │
│ --last-page   -l  INTEGER                    the last page to return, if not given, all pages will be returned. (ignored if "--pid" is provided)                                                                                 │
│ --stats                                      show the number of records and pages and exit. (ignored if "--pid" is provided)                                                                                                     │
│ --pagination  -P                             show pagination information (each record from an paginated endpoint is returned as [<record>, <current page number>, <total number of pages>, <page size>, <total number of         │
│                                              items>]. (ignored if "--pid" is provided)                                                                                                                                           │
│ --help                                       Show this message and exit.                                                                                                                                                         │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
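
When `-P/--pagination` is used, the help text states that each output line holds `[<record>, <current page number>, <total number of pages>, <page size>, <total number of items>]`. A consumer of that JSONL output might look like the sketch below. The names `PagedRecord` and `parse_paginated_lines` are hypothetical, and the record content in the usage example is made up for illustration.

```python
import json
from typing import Any, Iterable, Iterator, NamedTuple


class PagedRecord(NamedTuple):
    record: Any       # the record itself (for `ttl` output, a string)
    page: int         # current page number
    total_pages: int  # total number of pages
    page_size: int    # page size
    total_items: int  # total number of items


def parse_paginated_lines(lines: Iterable[str]) -> Iterator[PagedRecord]:
    """Parse JSONL lines of the form [record, page, pages, size, items]."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        record, page, total_pages, page_size, total_items = json.loads(line)
        yield PagedRecord(record, page, total_pages, page_size, total_items)
```

For example, `list(parse_paginated_lines(['[{"pid": "ex:1"}, 1, 2, 100, 150]']))` yields one `PagedRecord` whose `record` is the dictionary and whose `total_items` is 150. In practice the lines would come from the standard output of `dtc get-records ... -P`.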

import

 Usage: dtc import [OPTIONS] SOURCE_DIR                                                                                                                                                                                             
                                                                                                                                                                                                                                    
 Import a collection from disk                                                                                                                                                                                                      
 This command imports all records that are stored on disk in the directory SOURCE_DIR in the format that is created by `dtc export`. The records are stored in the dump-things service and the collection that are recorded in      
 `SOURCE_DIR/description.json`.                                                                                                                                                                                                     
 A token with curator rights has to be provided.                                                                                                                                                                                    
                                                                                                                                                                                                                                    
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --service-url          -s  SERVICE_URL  use the service SERVICE_URL instead of the service URL that is stored in `SOURCE_DIR/description.json`                                                                                   │
│ --collection           -c  COLLECTION   use the collection name COLLECTION instead of the collection name that is stored in `SOURCE_DIR/description.json`                                                                        │
│ --ignore-errors                         log errors an continue import instead of raising an exception                                                                                                                            │
│ --json-error-messages                   if this flag is given, output information about failed read or write operations to stdout. The format is JSONL (JSON lines), each JSON record contains the operation type (read, write), │
│                                         a detailed error message, and additional context dependent information, e.g., the PID of the record that could not be posted to the collection.                                          │
│ --help                                  Show this message and exit.                                                                                                                                                              │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

list-incoming

Usage: dtc list-incoming [OPTIONS] SERVICE_URL COLLECTION

List labels of incoming areas of a collection on a dump-things-service                                                                                                                                                             
This command lists the labels of the incoming areas of the collection COLLECTION on the dump-things service given by SERVICE_URL.                                                                                                  
A token with curator rights has to be provided.

╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --show-records  -s  list records in inboxes                                                                                                                                                                                      │
│ --help              Show this message and exit.                                                                                                                                                                                  │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

maintenance

 Usage: dtc maintenance [OPTIONS] SERVICE_URL COLLECTION ACTIVE                                                                                                                                                                     
                                                                                                                                                                                                                                    
 Activate or deactivate maintenance mode on collection COLLECTION on the service SERVICE_URL. The argument ACTIVE should be either `On` or `Off` (case-insensitive).                                                                
 A token with curator rights is required.                                                                                                                                                                                           
 This command expects a server version >= 5.4.0                                                                                                                                                                                     
                                                                                                                                                                                                                                    
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --help  Show this message and exit.                                                                                                                                                                                              │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
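
The help text above says that ACTIVE must be `On` or `Off` (case-insensitive). A minimal sketch of a wrapper that validates this before calling `dtc` could look as follows. The function name `maintenance_command`, the service URL, and the collection name are hypothetical; how the curator token is supplied is not covered in this section and is left out.

```python
def maintenance_command(service_url: str, collection: str, active: str) -> list[str]:
    """Build the argument list for `dtc maintenance` (illustrative helper)."""
    # ACTIVE is documented as `On` or `Off`, case-insensitive; reject anything else early.
    if active.lower() not in ("on", "off"):
        raise ValueError(f"ACTIVE must be 'on' or 'off', got {active!r}")
    return ["dtc", "maintenance", service_url, collection, active]


# Hypothetical service URL and collection name, for illustration only.
command = maintenance_command("https://example.org/api", "my_collection", "On")
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` in an environment where a token with curator rights is configured.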

post-records

Post records which are read from stdin in JSON lines format

 Usage: dtc post-records [OPTIONS] SERVICE_URL COLLECTION CLASS                                                                                                                                                                     
                                                                                                                                                                                                                                    
 Read records of class CLASS from standard input and store them in the collection COLLECTION on the service SERVICE_URL. Records should be provided in JSON-lines format. Note: all records are assumed to be of class CLASS. To    
 submit records of multiple classes, the subcommand has to be invoked multiple times, once for each class.                                                                                                                          
 If the `--curated` option is provided, the records will be stored directly in the curated area of the collection without any alterations, i.e., no annotations will be added.
 If no `--curated` option is provided, the records will be stored in the inbox of the user that is associated with the token, and each record will be annotated with the submission time and the user that performed the submission.
 A token is required and will be used to authenticate the requests. If the `--curated` option is provided, the token must have curator rights.
                                                                                                                                                                                                                                    
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --curated  store record directly in curated area instead of an inbox. (Note: requires a token with curator rights)                                                                                                               │
│ --help     Show this message and exit.                                                                                                                                                                                           │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
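
Because `post-records` treats every record on stdin as an instance of CLASS, a mixed set of records has to be split by class first. The following sketch groups records by class, serializes each group as JSON lines, and invokes `dtc post-records` once per class. The helper names, the class names `Person` and `Organization`, and the `pid` field are assumptions for illustration; real records must follow the schema of the collection.

```python
import json
import subprocess
from collections import defaultdict


def json_lines_by_class(typed_records):
    """Group (class_name, record) pairs into one JSON-lines payload per class."""
    grouped = defaultdict(list)
    for class_name, record in typed_records:
        # json.dumps emits a single-line JSON document, as JSON lines requires.
        grouped[class_name].append(json.dumps(record))
    return {name: "\n".join(lines) + "\n" for name, lines in grouped.items()}


def post_all(service_url, collection, typed_records, curated=False):
    """Invoke `dtc post-records` once per class, feeding the records via stdin."""
    for class_name, payload in json_lines_by_class(typed_records).items():
        command = ["dtc", "post-records"]
        if curated:
            command.append("--curated")  # requires a token with curator rights
        command += [service_url, collection, class_name]
        subprocess.run(command, input=payload, text=True, check=True)


# Hypothetical records; class names and fields depend on the collection's schema.
payloads = json_lines_by_class([
    ("Person", {"pid": "ex:alice"}),
    ("Person", {"pid": "ex:bob"}),
    ("Organization", {"pid": "ex:acme"}),
])
print(payloads["Person"], end="")
```

`post_all` is not called here because it needs a running service and a valid token; `json_lines_by_class` can be tested on its own.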

read-pages

Read all pages from a paginated endpoint.

 Usage: dtc read-pages [OPTIONS] URL                                                                                                                                                                                                
                                                                                                                                                                                                                                    
 Read paginated endpoint                                                                                                                                                                                                            
 This command lists all records that are available via a paginated endpoint of a dump-things-service, e.g., the endpoint given by the URL
 https://<service-location>/<collection>/records/p/                                                                                                                                                                                 
                                                                                                                                                                                                                                    
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --page-size   -s  INTEGER     set the page size (1 - 100) (default: 100)                                                                                                                                                         │
│ --first-page  -F  INTEGER     the first page to return (default: 1)                                                                                                                                                              │
│ --last-page   -l  INTEGER     the last page to return (default: None (return all pages))                                                                                                                                         │
│ --stats                       show information about the number of records and pages and exit; the result is returned as [<total number of pages>, <page size>, <total number of items>]                                         │
│ --format      -f  [json|ttl]  request output records in a specific format. (NOTE: not all endpoints support the "format"-parameter)                                                                                              │
│ --matching    -m  TEXT        return only records that have a matching value (use % as wildcard). (NOTE: not all endpoints and storage-backends support matching.)                                                               │
│ --pagination  -P              show pagination information (each record from a paginated endpoint is returned as [<record>, <current page number>, <total number of pages>, <page size>, <total number of items>])                │
│ --help                        Show this message and exit.                                                                                                                                                                        │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
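
The `--stats` and `--pagination` options return small arrays whose element order is given in the help text above. The sketch below unpacks both shapes into named fields. It assumes that each output line is a JSON array, which the help text suggests but does not state explicitly; the function names and the sample lines are hypothetical.

```python
import json


def parse_stats(line: str) -> dict:
    """Unpack `--stats` output: [total pages, page size, total items]."""
    total_pages, page_size, total_items = json.loads(line)
    return {
        "total_pages": total_pages,
        "page_size": page_size,
        "total_items": total_items,
    }


def parse_paginated_line(line: str) -> dict:
    """Unpack one `--pagination` line into the record and its page information."""
    record, page, total_pages, page_size, total_items = json.loads(line)
    return {
        "record": record,
        "page": page,
        "total_pages": total_pages,
        "page_size": page_size,
        "total_items": total_items,
    }


# Hypothetical output lines, assuming JSON arrays in the documented order.
stats = parse_stats("[3, 100, 250]")
info = parse_paginated_line('[{"pid": "ex:alice"}, 2, 3, 100, 250]')
print(stats["total_pages"], info["page"], info["record"])
```

With the values from `--stats`, a caller can choose `--first-page` and `--last-page` to fetch a collection in several smaller runs.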

version

Show the version of dtc.

 Usage: dtc version [OPTIONS]
                                                                                                                                                                                                                                    
 Show the version of `dtc` and exit                                                                                                                                                                                                 
                                                                                                                                                                                                                                    
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --help  Show this message and exit.                                                                                                                                                                                              │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Acknowledgements

This work was funded, in part, by:

  • Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under grant TRR 379 (546006540, Q02 project)

  • MKW-NRW: Ministerium für Kultur und Wissenschaft des Landes Nordrhein-Westfalen under the Kooperationsplattformen 2022 program, grant number: KP22-106A