Dump Things Python Client
A simple Python client library and some CLI tools for dump-things-server.
The tools are in an early state and not automatically tested. Do not use them on collections with valuable data unless you have a backup, or are very brave, or quite reckless.
Tech Stack
- Python >= 3.11
- uv for dependency management
Installation
The tools are published on PyPI as the package dump-things-pyclient. Install it, for example, via pip (preferably in a virtual environment):
pip install dump-things-pyclient
The commands
This project provides the CLI command dtc. dtc has a number of subcommands:
- auto-curate: automatically move records from inboxes to the curated area of a collection
- clean-incoming: delete all records from an inbox of a collection
- delete-records: delete records from an inbox or the curated area of a collection
- export: export a collection to the file system
- get-records: get records from a dump-things collection
- import: import a collection from a file system dump (created by "export")
- list-incoming: list records in inboxes of a collection
- maintenance: activate or deactivate maintenance mode on a collection
- post-records: post records to an inbox or the curated area of a collection
- read-pages: read records from a collection, the curated area of a collection, or specific inboxes
- version: show the version of dtc
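Most commands require a token, and all commands accept one. Pass the token to dtc with the option --token, which must precede the subcommand. On multi-user systems, set the environment variable DTC_TOKEN instead of passing the token on the command line.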
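When dtc is driven from a script, the token can be kept off the command line by passing it through the environment. The following is a minimal sketch, not part of this project: the helper names `dtc_command`, `dtc_environment`, and `run_dtc` are hypothetical, and the service URL, collection, and token are placeholders. It relies only on the global options and the DTC_TOKEN variable named in the dtc help message.

```python
import os
import subprocess


def dtc_command(subcommand, *args, debug=False):
    """Build the argument list for a dtc call; global options precede the subcommand."""
    argv = ["dtc"]
    if debug:
        argv.append("--debug")
    argv.append(subcommand)
    argv.extend(args)
    return argv


def dtc_environment(token, base_env=None):
    """Return a copy of the environment with DTC_TOKEN set to the given token."""
    env = dict(os.environ if base_env is None else base_env)
    env["DTC_TOKEN"] = token
    return env


def run_dtc(argv, token):
    """Run dtc with the token in the environment (hypothetical convenience wrapper)."""
    return subprocess.run(
        argv, env=dtc_environment(token), capture_output=True, text=True, check=True
    )


if __name__ == "__main__":
    # Placeholder values; replace them with a real service URL and collection.
    print(dtc_command("list-incoming", "https://example.org/api", "my_collection"))
```

Keeping the token in the environment means it does not show up in the process list, which is the concern behind the note on multi-user systems.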
This is the help message of dtc, which lists all available subcommands:
Usage: dtc [OPTIONS] COMMAND [ARGS]...
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --token TEXT provide a token on the command line, NOTE: on multiuser systems you should use the environment variable DTC_TOKEN instead │
│ --debug show debug output │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ auto-curate Move records from inbox to curate area of a collection │
│ clean-incoming Remove records from an inbox of a dump-things collection │
│ delete-records Delete records from a dump-things collection │
│ export Export a collection to the file system │
│ get-records Get records from a dump-things collection │
│ import Import a collection from a file system │
│ list-incoming List inboxes of a dump-things collection │
│ maintenance Activate or deactivate maintenance mode on a collection │
│ post-records Post records to an inbox or the curated area of a dump-things collection │
│ read-pages Read records from paginated dump-things endpoints │
│ version Show the version of `dtc` │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
The following sections show the help message of each dtc subcommand.
auto-curate
Move records from inboxes to the curated area of a collection.
Usage: dtc auto-curate [OPTIONS] SERVICE_URL COLLECTION
Automatically move records from the incoming areas of the collection COLLECTION in the service SERVICE_URL to the curated area of the same collection, or to the curated area of another collection, possibly on another service.
A token is required and will be used to authenticate the requests. The token must have curator-rights.
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --destination-service-url DEST_SERVICE_URL select a different dump-thing-service, i.e. not SERVICE_URL, as destination for auto-curated records (the default is SERVICE_URL) │
│ --destination-collection DEST_COLLECTION select a different collection, i.e. not COLLECTION, as destination for auto-curated records │
│ --destination-token DEST_TOKEN if provided, this token will be used the authenticate against DEST_SERVICE_URL, which defaults to SERVICE_URL (the default is the token provided via --token) │
│ --pid -p PID if provided, process only records that match the given PIDs. NOTE: matching does not involve CURIE-resolution │
│ --exclude -e TEXT exclude an inbox on the source collection (repeatable) │
│ --include -i TEXT process only the given inbox, all other inboxes are ignored (repeatable, -e/--exclude is applied after inclusion) │
│ --list-labels -l list the inbox labels of the given source collection, do not perform any curation │
│ --list-records -r list records in the inboxes of the given source collection, do not perform any curation │
│ --dry-run -d if provided, do not alter any data, instead print what would be done │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
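Because auto-curate changes data, it can be useful to assemble the call with `--dry-run` first and inspect what would happen. The sketch below builds such a command line from the options listed above; the function name `auto_curate_argv` and the example values are hypothetical, and the token is assumed to be supplied separately (for example via DTC_TOKEN).

```python
def auto_curate_argv(service_url, collection, include=(), exclude=(),
                     destination_collection=None, dry_run=True):
    """Build a dtc auto-curate argument list; dry-run is enabled by default."""
    argv = ["dtc", "auto-curate"]
    # --include and --exclude are repeatable, one inbox label per occurrence.
    for label in include:
        argv += ["--include", label]
    for label in exclude:
        argv += ["--exclude", label]
    if destination_collection is not None:
        argv += ["--destination-collection", destination_collection]
    if dry_run:
        argv.append("--dry-run")
    argv += [service_url, collection]
    return argv


if __name__ == "__main__":
    # Placeholder service URL, collection, and inbox label.
    print(auto_curate_argv("https://example.org/api", "my_collection", include=["alice"]))
```

Defaulting `dry_run` to `True` in the helper is a design choice of this sketch, so that data is only altered when the caller asks for it explicitly.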
clean-incoming
Usage: dtc clean-incoming [OPTIONS] SERVICE_URL COLLECTION INBOX_LABEL
Remove all records from an incoming areas of a collection on a dump-things-service
This command removes all records from the inbox with label INBOX_LABEL in the collection COLLECTION on the dump-things service given by SERVICE_URL.
A token with curator rights has to be provided.
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --list-only -l only list records in the inbox, do not remove them │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
delete-records
Delete records from a collection on a dump-things-server
Usage: dtc delete-records [OPTIONS] SERVICE_URL COLLECTION PIDS
Delete records from a collection on a dump-things-service
This command delete the records given by PIDS from the collection COLLECTION of the dump-things service SERVICE_URL. If no pids are provided on the command line, the pid that should be deleted are read from stdin (one pid per
line, lines are stripped).
By default, the records will be deleted from the inbox associated with the token. If the option `-c/--curated` is given, the records are deleted from the curated area of the collection (this requires a token with curator
rights). If the option `-i/--incoming LABEL` is given, the records are deleted from the inbox specified by `LABEL` (this requires a token with curator rights).
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --curated -c delete record from the curated area of the collection. (Note: requires a token with curator rights) │
│ --incoming -i LABEL delete from the collection's inbox with label LABEL, if LABEL is "-", return labels of all collection inboxes and exit │
│ --ignore-errors ignore errors when deleting a pid and continue with remaining pids │
│ --class -C CLASS delete ALL records of class CLASS from the collection's incoming area that is associated with the token. Can be combined with `-i/--incoming LABEL` or `-c/--curated` to delete all records of │
│ class CLASS from the incoming area `LABEL` or from the curated area. Note: if neither `-c/--curated` nor `-i/--incoming LABEL` is specified, the command cannot reliably determine which │
│ records are stored in the incoming area associated with a token and which records are stored in the curated area of the collection. This can lead to warnings about records that cannot be │
│ deleted. The command will print a list of all PIDs that could not be deleted. │
│ --json-error-messages if this flag is given, output information about failed delete operations to stdout. The format is JSONL (JSON lines), each JSON record contains the detailed error message, the PID of the │
│ record that could not be deleted. │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
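The help text above says that PIDs can be fed via stdin, one per line, and that `--json-error-messages` reports failures as JSON lines. A minimal sketch of both sides follows. The helper names are hypothetical, and it assumes that every non-empty line of the captured error output is a JSON document; the keys inside each error record are not documented here, so the sketch does not rely on them.

```python
import json


def pids_to_stdin(pids):
    """Format PIDs as stdin input for dtc delete-records: one stripped PID per line."""
    cleaned = (pid.strip() for pid in pids)
    return "".join(f"{pid}\n" for pid in cleaned if pid)


def parse_json_lines(text):
    """Parse JSONL text into a list of objects, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    # Hypothetical PIDs, for illustration only.
    print(pids_to_stdin(["ex:record-1", " ex:record-2 "]), end="")
```

The resulting string could be passed as `input=` to `subprocess.run` together with `text=True`, and the captured stdout handed to `parse_json_lines`.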
export
Export the curated area and the inboxes of a collection to the file system.
Usage: dtc export [OPTIONS] SERVICE_URL COLLECTION DESTINATION_DIR
Export a collection to disk
This command exports all records that are stored in curated area and in the incoming areas of collection COLLECTION of the dump-things service SERVICE_URL.
Exported records are written to the directory DESTINATION_DIR. DESTINATION_DIR must not exist, `export` will create it.
A token with curator rights has to be provided.
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --format -f [json|yaml] select output format for the exported records (default: json) │
│ --ignore-errors ignore records with missing `schema_type` instead of raising an error │
│ --keep-schema-type -k keep `schema_type`-attribute in records on file-system. By default the schema_type-attribute is removed because the class is encoded in the storage path of the records. │
│ --json-error-messages if this flag is given, output information about failed read or write operations to stdout. The format is JSONL (JSON lines), each JSON record contains the operation type (read, write), │
│ a detailed error message, and additional context dependent information, e.g., the PID of the record that could not be written to the file system. │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
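Since DESTINATION_DIR must not exist before the export starts, a wrapper script may want to check this up front. The following sketch, with the hypothetical helper `export_argv`, validates the destination and the format before building the command line; it does not run dtc itself.

```python
from pathlib import Path


def export_argv(service_url, collection, destination_dir, fmt="json"):
    """Build a dtc export argument list after checking the documented preconditions."""
    destination = Path(destination_dir)
    # The export command creates the directory itself, so it must not exist yet.
    if destination.exists():
        raise FileExistsError(f"destination already exists: {destination}")
    # The help message lists json and yaml as the available formats.
    if fmt not in ("json", "yaml"):
        raise ValueError(f"unsupported format: {fmt}")
    return ["dtc", "export", "--format", fmt, service_url, collection, str(destination)]


if __name__ == "__main__":
    # Placeholder values for illustration.
    print(export_argv("https://example.org/api", "my_collection", "my_collection_dump"))
```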
get-records
Usage: dtc get-records [OPTIONS] SERVICE_URL COLLECTION
Get records from a collection on a dump-things-service
This command lists records that are stored in collection COLLECTION of the dump-things service SERVICE_URL. By default, all records that are readable with the given token, or the default token, will be displayed. The output
format is JSONL (JSON lines), where every line contains a record or a record with paging information. If `ttl` is chosen as format of the output records, the record content will be a string that contains a TTL-documents.
The command supports reading from the curated area only, reading from incoming areas, or reading a record with a given PID.
Pagination information is returned for paginated results, when requested with `-P/--pagination`. All results are paginated except "get a record with a given PID" and "get the list of incoming zone labels".
For reading from curated or incoming areas, a token with curator rights has to be provided.
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --class -C TEXT only read records of this class, ignored if "--pid" is provided │
│ --format -f [json|ttl] request records in a specific format. (NOTE: not all endpoints support the "format"-parameter) │
│ --pid -p TEXT the pid of the record that should be read │
│ --incoming -i LABEL read from the collection's inbox with label LABEL, if LABEL is "-", print labels of all collection inboxes and exit │
│ --curated -c read from the curated area of the collection. (Note: requires a token with curator rights) │
│ --matching -m TEXT return only records that have a matching value (use % as wildcard). Ignored if "--pid" is provided. (Note: not all endpoints and backends support matching) │
│ --page-size -s INTEGER RANGE [1<=x<=100] set the page size (default: 100). (ignored if "--pid" is provided) │
│ --first-page -F INTEGER the first page to return (default: 1). (ignored if "--pid" is provided) │
│ --last-page -l INTEGER the last page to return, if not given, all pages will be returned. (ignored if "--pid" is provided) │
│ --stats show the number of records and pages and exit. (ignored if "--pid" is provided) │
│ --pagination -P show pagination information (each record from an paginated endpoint is returned as [<record>, <current page number>, <total number of pages>, <page size>, <total number of │
│ items>]. (ignored if "--pid" is provided) │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
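With `-P/--pagination`, each output line carries the record together with its paging information in the order given above. A minimal sketch for unpacking such a line follows; the names `PagedRecord` and `parse_paginated_line` are hypothetical, the sample record is made up, and each line is assumed to be a JSON list with exactly five entries.

```python
import json
from typing import Any, NamedTuple


class PagedRecord(NamedTuple):
    record: Any
    page: int
    total_pages: int
    page_size: int
    total_items: int


def parse_paginated_line(line):
    """Unpack [<record>, <page>, <total pages>, <page size>, <total items>]."""
    record, page, total_pages, page_size, total_items = json.loads(line)
    return PagedRecord(record, page, total_pages, page_size, total_items)


if __name__ == "__main__":
    # A made-up output line, for illustration only.
    sample = '[{"pid": "ex:record-1"}, 1, 3, 100, 250]'
    paged = parse_paginated_line(sample)
    print(paged.record, paged.page, paged.total_pages)
```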
import
Usage: dtc import [OPTIONS] SOURCE_DIR
Import a collection from disk
This command imports all records that are stored on disk in the directory SOURCE_DIR in the format that is created by `dtc export`. The records are stored in the dump-things service and the collection that are recorded in
`SOURCE_DIR/description.json`.
A token with curator rights has to be provided.
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --service-url -s SERVICE_URL use the service SERVICE_URL instead of the service URL that is stored in `SOURCE_DIR/description.json` │
│ --collection -c COLLECTION use the collection name COLLECTION instead of the collection name that is stored in `SOURCE_DIR/description.json` │
│ --ignore-errors log errors an continue import instead of raising an exception │
│ --json-error-messages if this flag is given, output information about failed read or write operations to stdout. The format is JSONL (JSON lines), each JSON record contains the operation type (read, write), │
│ a detailed error message, and additional context dependent information, e.g., the PID of the record that could not be posted to the collection. │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
list-incoming
Usage: dtc list-incoming [OPTIONS] SERVICE_URL COLLECTION
List labels of incoming areas of a collection on a dump-things-service
This command lists the labels of the incoming areas of the collection COLLECTION on the dump-things service given by SERVICE_URL.
A token with curator rights has to be provided.
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --show-records -s list records in inboxes │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
maintenance
Usage: dtc maintenance [OPTIONS] SERVICE_URL COLLECTION ACTIVE
Activate or deactivate maintenance mode on collection COLLECTION on the service SERVICE_URL. The argument ACTIVE should be either `On` or `Off` (case-insensitive).
A token with curator rights is required.
This command expects a server version >= 5.4.0
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
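A script that toggles maintenance mode might validate its inputs before calling dtc. The sketch below mirrors the two constraints stated above: ACTIVE is `On` or `Off` (case-insensitive), and the server version must be at least 5.4.0. Both helper names are hypothetical, and the version check assumes plain dotted numeric version strings such as `5.4.0`; how the server version is obtained is left out.

```python
def normalize_active(value):
    """Map 'On'/'Off' (any case) to True/False; reject anything else."""
    lowered = value.strip().lower()
    if lowered not in ("on", "off"):
        raise ValueError(f"ACTIVE must be 'On' or 'Off', got {value!r}")
    return lowered == "on"


def server_supports_maintenance(version):
    """Return True if a dotted numeric version string is >= 5.4.0."""
    parts = tuple(int(part) for part in version.split(".")[:3])
    # Pad short versions such as "6.0" so that tuple comparison is well defined.
    parts += (0,) * (3 - len(parts))
    return parts >= (5, 4, 0)


if __name__ == "__main__":
    print(normalize_active("ON"), server_supports_maintenance("5.4.0"))
```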
post-records
Post records which are read from stdin in JSON lines format
Usage: dtc post-records [OPTIONS] SERVICE_URL COLLECTION CLASS
Read records of class CLASS from standard input and store them in the collection COLLECTION on the service SERVICE_URL. Records should be provided in JSON-lines format. Note: all records are assumed to be of class CLASS. To
submit records of multiple classes, the subcommand has to be invoked multiple times, once for each class.
If the `--curated`-option is provided, the records will be stored directly in the curated area of the collection without any alterations, i.e, no annotations will be added.
If no `--curated`-option is provided, the record will be stored in the inbox of the user that is associated with the token, and the record will be annotated with the submission time and the user that performed the submission.
A token is required and will be used to authenticate the requests. If the `--curated`-option is provided, the token must have curator-rights.
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --curated store record directly in curated area instead of an inbox. (Note: requires a token with curator rights) │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
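post-records expects one JSON document per line on standard input. A minimal sketch for producing that input from Python dictionaries is shown below; the helper name `to_json_lines`, the record fields, and the class name `Person` in the usage note are made up for illustration and depend on the schema of the actual collection.

```python
import json


def to_json_lines(records):
    """Serialize records as JSON lines: one compact JSON document per line."""
    # json.dumps without indent never emits raw newlines, so each record stays on one line.
    return "".join(json.dumps(record, ensure_ascii=False) + "\n" for record in records)


if __name__ == "__main__":
    # Hypothetical records; real field names depend on the collection's schema.
    payload = to_json_lines([{"pid": "ex:record-1"}, {"pid": "ex:record-2"}])
    print(payload, end="")
```

The payload could then be piped into the command, for example with `subprocess.run(["dtc", "post-records", service_url, collection, "Person"], input=payload, text=True)`, invoking the subcommand once per class as described above.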
read-pages
Read all pages from a paginated endpoint.
Usage: dtc read-pages [OPTIONS] URL
Read paginated endpoint
This command lists all records that are available via a paginated endpoints from a dump-things-service, e.g., given by URL
https://<service-location>/<collection>/records/p/
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --page-size -s INTEGER set the page size (1 - 100) (default: 100) │
│ --first-page -F INTEGER the first page to return (default: 1) │
│ --last-page -l INTEGER the last page to return (default: None (return all pages) │
│ --stats show information about the number of records and pages and exit, the format is is returned as [<total number of pages>, <page size>, <total number of items>] │
│ --format -f [json|ttl] request output records in a specific format. (NOTE: not all endpoints support the "format"-parameter) │
│ --matching -m TEXT return only records that have a matching value (use % as wildcard). (NOTE: not all endpoints and storage-backends support matching.) │
│ --pagination -P show pagination information (each record from an paginated endpoint is returned as [<record>, <current page number>, <total number of pages>, <page size>, <total number of items>] │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
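The URL pattern and the `--stats` layout described above can be handled with two small helpers. This is a sketch under assumptions: the helper names are hypothetical, the service location and collection are placeholders, and the `--stats` output is assumed to be a single JSON list in the documented order.

```python
import json


def records_page_url(service_location, collection):
    """Build a paginated records URL following the pattern shown in the help text."""
    return f"https://{service_location}/{collection}/records/p/"


def parse_stats(line):
    """Unpack [<total pages>, <page size>, <total items>] into a dictionary."""
    total_pages, page_size, total_items = json.loads(line)
    return {
        "total_pages": total_pages,
        "page_size": page_size,
        "total_items": total_items,
    }


if __name__ == "__main__":
    # Placeholder host and collection name.
    print(records_page_url("example.org", "my_collection"))
    print(parse_stats("[3, 100, 250]"))
```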
version
Show the version of dtc.
Usage: dtc version [OPTIONS]
Show the version of `dtc` and exit
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Acknowledgements
This work was funded, in part, by:
- Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under grant TRR 379 (546006540, Q02 project)
- MKW-NRW: Ministerium für Kultur und Wissenschaft des Landes Nordrhein-Westfalen under the Kooperationsplattformen 2022 program, grant number: KP22-106A