Use dump_things_pyclient to implement triple-tools #3
12 changed files with 610 additions and 280 deletions
README.md (145 changes)
@@ -19,27 +19,73 @@ Perform the following operations, preferably in a Python-virtual environment.
## The commands
This project provides the following CLI commands:
- auto-curate: automatically move records from inboxes to the curated area of a collection
- clean-incoming: delete all records from an inbox of a collection
- list-incoming: list records in inboxes of a collection
- post-records: read records from stdin and post them to the inbox or curated area of a collection
- read-pages: read records from a collection, its curated area, or specific inboxes
- read-paginated-url: read records from any paginated service endpoint
- build-local-triple-store: read all records from a collection and emit N-Triples
The following sections show the help messages for these commands.
#### read-pages

Read all pages from a paginated endpoint.
```
usage: read-pages [-h] [-c CLASS_NAME] [-f FORMAT] [-p PID] [-i LABEL] [-C] [-m MATCHING] [-s PAGE_SIZE] [-F FIRST_PAGE] [-l LAST_PAGE] [--stats] [-P] service_url collection

Get records from a collection on a dump-things-service

This command lists records that are stored in a dump-things-service. By
default all records that are readable with the given token, or the default
token, will be displayed. The output format is JSONL (JSON lines), where
every line contains a record or a record with paging information. If `ttl`
is chosen as the format of the output records, the record content will be a
string that contains a TTL-document.

The command supports reading from the curated area only, reading from
incoming areas, or reading records with a given PID.

Pagination information is returned for paginated results, when requested with
`-P/--pagination`. All results are paginated except "get a record with a given PID"
and "get the list of incoming zone labels".

If the environment variable "DUMPTHINGS_TOKEN" is set, its content will be used
as token to authenticate against the dump-things-service.

positional arguments:
  service_url
  collection

options:
  -h, --help            show this help message and exit
  -c, --class CLASS_NAME
                        only read records of this class, ignored if "--pid" is provided
  -f, --format FORMAT   format of the output records ("json" or "ttl")
  -p, --pid PID         the pid of the record that should be read
  -i, --incoming LABEL  read from the incoming area with the given label in the collection; if LABEL is "-", return the labels
  -C, --curated         read from the curated area of the collection
  -m, --matching MATCHING
                        return only records that have a matching value (use % as wildcard). Ignored if "--pid" is provided. (NOTE: not all endpoints and backends support matching.)
  -s, --page-size PAGE_SIZE
                        set the page size (1 - 100) (default: 100), ignored if "--pid" is provided
  -F, --first-page FIRST_PAGE
                        the first page to return (default: 1), ignored if "--pid" is provided
  -l, --last-page LAST_PAGE
                        the last page to return (default: None (return all pages)), ignored if "--pid" is provided
  --stats               show the number of records and pages and exit, ignored if "--pid" is provided
  -P, --pagination      show pagination information (each record from a paginated endpoint is returned as [<record>, <current page number>, <total number of pages>, <page size>, <total number of items>])
```
For a given `<service_url>` and `<collection>` the tool will read all pages
returned by `<service_url>/<collection>/records/p/`, or from the respective inbox or the curated area.

The tool reads a token from the environment variable `DUMPTHINGS_TOKEN` if set.
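Under the hood this boils down to requesting `<service_url>/<collection>/records/p/` page by page with `page` and `size` query parameters. A minimal sketch of the URL construction (illustrative only; the parameter names mirror the removed helper in `triple_tools/communicate.py`, and the example URL is made up):

```python
from urllib.parse import quote_plus


def page_url(service_url: str, collection: str, page: int, size: int = 100) -> str:
    # One page of the paginated records endpoint:
    #   <service_url>/<collection>/records/p/?page=<n>&size=<m>
    params = {'page': str(page), 'size': str(size)}
    query = '&'.join(f'{k}={quote_plus(v)}' for k, v in params.items())
    return f'{service_url}/{collection}/records/p/?{query}'
```

A client then increments `page` until the number of pages reported by the service is reached.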
@@ -57,8 +103,8 @@ positional arguments:
  class

options:
  -h, --help  show this help message and exit
  --curated   bypass inbox, requires curator token
```

For a given `<base_url>`, `<collection>`, and `<class>` the tool will
@@ -73,10 +119,15 @@ The tool reads a token from the environment variable `DUMPTHINGS_TOKEN`.
Move records from inboxes into the curated part of a collection.

```
usage: auto-curate [-h] [--destination-service-url DEST_SERVICE_URL] [--destination-collection DEST_COLLECTION] [--destination-token DEST_TOKEN] [-e EXCLUDE] [-l] [-r] [-o] [-p PID] SOURCE_SERVICE_URL SOURCE_COLLECTION

Automatically move records from the incoming areas of a
collection to the curated area of the same collection, or to
the curated area of another collection.

The environment variable "DUMPTHINGS_TOKEN" must contain a token
which is used to authenticate the requests. The token must have
curator-rights.

positional arguments:
  SOURCE_SERVICE_URL
@@ -84,21 +135,21 @@ positional arguments:
options:
  -h, --help            show this help message and exit
  --destination-service-url DEST_SERVICE_URL
                        select a different dump-things-service, i.e. not SOURCE_SERVICE_URL, as destination for auto-curated records
  --destination-collection DEST_COLLECTION
                        select a different collection, i.e. not the SOURCE_COLLECTION of SOURCE_SERVICE_URL, as destination for auto-curated records
  --destination-token DEST_TOKEN
                        if provided, this token will be used for the destination service, otherwise $DUMPTHINGS_TOKEN will be used
  -e, --exclude EXCLUDE
                        exclude an inbox on the source collection (repeatable)
  -l, --list-labels     list the inbox labels of the given source collection, do not perform any curation
  -r, --list-records    list records in the inboxes of the given source collection, do not perform any curation
  -o, --list-only       [DEPRECATED: use "--list-records"] list records in the inboxes of the given source collection, do not perform any curation
  -p, --pid PID         if provided, process only records that match the given PIDs
```
`auto-curate` requires that the environment variable `DUMPTHINGS_TOKEN` is set, and contains a valid curator-token.
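Internally, auto-curate derives the destination class for each record from the trailing name-part of its `schema_type` attribute. Stripped to its core, the extraction works like this (a sketch of the regular expression used in `auto_curate.py`; the sample values are made up):

```python
import re


def class_name_from_schema_type(schema_type: str) -> str:
    # Take the trailing run of word characters, e.g. the `Person`
    # in `https://example.org/schema/Person`.
    return re.search('([_A-Za-z0-9]*$)', schema_type).group(0)
```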
#### build-local-triple-store

@@ -149,7 +200,7 @@ options:
List the labels of all inboxes of a given collection

```
usage: list-incoming [-h] [-s] base_url collection
positional arguments:
  base_url

@@ -157,10 +208,10 @@ positional arguments:
options:
  -h, --help          show this help message and exit
  -s, --show-records  show the records in the inboxes as well
```

`list-incoming` requires that the environment variable `CURATOR_TOKEN` is set, and contains a valid curator-token.
#### json2ttl

@@ -171,8 +222,14 @@ contain TTL-documents with one string per line.

```
usage: json2ttl [-h] schema

Read JSON records from stdin and convert them to TTL

This command reads one record per line, either in JSON format or as a JSON-string
containing a TTL-document, from stdin, converts it to TTL or JSON, and prints it
to stdout.

positional arguments:
  schema      URL of the schema that should be used

options:
  -h, --help  show this help message and exit
@@ -187,6 +244,44 @@ records in a collection to TTL:

  ...
```
#### read-paginated-url

General tool to read from any paginated endpoint of a dump-things-service.

```
usage: read-paginated-url [-h] [-s PAGE_SIZE] [-F FIRST_PAGE] [-l LAST_PAGE] [--stats] [-f FORMAT] [-m MATCHING] [-p] url

Read paginated endpoint

This command lists all records that are available via paginated endpoints from
a dump-things-service, e.g., from:

  https://<service-location>/<collection>/records/p/

If the environment variable "DUMPTHINGS_TOKEN" is set, its content will be used
as token to authenticate against the dump-things-service.

positional arguments:
  url                   url of the paginated endpoint of the dump-things-service

options:
  -h, --help            show this help message and exit
  -s, --page-size PAGE_SIZE
                        set the page size (1 - 100) (default: 100)
  -F, --first-page FIRST_PAGE
                        the first page to return (default: 1)
  -l, --last-page LAST_PAGE
                        the last page to return (default: None (return all pages))
  --stats               show information about the number of records and pages and exit; the format is [<total number of pages>, <page size>, <total number of items>]
  -f, --format FORMAT   format of the output records ("json" or "ttl"). (NOTE: not all endpoints support the format parameter.)
  -m, --matching MATCHING
                        return only records that have a matching value (use % as wildcard). (NOTE: not all endpoints and backends support matching.)
  -p, --pagination      show pagination information (each record from a paginated endpoint is returned as [<record>, <current page number>, <total number of pages>, <page size>, <total number of items>])
```
`read-paginated-url` reads a token from the environment variable `DUMPTHINGS_TOKEN` if it is set.
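With `-p/--pagination` every output line is a five-element JSON list, which downstream scripts can unpack directly (a sketch; the record content is made up):

```python
import json

# One line of `read-paginated-url -p` output (sample values):
line = '[{"pid": "abc:123"}, 1, 3, 100, 250]'

record, page, total_pages, page_size, total_items = json.loads(line)
```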
## SPARQL search over a collection with qlever

To provide SPARQL search for a collection, the following steps are necessary:

@@ -194,7 +289,7 @@
1. Create N-Triple representation of the records of the store
2. Build a qlever index
3. Start the qlever server
4. Use qlever query to send SPARQL queries to the server
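Once the server is running, any SPARQL-over-HTTP client can query it as well, since qlever accepts standard form-encoded SPARQL requests. A minimal sketch (the endpoint URL and port are hypothetical):

```python
from urllib.parse import urlencode
from urllib.request import Request


def sparql_request(endpoint: str, query: str) -> Request:
    # Standard SPARQL protocol: POST the query, form-encoded, to the endpoint.
    return Request(
        endpoint,
        data=urlencode({'query': query}).encode(),
        headers={'Content-Type': 'application/x-www-form-urlencoded'},
    )


req = sparql_request(
    'http://localhost:7019/sparql',  # hypothetical local qlever endpoint
    'SELECT ?s WHERE { ?s ?p ?o } LIMIT 5',
)
# `urllib.request.urlopen(req)` would return the query result.
```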
----

pyproject.toml

@@ -24,6 +24,7 @@ classifiers = [
    "Programming Language :: Python :: Implementation :: PyPy",
]
dependencies = [
    "dump-things-pyclient",
    "dump-things-service",
    "progress",
    "qlever",

@@ -44,6 +45,7 @@ list-incoming = "triple_tools.list_incoming:main"
post-records = "triple_tools.post_records:main"
read-pages = "triple_tools.read_pages:main"
json2ttl = "triple_tools.json2ttl:main"
read-paginated-url = "triple_tools.read_paginated_url:main"

[tool.hatch.build.targets.wheel]
exclude = [

@@ -1 +1 @@

__version__ = '0.2.3'
triple_tools/auto_curate.py

@@ -1,33 +1,47 @@
from __future__ import annotations

import argparse
import json
import logging
import os
import re
import sys

from dump_things_pyclient.communicate import (
    HTTPError,
    curated_write_record,
    incoming_delete_record,
    incoming_read_labels,
    incoming_read_records,
)

logger = logging.getLogger('auto_curate')

token_name = 'DUMPTHINGS_TOKEN'

stl_info = False

description = f"""
Automatically move records from the incoming areas of a
collection to the curated area of the same collection, or to
the curated area of another collection.

The environment variable "{token_name}" must contain a token
which is used to authenticate the requests. The token must have
curator-rights.
"""


def _main():
    argument_parser = argparse.ArgumentParser(
        description=description,
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    argument_parser.add_argument('service_url', metavar='SOURCE_SERVICE_URL')
    argument_parser.add_argument('collection', metavar='SOURCE_COLLECTION')
    argument_parser.add_argument(
        '--destination-service-url',
        default=None,
        metavar='DEST_SERVICE_URL',
        help='select a different dump-things-service, i.e. not SOURCE_SERVICE_URL, as destination for auto-curated records',

@@ -42,71 +56,144 @@ def main():
        '--destination-token',
        default=None,
        metavar='DEST_TOKEN',
        help=f'if provided, this token will be used for the destination service, otherwise ${token_name} will be used',
    )
    argument_parser.add_argument(
        '-e', '--exclude',
        action='append',
        default=[],
        help='exclude an inbox on the source collection (repeatable)',
    )
    argument_parser.add_argument(
        '-l', '--list-labels',
        action='store_true',
        help='list the inbox labels of the given source collection, do not perform any curation',
    )
    argument_parser.add_argument(
        '-r', '--list-records',
        action='store_true',
        help='list records in the inboxes of the given source collection, do not perform any curation',
    )
    argument_parser.add_argument(
        '-o', '--list-only',
        action='store_true',
        help='[DEPRECATED: use "--list-records"] list records in the inboxes of the given source collection, do not perform any curation',
    )
    argument_parser.add_argument(
        '-p', '--pid',
        action='append',
        help='if provided, process only records that match the given PIDs',
    )
    arguments = argument_parser.parse_args()

    curator_token = os.environ.get(token_name)
    if curator_token is None:
        print(f'ERROR: environment variable "{token_name}" not set', file=sys.stderr, flush=True)
        return 1

    destination_url = arguments.destination_service_url or arguments.service_url
    destination_collection = arguments.destination_collection or arguments.collection
    destination_token = arguments.destination_token or curator_token
    output = None

    # `--list-only` is a deprecated alias for `--list-records`
    if arguments.list_only:
        arguments.list_records = True

    # If --list-labels and --list-records are provided, keep only the latter,
    # because it includes listing of labels
    if arguments.list_records:
        if arguments.list_labels:
            print('WARNING: `-l/--list-labels` and `-r/--list-records` defined, ignoring `-l/--list-labels`', file=sys.stderr, flush=True)
            arguments.list_labels = False
        output = {}
    if arguments.list_labels:
        output = []

    for label in incoming_read_labels(
            service_url=arguments.service_url,
            collection=arguments.collection,
            token=curator_token):

        if label in arguments.exclude:
            logger.debug('ignoring excluded incoming label: %s', label)
            continue

        if arguments.list_labels:
            output.append(label)
            continue

        if arguments.list_records:
            output[label] = []

        for record, _, _, _, _ in incoming_read_records(
                service_url=arguments.service_url,
                collection=arguments.collection,
                label=label,
                token=curator_token):
            if arguments.pid:
                if record['pid'] not in arguments.pid:
                    logger.debug(
                        'ignoring record with non-matching pid: %s',
                        record['pid'])
                    continue

            if arguments.list_records or arguments.list_only:
                output[label].append(record)
                continue
            # Get the class name from the `schema_type` attribute. This requires
            # that the schema type is either stored in the record or that the
            # store has a "Schema Type Layer", i.e., the store type is
            # `record_dir+stl`, or `sqlite+stl`.
            try:
                class_name = re.search('([_A-Za-z0-9]*$)', record['schema_type']).group(0)
            except KeyError:
                global stl_info
                if not stl_info:
                    print(
                        f"""Could not find `schema_type` attribute in record with
pid {record['pid']}. Please ensure that `schema_type` is stored in
the records or that the associated incoming area store has a backend
with a "Schema Type Layer", i.e., "record_dir+stl" or
"sqlite+stl".""",
                        file=sys.stderr,
                        flush=True)
                    stl_info = True
                print(
                    f'WARNING: ignoring record with pid {record["pid"]}, `schema_type` attribute is missing.',
                    file=sys.stderr,
                    flush=True)
                continue
            # Store record in destination collection
            curated_write_record(
                service_url=destination_url,
                collection=destination_collection,
                class_name=class_name,
                record=record,
                token=destination_token)

            # Delete record from incoming area
            incoming_delete_record(
                service_url=arguments.service_url,
                collection=arguments.collection,
                label=label,
                pid=record['pid'],
                token=curator_token,
            )

    if output is not None:
        print(json.dumps(output, ensure_ascii=False))

    return 0


def main():
    try:
        return _main()
    except HTTPError as e:
        print(f'ERROR: {e}: {e.response.text}', file=sys.stderr, flush=True)
        return 1


if __name__ == '__main__':
    sys.exit(main())
@@ -9,10 +9,13 @@ import sys
from dump_things_service.converter import Format, FormatConverter
from rdflib import Graph

from dump_things_pyclient.communicate import (
    HTTPError,
    get_paginated,
)
def _main():
    argument_parser = argparse.ArgumentParser()
    argument_parser.add_argument('schema')
    argument_parser.add_argument('base_url')

@@ -22,8 +25,7 @@ def main():
    token = os.environ.get('DUMPTHINGS_TOKEN')
    if token is None:
        print('WARNING: environment variable DUMPTHINGS_TOKEN not set', file=sys.stderr, flush=True)

    print(f'Creating converter for schema {arguments.schema} ...', file=sys.stderr, end='', flush=True)
    converter = FormatConverter(

@@ -41,7 +43,7 @@ def main():
    )

    g = Graph()
    for json_object in get_paginated(url_base, page_size=100, token=os.environ.get('DUMPTHINGS_TOKEN')):
        object_class = json_object.get('schema_type')
        if object_class is None:
            raise ValueError(f'No schema_type in {json_object}')

@@ -51,7 +53,7 @@ def main():
        try:
            ttl = converter.convert(json_object, class_name)
        except ValueError as ve:
            print(f'WARNING: could not convert record {json_object["pid"]}: {ve}', file=sys.stderr, flush=True)
            continue
        g.parse(io.StringIO(ttl), format='n3')

@@ -59,5 +61,13 @@ def main():
    return 0


def main():
    try:
        return _main()
    except HTTPError as e:
        print(f'ERROR: {e}: {e.response.text}', file=sys.stderr, flush=True)
        return 1


if __name__ == '__main__':
    sys.exit(main())

@@ -4,28 +4,29 @@ import argparse
import os
import sys

from dump_things_pyclient.communicate import (
    HTTPError,
    incoming_delete_record,
    incoming_read_records,
)
def _main():
    argument_parser = argparse.ArgumentParser()
    argument_parser.add_argument('base_url')
    argument_parser.add_argument('collection')
    argument_parser.add_argument('label')
    argument_parser.add_argument('--list-only', '-l', action='store_true', help="list records in the inbox, don't remove them")

    arguments = argument_parser.parse_args()

    curator_token = os.environ.get('CURATOR_TOKEN')
    if curator_token is None:
        print('ERROR: environment variable CURATOR_TOKEN not set', file=sys.stderr, flush=True)
        return 1

    for record, _, _, _, _ in incoming_read_records(
            service_url=arguments.base_url,
            collection=arguments.collection,
            label=arguments.label,
            token=curator_token,

@@ -35,13 +36,24 @@ def main():
            continue

        # Delete record from incoming area
        incoming_delete_record(
            service_url=arguments.base_url,
            collection=arguments.collection,
            label=arguments.label,
            pid=record['pid'],
            token=curator_token,
        )
    return 0


def main():
    try:
        return _main()
    except HTTPError as e:
        print(f'ERROR: {e}: {e.response.text}', file=sys.stderr, flush=True)
        return 1


if __name__ == '__main__':
    sys.exit(main())

triple_tools/communicate.py (removed)

@@ -1,130 +0,0 @@
from __future__ import annotations

from collections.abc import Iterable
from urllib.parse import quote_plus

import requests
from progress.bar import Bar


def _create_url(
    url_base: str,
    parameters: dict[str, str] | None = None,
    page_number: int | None = None,
):
    parameters = parameters or {}
    parameters.update({'page': str(page_number)})
    all_parameters = [f'{k}={quote_plus(v)}' for k, v in parameters.items()]
    return url_base + '?' + '&'.join(all_parameters)


def _get_page(
    url_base: str,
    token: str | None = None,
    parameters: Iterable[str] | None = None,
    page_number: int | None = None,
):
    return get_from_url(_create_url(url_base, parameters, page_number), token)


def get_all(
    url_base: str,
    token: str | None = None,
    parameters: dict[str, str] | None = None,
    show_progress: bool = False,
):
    # Get the first result and the number of pages
    result = _get_page(url_base, token, parameters, page_number=1)
    total_pages = result['pages']
    if total_pages == 0:
        return

    if show_progress:
        bar = Bar('Pages', max=total_pages, suffix='%(index)d/%(max)d - %(eta_td)s')
        yield from result['items']
        bar.next()
    else:
        yield from result['items']

    # Get remaining results
    for page in range(2, total_pages + 1):
        result = _get_page(url_base, token, parameters, page_number=page)
        yield from result['items']
        if show_progress:
            bar.next()

    if show_progress:
        bar.finish()


def check_result(
    result: requests.Response,
    method: str,
    url: str
):
    if not 200 <= result.status_code < 300:
        msg = f'HTTP {method} {url} failed: {result.status_code}: {result.text}'
        raise RuntimeError(msg)


def get_from_url(
    url: str,
|
|
||||||
token: str,
|
|
||||||
):
|
|
||||||
r = requests.get(
|
|
||||||
url,
|
|
||||||
headers=({
|
|
||||||
'x-dumpthings-token': token,
|
|
||||||
} if token else {}),
|
|
||||||
)
|
|
||||||
check_result(r, 'GET', url)
|
|
||||||
return r.json()
|
|
||||||
|
|
||||||
|
|
||||||
def post_to_url(
|
|
||||||
url: str,
|
|
||||||
token: str | None,
|
|
||||||
content: list | dict
|
|
||||||
):
|
|
||||||
r = requests.post(
|
|
||||||
url,
|
|
||||||
headers=({
|
|
||||||
'x-dumpthings-token': token,
|
|
||||||
} if token else {}),
|
|
||||||
json=content,
|
|
||||||
)
|
|
||||||
check_result(r, 'POST', url)
|
|
||||||
return r.json()
|
|
||||||
|
|
||||||
|
|
||||||
def delete_url(
|
|
||||||
url: str,
|
|
||||||
token: str | None,
|
|
||||||
):
|
|
||||||
r = requests.delete(
|
|
||||||
url,
|
|
||||||
headers=({
|
|
||||||
'x-dumpthings-token': token,
|
|
||||||
} if token else {}),
|
|
||||||
)
|
|
||||||
check_result(r, 'DELETE', url)
|
|
||||||
return r.json()
|
|
||||||
|
|
||||||
|
|
||||||
def get_labels(
|
|
||||||
url_base: str,
|
|
||||||
collection: str,
|
|
||||||
token: str | None = None,
|
|
||||||
):
|
|
||||||
yield from get_from_url(f'{url_base}/{collection}/incoming/', token)
|
|
||||||
|
|
||||||
|
|
||||||
def get_records_from_label(
|
|
||||||
url_base: str,
|
|
||||||
collection,
|
|
||||||
label: str,
|
|
||||||
token: str | None = None,
|
|
||||||
parameters: dict[str, str] | None = None,
|
|
||||||
):
|
|
||||||
label_url = f'{url_base}/{collection}/incoming/{label}/records/p/'
|
|
||||||
yield from get_all(label_url, token=token, parameters=parameters)
|
|
||||||
|
|
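The deleted `get_all` helper implements a standard page-walking generator: fetch page 1, learn the total page count, then fetch pages 2..N. The pattern, with a stubbed page fetcher standing in for the HTTP call (the `{'pages': ..., 'items': [...]}` response shape is the one `get_all` expects), is roughly:

```python
def walk_pages(fetch_page):
    """Yield items from every page of a paginated endpoint.

    `fetch_page(n)` must return {'pages': <total>, 'items': [...]}.
    """
    result = fetch_page(1)
    total_pages = result['pages']
    if total_pages == 0:
        return
    yield from result['items']
    # Fetch the remaining pages lazily.
    for page in range(2, total_pages + 1):
        yield from fetch_page(page)['items']


# Stub fetcher: three pages of two items each.
pages = {1: [1, 2], 2: [3, 4], 3: [5, 6]}

def fetch_page(n):
    return {'pages': len(pages), 'items': pages[n]}

print(list(walk_pages(fetch_page)))  # [1, 2, 3, 4, 5, 6]
```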
```diff
@@ -11,9 +11,21 @@ from dump_things_service.converter import (
 )


+description = f"""Read JSON records from stdin and convert them to TTL
+
+This command reads one record per line, either JSON format or a JSON-string
+with a TTL-document from stdin, converts them to TTL or JSON and prints them
+to stdout.
+"""
+
+
 def main():
-    argument_parser = argparse.ArgumentParser()
-    argument_parser.add_argument('schema')
+    argument_parser = argparse.ArgumentParser(
+        description=description,
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+    )
+    argument_parser.add_argument('schema', help='URL of the schema that should be used')

     arguments = argument_parser.parse_args()
@@ -26,16 +38,16 @@ def main():
     print(' done', file=sys.stderr, flush=True)

     error = False

     for line in sys.stdin:
         json_object = json.loads(line)

         object_class = json_object.get('schema_type')
         if object_class is None:
+            error = True
             print(f'ERROR: No schema_type in {json_object}', file=sys.stderr, flush=True)
             continue

         class_name = re.search('([_A-Za-z0-9]*$)', object_class).group(0)

         try:
             ttl = converter.convert(json_object, class_name)
         except ValueError as ve:
```
```diff
@@ -1,45 +1,60 @@
 from __future__ import annotations

 import argparse
+import json
 import os
 import sys
+from collections import defaultdict

-from triple_tools.communicate import (
-    get_labels,
-    get_records_from_label,
+from dump_things_pyclient.communicate import (
+    HTTPError,
+    incoming_read_labels,
+    incoming_read_records,
 )


-def main():
+def _main():
     argument_parser = argparse.ArgumentParser()
     argument_parser.add_argument('base_url')
     argument_parser.add_argument('collection')
-    argument_parser.add_argument('--show-records', '-s', action='store_true')
+    argument_parser.add_argument('-s', '--show-records', action='store_true', help='show the records in the inboxes as well')

     arguments = argument_parser.parse_args()

     curator_token = os.environ.get('CURATOR_TOKEN')
     if curator_token is None:
-        print('ERROR: CURATOR_TOKEN not set', file=sys.stderr, flush=True)
+        print('ERROR: environment variable CURATOR_TOKEN not set', file=sys.stderr, flush=True)
         return 1

-    for label in get_labels(
-        url_base=arguments.base_url,
+    result = {}
+    for label in incoming_read_labels(
+        service_url=arguments.base_url,
         collection=arguments.collection,
         token=curator_token,
     ):
-        print(label)
+        result[label] = []
         if arguments.show_records:
-            for record in get_records_from_label(
-                url_base=arguments.base_url,
+            for record, _, _, _, _ in incoming_read_records(
+                service_url=arguments.base_url,
                 collection=arguments.collection,
                 label=label,
                 token=curator_token,
             ):
-                print('\t', record)
+                result[label].append(record)

+    if arguments.show_records is False:
+        result = list(result)
+    print(json.dumps(result, indent=2, ensure_ascii=False))
     return 0


+def main():
+    try:
+        return _main()
+    except HTTPError as e:
+        print(f'ERROR: {e}: {e.response.text}', file=sys.stderr, flush=True)
+        return 1


 if __name__ == '__main__':
     sys.exit(main())
```
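The output shaping at the end of the new `_main` above is easy to see in isolation (the label and record values here are made up):

```python
import json

# Records collected per inbox label, as the command builds them.
result = {'inbox-a': [{'pid': 'x'}], 'inbox-b': []}

show_records = False
if show_records is False:
    # Without -s/--show-records, only the inbox labels are emitted.
    result = list(result)   # dict -> list of its keys

print(json.dumps(result, indent=2, ensure_ascii=False))
```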
```diff
@@ -5,42 +5,51 @@ import json
 import os
 import sys

-from triple_tools.communicate import post_to_url
+from dump_things_pyclient.communicate import (
+    collection_write_record,
+    curated_write_record,
+)


 def main():
     argument_parser = argparse.ArgumentParser()
     argument_parser.add_argument('base_url')
     argument_parser.add_argument('collection')
-    argument_parser.add_argument('cls')
-    argument_parser.add_argument('--curated', action='store_true')
+    argument_parser.add_argument('cls', metavar='class')
+    argument_parser.add_argument('--curated', action='store_true', help='bypass inbox, requires curator token')

     arguments = argument_parser.parse_args()

     token = os.environ.get('DUMPTHINGS_TOKEN')
     if token is None:
-        print('WARNING: DUMPTHINGS_TOKEN not set', file=sys.stderr, flush=True)
+        print(
+            'WARNING: environment variable DUMPTHINGS_TOKEN not set',
+            file=sys.stderr,
+            flush=True,
+        )

-    url = (
-        arguments.base_url
-        + ('' if arguments.base_url.endswith('/') else '/')
-        + arguments.collection
-        + '/'
-    )
     if arguments.curated:
-        url += f'curated/'
-    url += f'record/{arguments.cls}'
+        write_record = curated_write_record
+    else:
+        write_record = collection_write_record

     posted = False
     for line in sys.stdin:
-        rec = json.loads(line)
+        record = json.loads(line)
         try:
-            post_to_url(url, token, rec)
+            write_record(
+                service_url=arguments.base_url,
+                collection=arguments.collection,
+                class_name=arguments.cls,
+                record=record,
+                token=token,
+            )
         except Exception as e:
-            print(e)
+            print(f'Error: {e}', file=sys.stderr, flush=True)
         else:
             posted = True
             print('.', end='', flush=True)

     if posted:
         # final newline
         print('')
```
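The `--curated` dispatch in the rewritten `post-records` selects a writer function once, up front, instead of building a URL per mode. With stubs standing in for the two `dump_things_pyclient` writers (their keyword signatures are taken from the diff above; the input data is made up), the pattern looks like:

```python
import io
import json

# Stub writers standing in for curated_write_record / collection_write_record.
written = []

def curated_write_record(record, **kwargs):
    written.append(('curated', record))

def collection_write_record(record, **kwargs):
    written.append(('inbox', record))

def post_stream(stream, curated: bool):
    # Pick the writer once, then post every JSONL record from the stream.
    write_record = curated_write_record if curated else collection_write_record
    for line in stream:
        write_record(record=json.loads(line))

post_stream(io.StringIO('{"pid": "a"}\n{"pid": "b"}\n'), curated=True)
print(written)
```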
```diff
@@ -4,41 +4,172 @@ import argparse
 import json
 import os
 import sys
+from functools import partial

-from triple_tools.communicate import get_all
+from dump_things_pyclient.communicate import (
+    HTTPError,
+    collection_read_records,
+    collection_read_records_of_class,
+    collection_read_record_with_pid,
+    curated_read_records,
+    curated_read_records_of_class,
+    curated_read_record_with_pid,
+    incoming_read_labels,
+    incoming_read_records,
+    incoming_read_records_of_class,
+    incoming_read_record_with_pid,
+)


+token_name = 'DUMPTHINGS_TOKEN'
+
+description = f"""Get records from a collection on a dump-things-service
+
+This command lists records that are stored in a dump-things-service. By
+default all records that are readable with the given token, or the default
+token, will be displayed. The output format is JSONL (JSON lines), where
+every line contains a record or a record with paging information. If `ttl`
+is chosen as format of the output records, the record content will be a string
+that contains a TTL-document.
+
+The command supports reading from the curated area only, reading from incoming
+areas, and reading records with a given PID.
+
+Pagination information is returned for paginated results, when requested with
+`-P/--pagination`. All results are paginated except "get a record with a given PID"
+and "get the list of incoming zone labels".
+
+If the environment variable "{token_name}" is set, its content will be used
+as token to authenticate against the dump-things-service.
+"""
+
+
+def _main():
+    argument_parser = argparse.ArgumentParser(
+        description=description,
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+    )
+    argument_parser.add_argument('service_url')
+    argument_parser.add_argument('collection')
+    argument_parser.add_argument('-c', '--class', dest='class_name', help='only read records of this class, ignored if "--pid" is provided')
+    argument_parser.add_argument('-f', '--format', help='format of the output records ("json" or "ttl")')
+    argument_parser.add_argument('-p', '--pid', help='the pid of the record that should be read')
+    argument_parser.add_argument('-i', '--incoming', metavar='LABEL', help='read from the incoming area with the given label in the collection; if LABEL is "-", return the labels')
+    argument_parser.add_argument('-C', '--curated', action='store_true', help='read from the curated area of the collection')
+    argument_parser.add_argument('-m', '--matching', help='return only records that have a matching value (use %% as wildcard), ignored if "--pid" is provided (NOTE: not all endpoints and backends support matching)')
+    argument_parser.add_argument('-s', '--page-size', type=int, help='set the page size (1 - 100) (default: 100), ignored if "--pid" is provided')
+    argument_parser.add_argument('-F', '--first-page', type=int, help='the first page to return (default: 1), ignored if "--pid" is provided')
+    argument_parser.add_argument('-l', '--last-page', type=int, default=None, help='the last page to return (default: None, i.e. return all pages), ignored if "--pid" is provided')
+    argument_parser.add_argument('--stats', action='store_true', help='show the number of records and pages and exit, ignored if "--pid" is provided')
+    argument_parser.add_argument('-P', '--pagination', action='store_true', help='show pagination information (each record from a paginated endpoint is returned as [<record>, <current page number>, <total number of pages>, <page size>, <total number of items>])')
+
+    arguments = argument_parser.parse_args()
+    # `-p/--parameter` from the old interface no longer exists; argparse now
+    # rejects it, so this guard can only fire if the attribute is reintroduced.
+    if getattr(arguments, 'parameter', None):
+        print(
+            'WARNING: option -p/--parameter is ignored, use existing options instead',
+            file=sys.stderr,
+            flush=True)
+
+    token = os.environ.get(token_name)
+    if token is None:
+        print(f'WARNING: {token_name} not set', file=sys.stderr, flush=True)
+
+    if arguments.incoming and arguments.curated:
+        print(
+            'ERROR: -i/--incoming and -C/--curated are mutually exclusive',
+            file=sys.stderr,
+            flush=True)
+        return 1
+
+    kwargs = dict(
+        service_url=arguments.service_url,
+        collection=arguments.collection,
+        token=token,
+    )
+
+    if arguments.incoming == '-':
+        result = incoming_read_labels(**kwargs)
+        print('\n'.join(
+            map(
+                partial(json.dumps, ensure_ascii=False),
+                result)))
+        return 0
+
+    elif arguments.pid:
+        for argument_value, argument_name in (
+                (arguments.matching, '-m/--matching'),
+                (arguments.page_size, '-s/--page-size'),
+                (arguments.first_page, '-F/--first-page'),
+                (arguments.last_page, '-l/--last-page'),
+                (arguments.stats, '--stats'),
+                (arguments.class_name, '-c/--class'),
+        ):
+            if argument_value:
+                print(
+                    f'WARNING: {argument_name} ignored because "-p/--pid" is provided',
+                    file=sys.stderr,
+                    flush=True)
+
+        kwargs['pid'] = arguments.pid
+        if arguments.curated:
+            result = curated_read_record_with_pid(**kwargs)
+        elif arguments.incoming:
+            kwargs['label'] = arguments.incoming
+            result = incoming_read_record_with_pid(**kwargs)
+        else:
+            kwargs['format'] = arguments.format
+            result = collection_read_record_with_pid(**kwargs)
+        print(json.dumps(result, ensure_ascii=False))
+        return 0
+
+    elif arguments.class_name:
+        kwargs.update(dict(
+            class_name=arguments.class_name,
+            matching=arguments.matching,
+            page=arguments.first_page or 1,
+            size=arguments.page_size or 100,
+            last_page=arguments.last_page,
+        ))
+        if arguments.curated:
+            result = curated_read_records_of_class(**kwargs)
+        elif arguments.incoming:
+            kwargs['label'] = arguments.incoming
+            result = incoming_read_records_of_class(**kwargs)
+        else:
+            kwargs['format'] = arguments.format
+            result = collection_read_records_of_class(**kwargs)
+    else:
+        kwargs.update(dict(
+            matching=arguments.matching,
+            page=arguments.first_page or 1,
+            size=arguments.page_size or 100,
+            last_page=arguments.last_page,
+        ))
+        if arguments.curated:
+            result = curated_read_records(**kwargs)
+        elif arguments.incoming:
+            kwargs['label'] = arguments.incoming
+            result = incoming_read_records(**kwargs)
+        else:
+            kwargs['format'] = arguments.format
+            result = collection_read_records(**kwargs)
+
+    if arguments.pagination:
+        for record in result:
+            print(json.dumps(record, ensure_ascii=False))
+    else:
+        for record in result:
+            print(json.dumps(record[0], ensure_ascii=False))
+    return 0
+
+
 def main():
-    argument_parser = argparse.ArgumentParser()
-    argument_parser.add_argument('base_url')
-    argument_parser.add_argument('collection')
-    argument_parser.add_argument('-s', '--size', type=int, default=100)
-    argument_parser.add_argument('-p', '--parameter', action='append', default=[])
-    argument_parser.add_argument('-c', '--class', default=None, dest='cls')
-
-    arguments = argument_parser.parse_args()
-
-    token = os.environ.get('DUMPTHINGS_TOKEN')
-    if token is None:
-        print('WARNING: DUMPTHINGS_TOKEN not set', file=sys.stderr, flush=True)
-
-    url_base = (
-        arguments.base_url
-        + ('' if arguments.base_url.endswith('/') else '/')
-        + arguments.collection
-        + f'/records/p/'
-    )
-    if arguments.cls:
-        url_base += f'{arguments.cls}/'
-
-    parameters = {'size': str(arguments.size)}
-    parameters.update({
-        param.split('=', 1)[0]: param.split('=', 1)[1]
-        for param in (arguments.parameter or [])
-    })
-
-    for json_object in get_all(url_base, token, parameters=parameters):
-        print(json.dumps(json_object))
+    try:
+        return _main()
+    except HTTPError as e:
+        print(f'ERROR: {e}: {e.response.text}', file=sys.stderr, flush=True)
+        return 1


 if __name__ == '__main__':
```
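The `-P/--pagination` and default output modes above only differ in how they slice the 5-tuples yielded by the paginated readers; the tuple layout is the one documented in the `-P/--pagination` help text, and the numbers here are made up:

```python
import json

# Each paginated result:
# (record, current page, total pages, page size, total items)
rows = [
    ({'pid': 'a'}, 1, 2, 100, 150),
    ({'pid': 'b'}, 2, 2, 100, 150),
]

# Default output: just the records (row[0]).
records = [row[0] for row in rows]

# Stats-style output: the tail of the first tuple.
stats = list(rows[0][2:])   # [total pages, page size, total items]

print(json.dumps(records))
print(json.dumps(stats))
```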
**`triple_tools/read_paginated_url.py`** (new file, 87 lines):

```diff
@@ -0,0 +1,87 @@
+from __future__ import annotations
+
+import argparse
+import json
+import os
+import sys
+
+from dump_things_pyclient.communicate import (
+    HTTPError,
+    get_paginated,
+)
+
+
+token_name = 'DUMPTHINGS_TOKEN'
+
+description = f"""Read paginated endpoint
+
+This command lists all records that are available via paginated endpoints from
+a dump-things-service, e.g., from:
+
+    https://<service-location>/<collection>/records/p/
+
+If the environment variable "{token_name}" is set, its content will be used
+as token to authenticate against the dump-things-service.
+"""
+
+
+def _main():
+    argument_parser = argparse.ArgumentParser(
+        description=description,
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+    )
+    argument_parser.add_argument('url', help='URL of the paginated endpoint of the dump-things-service')
+    argument_parser.add_argument('-s', '--page-size', type=int, default=100, help='set the page size (1 - 100) (default: 100)')
+    argument_parser.add_argument('-F', '--first-page', type=int, default=1, help='the first page to return (default: 1)')
+    argument_parser.add_argument('-l', '--last-page', type=int, default=None, help='the last page to return (default: None, i.e. return all pages)')
+    argument_parser.add_argument('--stats', action='store_true', help='show information about the number of records and pages and exit; the result is returned as [<total number of pages>, <page size>, <total number of items>]')
+    argument_parser.add_argument('-f', '--format', help='format of the output records ("json" or "ttl") (NOTE: not all endpoints support the format parameter)')
+    argument_parser.add_argument('-m', '--matching', help='return only records that have a matching value (use %% as wildcard) (NOTE: not all endpoints and backends support matching)')
+    argument_parser.add_argument('-p', '--pagination', action='store_true', help='show pagination information (each record from a paginated endpoint is returned as [<record>, <current page number>, <total number of pages>, <page size>, <total number of items>])')
+
+    arguments = argument_parser.parse_args()
+
+    token = os.environ.get(token_name)
+    if token is None:
+        print(f'WARNING: {token_name} not set', file=sys.stderr, flush=True)
+
+    result = get_paginated(
+        url=arguments.url,
+        token=token,
+        first_page=arguments.first_page,
+        page_size=arguments.page_size,
+        last_page=arguments.last_page,
+        parameters={
+            'format': arguments.format,
+            **({'matching': arguments.matching}
+               if arguments.matching is not None
+               else {}
+               ),
+        }
+    )
+
+    if arguments.stats:
+        record = next(result)
+        print(json.dumps(record[2:], ensure_ascii=False))
+        return 0
+
+    if arguments.pagination:
+        for record in result:
+            print(json.dumps(record, ensure_ascii=False))
+    else:
+        for record in result:
+            print(json.dumps(record[0], ensure_ascii=False))
+    return 0
+
+
+def main():
+    try:
+        return _main()
+    except HTTPError as e:
+        print(f'ERROR: {e}: {e.response.text}', file=sys.stderr, flush=True)
+        return 1
+
+
+if __name__ == '__main__':
+    sys.exit(main())
```