Report on using export command #14

Closed
opened 2026-01-30 14:05:06 +00:00 by mih · 2 comments
Owner

I used the command as part of a data model transition. This issue only reports impressions.

**Filename scheme makes it hard to navigate.** The transition requires reconsolidating uncurated records and comparing across collections. This is hard because records with the same PID appear to map to different filenames in each collection/inbox type. Using the same filename would help a lot here. It need not be the PID itself (I know it has complications), but a look-up that reuses a numbered filename when a PID occurs again would be sufficient.
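
To illustrate the look-up suggested above, here is a minimal sketch. It is not part of the client: `FilenameRegistry`, its zero-padded numbering, and the `.json` suffix are all hypothetical choices made for the example.

```python
from itertools import count


class FilenameRegistry:
    """Hypothetical helper: hands out one numbered filename per PID."""

    def __init__(self, width=6, suffix=".json"):
        self._names = {}            # PID -> filename already assigned
        self._counter = count(1)    # next free number
        self._width = width
        self._suffix = suffix

    def filename_for(self, pid):
        # Assign a new number only the first time a PID is seen;
        # later occurrences reuse the stored filename.
        if pid not in self._names:
            number = next(self._counter)
            self._names[pid] = f"{number:0{self._width}d}{self._suffix}"
        return self._names[pid]


registry = FilenameRegistry()
first = registry.filename_for("ex:some-pid")
again = registry.filename_for("ex:some-pid")
```

Sharing one registry across all collections and inboxes of an export would give the same PID the same filename everywhere, which is all the comparison workflow needs.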

**`schema_type` is duplicated in directory name and record.** My transition requires a namespace change for the schema itself. Even simple records (pid and display label only) contain a `schema_type` item. This means all records need processing to remove it. The original type is captured in the directory name, and the new type needs to be declared when posting the records to the new DB.

Owner

The issues are addressed in PR #28 and published in version 0.2.7:

  • `schema_type` attributes are now removed by default before a record is stored
  • file names are now md5 hashes of the PIDs (with a 3-digit prefix directory structure)

In addition, there is now:

  • percentage-based progress reporting
  • support for JSON or YAML output
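
For orientation, the md5-based file naming mentioned above could look roughly like the sketch below. The comment does not spell out the exact layout, so the details are assumptions: the function name `record_path` is hypothetical, and the sketch assumes the first three hex characters of the digest form the directory and the remainder forms the file name. The actual implementation in PR #28 may differ.

```python
import hashlib
from pathlib import PurePosixPath


def record_path(pid, suffix=".json", prefix_len=3):
    """Hypothetical mapping of a PID to a relative file path.

    Assumed layout: <first 3 hex chars of md5>/<remaining hex chars><suffix>
    """
    digest = hashlib.md5(pid.encode("utf-8")).hexdigest()
    return PurePosixPath(digest[:prefix_len]) / (digest[prefix_len:] + suffix)


# The same PID always yields the same relative path, whatever the collection.
path_a = record_path("ex:some-pid")
path_b = record_path("ex:some-pid")
```

Because the path depends only on the PID, a record ends up under the same relative path in every collection or inbox, which addresses the navigation problem reported in this issue.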
Owner

Closing this for now. Please reopen if additional issues come up.
cmo closed this issue 2026-02-05 13:57:06 +00:00
Reference
orinoco/dump-things-pyclient#14