Note on curation load aware schema design #73

Merged
mih merged 1 commit from flatentry into main 2025-05-23 14:21:26 +00:00

@@ -11,11 +11,11 @@ uses their own data models. Each system allows for submission of additional
or edited records to a staging area where submissions can be subjected to
verification and curation, before they are accepted.
-Metadata records from each system can be transformed to be compliant with a
-generic use case agnostic data model. This generic data model facilitates the
-integration of information across applications and workflows. Transformed
-metadata records are, again, submitted for curation and integration into
-a central knowledge base.
+Metadata records from each system can be losslessly transformed to be compliant
+with a generic use case agnostic data model. This generic data model
+facilitates the integration of information across applications and workflows.
+Transformed metadata records are, again, submitted for curation and integration
+into a central knowledge base.
This central knowledge base can be queried to produce integrated reports.
Knowledge base records can also be exported to the data models of individual
@@ -66,11 +66,26 @@ consuming metadata, curation workflow differ substantially. The following sectio
collect some ideas and constraints to keep in mind when designing such workflows
in this context.
+### Design schemas to reduce churn
+Data models should be designed to prefer linkage to broader, more slowly evolving,
+less context constrained entities. For example, the relationship between a
+container-type entity and its parts should be implemented by a `part_of`
+relationship, rather than a list of `parts` in the container. This enables
+the addition of a new part via the creation of a single, additional record
+-- as opposed to having to create the new record, and then also having to update
+the part-list.
+This design choice does not limit the on-demand construction of part-lists
+for "runtime" representations of knowledge for query-focused applications.
+But it reduces the load on data curation workflows, by reducing the number of
+events that require knowledge merge operations, in favor of plain additions.
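
The `part_of` design described above could be sketched as follows. This is only an illustration and is not taken from the project's schemas: the `Record` class, its field names, the `build_part_lists` helper, and the `ex:` identifiers are all hypothetical. It shows that a new part is added as a single new record, while part-lists can still be derived on demand.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical flat metadata record; field names are illustrative only."""
    pid: str
    part_of: Optional[str] = None  # PID of the container entity, if any


def build_part_lists(records):
    """Derive a mapping of container PID -> sorted part PIDs from `part_of` links."""
    parts = defaultdict(list)
    for rec in records:
        if rec.part_of is not None:
            parts[rec.part_of].append(rec.pid)
    return {container: sorted(pids) for container, pids in parts.items()}


records = [
    Record(pid="ex:box"),
    Record(pid="ex:item-1", part_of="ex:box"),
]

# Adding a part is a plain addition: one new record, and the existing
# "ex:box" record is not edited, so nothing needs to be merged.
records.append(Record(pid="ex:item-2", part_of="ex:box"))

# The part-list is a runtime view, built only when a query needs it.
part_lists = build_part_lists(records)
print(part_lists)
```

With a `parts` list stored in the container instead, the same change would require the new record plus an update to the container record, which is the kind of merge event the section suggests avoiding.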
### PIDs also require curation
Persistent identifiers (PID) play a key role in this metadata concept. Data
models and vocabularies can change flexibly, but records still describe one and
-the same `Thing` when the PID identical.
+the same `Thing` when the PID is identical.
Persistent identifiers allow referencing entities in contexts where not all
information about an entity is available. One can reference a `Person` without