Note on curation load aware schema design #73
1 changed file with 21 additions and 6 deletions
@@ -11,11 +11,11 @@ uses their own data models. Each system allows for submission of additional
 or edited records to a staging area where submissions can be subjected to
 verification and curation, before they are accepted.
 
-Metadata records from each system can be transformed to be compliant with a
-generic use case agnostic data model. This generic data model facilitates the
-integration of information across applications and workflows. Transformed
-metadata records are, again, submitted for curation and integration into
-a central knowledge base.
+Metadata records from each system can be losslessly transformed to be compliant
+with a generic use case agnostic data model. This generic data model
+facilitates the integration of information across applications and workflows.
+Transformed metadata records are, again, submitted for curation and integration
+into a central knowledge base.
 
 This central knowledge base can be queried to produce integrated reports.
 Knowledge base records can also be exported to the data models of individual
@@ -66,11 +66,26 @@ consuming metadata, curation workflow differ substantially. The following sectio
 collect some ideas and constraints to keep in mind when designing such workflows
 in this context.
 
+### Design schemas to reduce churn
+
+Data models should be designed to prefer linkage to broader, more slowly evolving,
+less context constrained entities. For example, the relationship between a
+container-type entity and its parts should be implemented by a `part_of`
+relationship, rather than a list of `parts` in the container. This enables
+the addition of a new part via the creation of a single, additional record
+-- as opposed to having to create the new record, and then also having to update
+the part-list.
+
+This design choice does not limit the on-demand construction of part-lists
+for "runtime" representations of knowledge for query-focused applications.
+But it reduces the load on data curation workflows, by reducing the number of
+events that require knowledge merge operations, in favor of plain additions.
+
 ### PIDs also require curation
 
 Persistent identifiers (PID) play a key role in this metadata concept. Data
 models and vocabularies can change flexibly, but records still describe one and
-the same `Thing` when the PID identical.
+the same `Thing` when the PID is identical.
 
 Persistent identifiers allow referencing entities in contexts where not all
 information about an entity is available. One can reference a `Person` without
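The `part_of` design described under "Design schemas to reduce churn" could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Record` class, the `build_part_lists` helper, and the `ex:` PIDs are hypothetical and do not come from the project's actual data model. The sketch shows that adding a part is a single record addition, while a part-list can still be derived on demand.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical minimal metadata record, identified by a PID."""
    pid: str
    part_of: Optional[str] = None  # PID of the containing entity, if any


def build_part_lists(records):
    """Derive a mapping container PID -> sorted part PIDs from `part_of` links."""
    parts = defaultdict(list)
    for record in records:
        if record.part_of is not None:
            parts[record.part_of].append(record.pid)
    return {container: sorted(pids) for container, pids in parts.items()}


# Knowledge base with one container and one part (illustrative PIDs).
knowledge_base = [
    Record(pid="ex:box-1"),
    Record(pid="ex:item-a", part_of="ex:box-1"),
]

# Adding a part is a plain addition: the container record is not touched.
knowledge_base.append(Record(pid="ex:item-b", part_of="ex:box-1"))

# The part-list is a "runtime" view, constructed only when a query needs it.
part_lists = build_part_lists(knowledge_base)
print(part_lists)
```

With a `parts` list stored in the container instead, the same change would require creating the new record and also updating (and re-curating) the container record.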