Some thoughts on curation workflows #53
1 changed files with 70 additions and 2 deletions
@@ -59,7 +59,75 @@ flowchart LR
USER1 ~~~ USER2 ~~~ USER3
```
|
```

## Curation workflows

Depending on the nature of the metadata and the respective audiences for producing
and consuming metadata, curation workflows differ substantially. The following sections
collect some ideas and constraints to keep in mind when designing such workflows
in this context.

### PIDs also require curation

Persistent identifiers (PIDs) play a key role in this metadata concept. Data
models and vocabularies can change flexibly, but records still describe one and
the same `Thing` when the PID is identical.

Persistent identifiers allow referencing entities in contexts where not all
information about an entity is available. One can reference a `Person` without
having to reveal possibly sensitive information about that `Person` at the same
time. For example, a public `Person` record about an academic may only contain
a name and a work contact email (equivalent to the information available on
a corresponding author in a journal publication). At the same time, an internal
`Person` record would have additional information, like a private cell phone number.
The public record can be generated from the richer, internal record by stripping
the sensitive information.
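
The stripping step described above could look roughly like the following. This is a minimal sketch, not a prescribed implementation: the field names (`pid`, `name`, `work_email`, `private_phone`), the `example:` PID, and the allow-list approach are all assumptions made for illustration.

```python
from typing import Any

# Hypothetical allow-list of fields considered safe for a public `Person` record.
PUBLIC_PERSON_FIELDS = frozenset({"pid", "name", "work_email"})


def to_public_record(
    internal: dict[str, Any],
    public_fields: frozenset[str] = PUBLIC_PERSON_FIELDS,
) -> dict[str, Any]:
    """Return a copy of ``internal`` that contains only allow-listed fields."""
    return {key: value for key, value in internal.items() if key in public_fields}


# Made-up internal record; none of these values refer to a real person.
internal_person = {
    "pid": "example:person/1",
    "name": "Jane Doe",
    "work_email": "jane.doe@example.org",
    "private_phone": "+00 000 0000000",
}
public_person = to_public_record(internal_person)
```

An allow-list is used here rather than a deny-list so that fields added to the internal record later stay private by default. The sketch carries the PID over unchanged; whether that is acceptable is a separate question.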

#### PIDs may require mapping

However, an identifier itself can also carry information. For example, an ORCID
identifier can typically be used to look up the name of a person. Hence, when an
ORCID is used as the PID for a metadata record, any place where the identifier
is mentioned also reveals the name of the person. If the same identifier is used
for an internal, protected record and for a corresponding public record,
cross-referencing may be enabled unintentionally.

In such cases, it can be necessary to maintain mapping tables for PIDs of the
same entity in different contexts.

Maintaining a separate PID mapping is also an instrument to aid (future)
anonymization of records. When the mapping is destroyed (and other conditions
are also met), PID-based re-identification may no longer be possible.
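
A mapping table of this kind could be sketched as follows. Everything here is hypothetical: the class name `PidMapping`, the `example:public/` prefix, and the placeholder ORCID-style identifier are illustrative only, and a real deployment would need persistent, access-controlled storage rather than an in-memory dictionary.

```python
import secrets
from typing import Optional


class PidMapping:
    """Hypothetical in-memory mapping between internal and public PIDs."""

    def __init__(self, public_prefix: str = "example:public/") -> None:
        self._public_prefix = public_prefix
        self._internal_to_public: dict[str, str] = {}

    def public_pid(self, internal_pid: str) -> str:
        """Return the public PID for ``internal_pid``, minting one if needed."""
        if internal_pid not in self._internal_to_public:
            # A random token carries no information about the internal PID.
            token = secrets.token_hex(8)
            self._internal_to_public[internal_pid] = self._public_prefix + token
        return self._internal_to_public[internal_pid]

    def internal_pid(self, public_pid: str) -> Optional[str]:
        """Reverse lookup; returns None when no mapping is known."""
        for internal, public in self._internal_to_public.items():
            if public == public_pid:
                return internal
        return None

    def destroy(self) -> None:
        """Discard all mappings, removing this path to re-identification."""
        self._internal_to_public.clear()


mapping = PidMapping()
internal = "orcid:0000-0000-0000-0000"  # placeholder, not a real ORCID iD
public = mapping.public_pid(internal)
```

After `mapping.destroy()`, the reverse lookup no longer resolves the public PID. As noted above, this alone does not guarantee anonymity, because other attributes of a record may still identify the entity.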

#### PIDs may require curation

When metadata records are submitted by non-experts, these records already need to have
PIDs in order to enable submission of multiple, interlinked records. It is advisable
to use dedicated PIDs (which are actually only temporarily persistent) for this purpose.

The reason is that a submitter cannot necessarily be trusted to use the PID of an
existing record to make further statements. Instead, they may create a new record
with the same information as an existing one, and consequently use a new PID to link
information to this entity. While curation could keep both records and declare them
"same as" each other, this needlessly inflates the number of records, increases
the maintenance load, and complicates queries.

Alternatively, curation could merge the two records found to describe the same entity
and retain only the already existing one, and therefore just one relevant PID.
Subsequently, all references to the PID of the duplicate record in the submission
could be replaced with this original PID.
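
The replacement step could be sketched as below, assuming records are plain nested dictionaries and lists, and that a curator has already decided which submitted records duplicate existing ones. The function names, the record fields, and the `example:person/1` PID are hypothetical. The sketch simply drops the duplicate record; a real merge might first carry over any new information into the existing record.

```python
from typing import Any


def replace_pids(value: Any, replacements: dict[str, str]) -> Any:
    """Recursively replace PID strings in nested dicts and lists.

    Only values are rewritten (by exact string match); dict keys are kept.
    """
    if isinstance(value, str):
        return replacements.get(value, value)
    if isinstance(value, list):
        return [replace_pids(item, replacements) for item in value]
    if isinstance(value, dict):
        return {key: replace_pids(item, replacements) for key, item in value.items()}
    return value


def merge_duplicates(
    submission: list[dict[str, Any]],
    duplicates: dict[str, str],
) -> list[dict[str, Any]]:
    """Drop duplicate records and point their references at existing PIDs.

    ``duplicates`` maps the PID of each duplicate record to the PID of the
    existing record that a curator identified as describing the same entity.
    """
    kept = [record for record in submission if record["pid"] not in duplicates]
    return [replace_pids(record, duplicates) for record in kept]


submission = [
    {"pid": "inm7:pending/abc", "type": "Person", "name": "Jane Doe"},
    {
        "pid": "inm7:pending/def",
        "type": "Dataset",
        "title": "Study data",
        "authors": ["inm7:pending/abc"],
    },
]
# Curator decision: the submitted person already exists under another PID.
duplicates = {"inm7:pending/abc": "example:person/1"}
curated = merge_duplicates(submission, duplicates)
```

In this example, `curated` contains only the dataset record, and its `authors` entry now points at the existing person record.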

Using a dedicated PID space for pre-curation PIDs, such as
`inm7:pending/<random-id>`, can help the curation process by making them easier
to detect. Moreover, using random, auto-generated PIDs for new, pre-curation
records also eases the task for submitters. They do not have to learn and follow
possible rules for PID generation, such as using particular PID systems for certain
types of records (e.g., DOIs for publications, ORCID iDs for researchers, ROR IDs for
organizations, RRIDs for resources, etc.). This task could be left to professional
curators.
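
Generating and detecting such pre-curation PIDs could be as simple as the following sketch. The `inm7:pending/` prefix is taken from the text above; using a UUID4 hex string as the `<random-id>` part, and the function names, are assumptions.

```python
import uuid

PENDING_PREFIX = "inm7:pending/"


def new_pending_pid() -> str:
    """Mint a random pre-curation PID (UUID4-based random part is an assumption)."""
    return PENDING_PREFIX + uuid.uuid4().hex


def is_pending(pid: str) -> bool:
    """Tell whether a PID lives in the pre-curation PID space."""
    return pid.startswith(PENDING_PREFIX)


pid = new_pending_pid()
```

A submission tool could call `new_pending_pid()` for every new record, and a curation tool could use `is_pending()` to list all PIDs that still need to be resolved to their final form.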

## Acknowledgements

This work was funded by the MKW-NRW: Ministerium für Kultur und Wissenschaft
des Landes Nordrhein-Westfalen under the Kooperationsplattformen 2022 program,
grant number: KP22-106A.