How to connect a Person as an author to a Dataset? #13

Open
opened 2025-06-03 11:11:55 +00:00 by jsheunis · 4 comments
Owner

I remember that we have the `Role` class in TRR379: https://annotate.trr379.de/s/demo/?sh%3ANodeShape=dlroles%3ARole. In the TRR379 schema it's just an import of the roles schema from datalad-concepts: https://github.com/psychoinformatics-de/datalad-concepts/blob/main/src/roles/unreleased.yaml. Should that be brought into the `flat-data` schema, or was a different approach envisioned?
Owner

Not having Role is essentially what makes the model "flat".
Author
Owner

So an alternative would be to add something like an `author` property to the `Dataset` class. I'm guessing we want to be as hesitant as possible before just adding new properties though, in order to keep the class lean and more applicable to wider use cases?
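A minimal sketch of what a record could look like if `Dataset` gained such a property. It assumes a hypothetical multivalued `author` slot whose values are Person identifiers. The ORCID is the one that also appears in the `qualified_relations` example in this thread.

```yaml
# Hypothetical flat record: `author` is an assumed slot, not (yet) part of the flat-data schema
pid: inm7/datasets/something
title: Some dataset
author:
  # each entry identifies a Person; the role is implied by the slot name itself
  - orcid:0000-0002-2771-9344
```

The role is encoded in the property name rather than in a separate `Role` value. This keeps data entry flat, at the cost of needing one property per role.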
Owner

Looking at the [INM-7 base schema](https://concepts.inm7.de/s/base/unreleased/) I can see three ways the author can be connected to a dataset, and while one seems preferred, none seems optimal.

### attributed to

A Dataset has an `attributed_to` slot, which takes an `Agent` in its range. So this is an easy and direct link, but it lacks specificity (it does not say that it is an *author* attribution).

The slot has a direct mapping to [prov:wasAttributedTo](https://www.w3.org/TR/prov-o/#wasAttributedTo), which shows as an example that a `drosophilaSample-84 wasAttributedTo lab-technician-FE-56`. So it seems reasonable that a dataset would be attributed to an author.

### qualified relations

A Dataset has a `qualified_relations` slot, which takes a `Relationship` in its range. The relationship has two slots: `object (Thing)` and `roles (Role)`. So this would allow us to be more specific about the role (by using [relator](https://id.loc.gov/vocabulary/relators.html) terms: author, creator, curator, data contributor, dubious author...).

I believe the following (roughly; I'm typing directly into the markdown editor) would be valid for the INM-7 base schema:

```yaml
pid: inm7/datasets/something
title: Some dataset
qualified_relations:
  - object: orcid:0000-0002-2771-9344
    roles:
      - marcrel:aut
```

This seems to be the preferred approach. The major problem I have is that `dlroles:qualified_relations` declares a direct mapping to [dcat:qualifiedRelation](https://www.w3.org/TR/vocab-dcat-3/#Property:resource_qualified_relation), which is a "link to a description of a relationship with another resource". Indeed, DCAT seems to explicitly suggest using [dcat:qualifiedAttribution](https://www.w3.org/TR/vocab-dcat-3/#Property:resource_qualified_attribution) instead to describe [relationships between datasets and agents](https://www.w3.org/TR/vocab-dcat-3/#qualified-attribution).

Unless I missed something, an equivalent of `dcat:qualifiedAttribution` is missing from the INM-7 concepts.
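For comparison, the same author link expressed through the `attributed_to` slot described above might look like the sketch below. It assumes `attributed_to` accepts a list of `Agent` identifiers. Whether the slot is multivalued in the base schema is not confirmed here.

```yaml
pid: inm7/datasets/something
title: Some dataset
# direct link to an Agent; nothing here states that the agent is an *author*
attributed_to:
  - orcid:0000-0002-2771-9344
```

The record is shorter than the `qualified_relations` variant, but the role information (`marcrel:aut`) has no place to go. That is the lack of specificity noted above.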
### characterized by

A Dataset has a `characterized_by` slot, which takes a `Statement` in its range. The Statement could take `prov:wasAttributedTo` as a predicate and the Author as an object, thus being effectively equivalent to the first option. It seems to be the least preferable option, because it lacks the directness of `attributed_to` and the specificity of `qualified_relations`.

However, if I'm seeing correctly, this is the only one of the three slots currently exposed by https://annotate.inm7.de/s/data
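A rough sketch of the `characterized_by` variant follows. It assumes that the `Statement` class exposes `predicate` and `object` slots, as the description above implies. The slot names have not been checked against the schema.

```yaml
pid: inm7/datasets/something
title: Some dataset
characterized_by:
  # assumed slot names; the dataset acts as the subject of the resulting triple
  - predicate: prov:wasAttributedTo
    object: orcid:0000-0002-2771-9344
```

Entering this requires knowing the predicate IRI. This is part of why it reads as the least direct of the three options.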
Owner

In the "flat" models there must be a dedicated property for each role assignment. IOW, there must be an additional `author` property, an additional `contributor` property, and so on.

All the other methods mentioned above are used in the model of the underlying graph, but they also require "graph thinking" during data entry. That is why they are not used in the "flat" models.

Feel free to add as many properties as necessary to, for example, be able to form a complete datalad-catalog record.
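One way this could look as a LinkML fragment is sketched below, to be merged into the existing flat-data definitions. The slot names follow the comment above. Everything else is an assumption for illustration, not a decision made in this thread: the `Person` range, the multivalued setting, and the `dcterms:creator` / `dcterms:contributor` mappings.

```yaml
# Hypothetical flat-data schema fragment (LinkML); ranges and slot_uri mappings are assumptions
prefixes:
  dcterms: http://purl.org/dc/terms/

slots:
  author:
    description: A person who authored the dataset.
    range: Person
    multivalued: true
    slot_uri: dcterms:creator
  contributor:
    description: A person who contributed to the dataset in a role other than author.
    range: Person
    multivalued: true
    slot_uri: dcterms:contributor

classes:
  Dataset:
    slots:
      - author
      - contributor
```

Each slot fixes one role, so a flat record needs no `Role` class. Any further role assignment would get its own slot in the same way. A conversion into the underlying graph model could then map, for example, `author` values onto `qualified_relations` entries with `marcrel:aut`.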
Reference
inm7/annotate.inm7.de-data#13