Update data dictionary generation process #1

Open
opened 2024-09-27 08:27:52 +00:00 by jsheunis · 0 comments
jsheunis commented 2024-09-27 08:27:52 +00:00 (Migrated from hub.datalad.org)

The background: https://hub.datalad.org/datalink/org/issues/3

The current state: https://hub.datalad.org/datalink/tools#generating-a-data-dictionary-for-neurobagel-https-neurobagel-org-from-a-linkml-schema

During discussion with @mih, the following points were made:

  • The current approach (adding Neurobagel annotations to an existing LinkML schema) requires a maintainer to understand LinkML internals in order to create or update a data dictionary. Ideally, this should not be a requirement.

  • In this comment (https://hub.datalad.org/datalink/org/issues/3#issuecomment-68) I mentioned:

    I started out with inspecting the neurobagel data dictionary jsonschema, then created a LinkML schema for it: https://github.com/jsheunis/datalad-concepts/blob/neurobagel/src/nbdd/unreleased.yaml

    This schema is now located in the current repository under https://hub.datalad.org/datalink/tools/src/branch/main/schemas/nbdd/unreleased.yaml. Generating SHACL shapes from this schema and subsequently generating a shacl-vue form could be an appropriate alternative to the current way of generating a data dictionary. The maintainer would then use the form to construct the data dictionary and download it once.

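  • Another alternative could be to keep the current approach of embedding the Neurobagel-required info into the LinkML schema, but in a more logical way, using existing concepts rather than a string-encoded annotation. dlco already has classes for dlthing:Property (https://github.com/psychoinformatics-de/datalad-concepts/blob/8ef7b6c46777669188f6a1e7273a35ff5ea1e497/src/thing/unreleased.yaml#L315) and dlthing:QuantitativeProperty (https://github.com/psychoinformatics-de/datalad-concepts/blob/8ef7b6c46777669188f6a1e7273a35ff5ea1e497/src/thing/unreleased.yaml#L326) that could be used in an improved approach to generate a Neurobagel data dictionary from a dlco-based schema. How exactly these map to Neurobagel terms, and how this would influence the implementation, is still a bit hazy to me.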
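
To make the schema-driven alternative a bit more concrete, here is a minimal sketch of deriving a data dictionary from structured (not string-encoded) per-slot annotations. Everything in it is an assumption for illustration: the `SCHEMA` dict stands in for an already-parsed LinkML YAML file, the `nb_is_about` annotation key and the `build_data_dictionary` helper are hypothetical, and the output layout (`Description`, `Annotations`, `IsAbout`, `TermURL`) is a simplified guess at the Neurobagel format that would need to be checked against the Neurobagel data dictionary JSON schema. It does not use dlthing:Property or dlthing:QuantitativeProperty, since the mapping to those classes is still open.

```python
import json

# Hypothetical stand-in for a parsed LinkML schema (normally loaded from YAML).
# The "nb_is_about" annotation key is an illustrative assumption, not an
# existing convention in the datalink tools repository.
SCHEMA = {
    "slots": {
        "participant_id": {
            "description": "Unique participant identifier",
            "annotations": {"nb_is_about": "nb:ParticipantID"},
        },
        "age": {
            "description": "Age of the participant in years",
            "annotations": {"nb_is_about": "nb:Age"},
        },
        # No Neurobagel annotation: this slot is left out of the dictionary.
        "notes": {"description": "Free-text notes"},
    }
}


def build_data_dictionary(schema):
    """Collect annotated slots into a (simplified, assumed) data dictionary."""
    data_dictionary = {}
    for name, slot in schema.get("slots", {}).items():
        annotations = slot.get("annotations") or {}
        term = annotations.get("nb_is_about")
        if term is None:
            continue  # skip slots that carry no Neurobagel term
        data_dictionary[name] = {
            "Description": slot.get("description", ""),
            "Annotations": {"IsAbout": {"TermURL": term}},
        }
    return data_dictionary


if __name__ == "__main__":
    print(json.dumps(build_data_dictionary(SCHEMA), indent=2))
```

With this route the maintainer still edits a schema, but only plain key/value fields per slot. The form-based route would instead produce the same kind of JSON directly from user input.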

The first question is: do we want a maintainer to fill in a form or to update a LinkML schema in order to generate the data dictionary?
