Copy from hub.datalad/datalink/org: Construct a data dictionary for a neurobagel instance #15

Open
opened 2025-03-01 12:16:18 +00:00 by jsheunis · 0 comments
Owner

Source: https://hub.datalad.org/datalink/org/issues/3


Docs: https://neurobagel.org/dictionaries/
Example: https://github.com/neurobagel/neurobagel_examples/blob/main/data-upload/example_synthetic.json

A starting point could be the same as the example, and would include the columns that neurobagel presents as phenotypic attributes:

  • Participant identifier
  • Session identifier
  • Diagnosis / group
  • Sex
  • Age
  • Assessment tool

including a way to specify missing values. We can start by constructing a LinkML schema that covers these attributes and then try exporting this to jsonschema.
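To make the target concrete before writing any LinkML, here is a minimal plain-Python sketch of the six starting attributes with a per-column list of missing-value tokens. The class `ColumnSpec`, the column names, and the tokens `""` and `"NA"` are placeholders chosen for illustration; they are not taken from the neurobagel specification or from any existing schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ColumnSpec:
    """One column of the tabular file (hypothetical structure)."""
    name: str        # column header, placeholder value
    attribute: str   # phenotypic attribute the column describes
    missing_values: List[str] = field(default_factory=list)


# The six attributes listed above; names and tokens are assumptions.
STARTER_COLUMNS = [
    ColumnSpec("participant_id", "Participant identifier"),
    ColumnSpec("session_id", "Session identifier"),
    ColumnSpec("group", "Diagnosis / group", missing_values=["", "NA"]),
    ColumnSpec("sex", "Sex", missing_values=["", "NA"]),
    ColumnSpec("age", "Age", missing_values=["", "NA"]),
    ColumnSpec("assessment", "Assessment tool", missing_values=["", "NA"]),
]


def is_missing(spec: ColumnSpec, raw_value: str) -> bool:
    """Return True if the raw cell value is one of the declared missing tokens."""
    return raw_value.strip() in spec.missing_values


if __name__ == "__main__":
    # Serialise the specs as a list of records, roughly what a schema export might hold.
    print(json.dumps([asdict(c) for c in STARTER_COLUMNS], indent=2))
```

Keeping the missing-value tokens on the column itself, rather than globally, leaves room for columns that use different conventions.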

Note: This seems to be the jsonschema specification of neurobagel data dictionaries: https://github.com/neurobagel/annotation_tool/blob/main/assets/neurobagel_data_dictionary.schema.json


I started out by inspecting the neurobagel data dictionary jsonschema, then created a LinkML schema for it: https://github.com/jsheunis/datalad-concepts/blob/neurobagel/src/nbdd/unreleased.yaml

Notes about this LinkML schema:

  • The source of the DataDictionary schema (https://github.com/neurobagel/annotation_tool/blob/main/assets/neurobagel_data_dictionary.schema.json#L261) does not define a named property under which a ContinuousColumn or CategoricalColumn can be added; columns are instead allowed via additionalProperties. I could not find a straightforward way to translate this into the LinkML world, so I created a new slot for that, has_columns.
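One consequence of the has_columns slot is that an export would hold the columns as a list, whereas the jsonschema expects them as top-level keys named after each column. A minimal sketch of that reshaping step follows, assuming each exported column record carries its column name in a `name` field; that field, the function name, and the sample records are hypothetical and only illustrate the idea.

```python
def columns_to_mapping(has_columns):
    """Turn a list of column records into a mapping keyed by column name.

    Assumes every record has a "name" key (hypothetical); the remaining
    keys are kept unchanged as the entry for that column.
    """
    mapping = {}
    for column in has_columns:
        entry = dict(column)      # copy so the input is not modified
        name = entry.pop("name")
        if name in mapping:
            raise ValueError(f"duplicate column name: {name}")
        mapping[name] = entry
    return mapping


# Illustrative export; the entries are made up for this example.
exported = {
    "has_columns": [
        {"name": "age", "Description": "Age in years"},
        {"name": "sex", "Description": "Sex", "Levels": {"M": "Male", "F": "Female"}},
    ]
}

data_dictionary = columns_to_mapping(exported["has_columns"])
```

Whether such a post-processing step is acceptable, or whether the schema itself should produce the keyed form, is an open design question.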

The goal of doing the above was not to solve this issue directly, just to explore. It surfaced some problems that were useful for UI testing of shacl-vue, specifically the handling of SHACL logical constraints such as sh:or, which is derived from LinkML any_of via jsonschema anyOf. The resulting form generated by shacl-vue can be used as one option for creating data dictionaries.

However, this issue is actually about seeing if the terms and structure of a data dictionary can be built into a schema such that an export can be used directly as a data dictionary input to neurobagel. The current idea I have is to take the sddui schema (https://github.com/jsheunis/datalad-concepts/blob/annotations-etc/src/sddui/unreleased.yaml) as a baseline, which was created as the first prototype schema for use with shacl-vue, and then add:

  • a few basic classes/slots that cover the entities required by neurobagel's phenotypic attributes, see:
      • https://neurobagel.org/dictionaries/
      • https://neurobagel.org/data_prep/
  • annotations or slots or whatever is useful for embedding the data dictionary information