shacl-vue form for creating a Neurobagel data dictionary #2

Open
opened 2024-09-27 08:56:36 +00:00 by jsheunis · 1 comment
jsheunis commented 2024-09-27 08:56:36 +00:00 (Migrated from hub.datalad.org)

As discussed in [this issue](https://hub.datalad.org/datalink/tools/issues/1), the `shacl-vue` route could be a means to create a neurobagel data dictionary. For this we would first need a LinkML schema, which we already have at https://hub.datalad.org/datalink/tools/src/branch/main/schemas/nbdd/unreleased.yaml.

For context, see https://hub.datalad.org/datalink/org/issues/3#issuecomment-68:

> I started out by inspecting the neurobagel data dictionary jsonschema, then created a LinkML schema for it: https://github.com/jsheunis/datalad-concepts/blob/neurobagel/src/nbdd/unreleased.yaml
>
> Notes about this LinkML schema:
>
> - The [source of the DataDictionary schema](https://github.com/neurobagel/annotation_tool/blob/main/assets/neurobagel_data_dictionary.schema.json#L261) does not specify a dedicated property under which a ContinuousColumn or CategoricalColumn can be added; instead, such columns fall under `additionalProperties`. I could not find a straightforward way to translate this into the LinkML world, so I created a new slot for it, `has_columns`.
>
> The goal of doing the above was not directly to solve this issue, just to explore. It surfaced some problems that were useful for UI testing of shacl-vue, specifically around handling SHACL logical constraints such as `sh:or`, which is derived from LinkML `any_of` via jsonschema `anyOf`. The resulting form generated by shacl-vue can be used as an option to create data dictionaries.
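The `has_columns` workaround and the `any_of` construct mentioned above could look roughly like the following. This is an illustrative LinkML fragment only, not the actual contents of `unreleased.yaml`:

```yaml
# Illustrative sketch -- not the actual schema file.
classes:
  DataDictionary:
    description: A neurobagel data dictionary
    slots:
      - has_columns

slots:
  has_columns:
    description: >-
      Stand-in for the jsonschema additionalProperties, which has no direct
      LinkML equivalent; each value is one annotated TSV column.
    multivalued: true
    # LinkML any_of becomes jsonschema anyOf, which in turn surfaces as
    # SHACL sh:or in the shapes that drive the shacl-vue form.
    any_of:
      - range: ContinuousColumn
      - range: CategoricalColumn
```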

I first want to ensure that the source of the data dictionary schema is actually the correct and up-to-date one used in the context of neurobagel (see https://github.com/neurobagel/annotation_tool/issues/796). Once that is confirmed, our existing LinkML version of it can be updated, and then improved to produce a more intuitive form.

A remaining question is whether, and how, the terms collected in such a form would need to be connected to the schema that describes the actual data ending up in a neurobagel TSV file. The exact terms described in the data dictionary, i.e. the column headers of the TSV file, are also slots that would need to exist in the schema describing the data. Having to repeat them on both ends, entered into the data dictionary form and included in the data schema, would be a fragile kind of duplication. This might be an argument against the route suggested by this issue...
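One way to avoid that duplication would be to treat the data dictionary as the single source of truth and derive the data-schema slots from it. A minimal sketch, assuming BIDS-style `Description`/`Levels` keys in the dictionary entries; the helper name and the slot layout are hypothetical, not an existing neurobagel or LinkML API:

```python
# Hypothetical sketch: derive LinkML-style slot definitions from a
# data dictionary, so that column names are entered only once.
import json


def slots_from_data_dictionary(dd: dict) -> dict:
    """Turn each data dictionary column entry into a LinkML-style slot dict."""
    slots = {}
    for column, annotation in dd.items():
        slots[column] = {
            "description": annotation.get("Description", ""),
            # Categorical columns enumerate their levels; continuous ones
            # are treated as free-form numeric values here.
            "range": "string" if "Levels" in annotation else "float",
        }
    return slots


dd = json.loads("""{
  "sex": {"Description": "Sex of participant", "Levels": {"M": "male", "F": "female"}},
  "age": {"Description": "Age in years"}
}""")
print(slots_from_data_dictionary(dd))
```

The point of the sketch is the direction of flow: the form populates the dictionary, and the data schema is generated from it, rather than maintaining both by hand.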

jsheunis commented 2024-10-09 09:21:33 +00:00 (Migrated from hub.datalad.org)

In https://github.com/neurobagel/annotation_tool/issues/796#issuecomment-2400117252 it was confirmed that this is the correct data dictionary schema, with the pydantic classes being the source.
