Copy from hub.datalad.org/datalink/org: Neurobagel integration notes #16
Source: https://hub.datalad.org/datalink/org/issues/2
Notes: updates
The main reason for deploying a (variant of a) neurobagel instance is data discoverability. This is applicable for many scientific data use cases, but specifically for INM-7 and TRR379.
Neurobagel is a graph querying tool for building cohorts from multiple human neuroimaging datasets. It queries a metadata graph to find subject-level data that have been annotated with terms from a pre-specified data dictionary. If we define our own data dictionary, with terms that we also derive from our own DataLad dataset model specification, then queries run on the graph could assemble a cohort for which a DataLad dataset is generated programmatically on demand, i.e. establishing a metadata-query-to-actionable-dataset pipeline. This means a secondary, but still huge IMHO, benefit would be dataset generation (in addition to discoverability).
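To make the query-to-dataset idea concrete, here is a minimal sketch of the "actionable" half of such a pipeline. Everything in it is an assumption for illustration: the shape of the query results, the field names, and the grouping step are hypothetical, not Neurobagel's or DataLad's actual API.

```python
# Hypothetical sketch: turn Neurobagel-style query results into a cohort
# specification from which a DataLad dataset could be generated.
# The result structure and key names are assumptions, not a real API.
from collections import defaultdict

def cohort_spec(query_results):
    """Group subject-level query hits by their source dataset.

    query_results: iterable of dicts with (hypothetical) keys
    'dataset_url' and 'subject_id'.
    """
    spec = defaultdict(list)
    for hit in query_results:
        spec[hit["dataset_url"]].append(hit["subject_id"])
    # A downstream step could clone each dataset URL with DataLad and
    # subset it to the listed subjects to materialize the cohort.
    return dict(spec)

hits = [
    {"dataset_url": "https://example.org/ds1", "subject_id": "sub-01"},
    {"dataset_url": "https://example.org/ds1", "subject_id": "sub-02"},
    {"dataset_url": "https://example.org/ds2", "subject_id": "sub-01"},
]
print(cohort_spec(hits))
```

The point of the sketch is only that a flat list of graph hits is enough to drive on-demand dataset generation, because each hit already identifies a source dataset and a subject within it.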
Useful links:
Integration with the DataLad world
We have ongoing efforts related to metadata:
If we can:
we would have created a very capable and versatile framework.
Possible next steps
More notes:
Update 2024-09-11
In the end, our pipeline has to produce data that would be valid for a Neurobagel graph. See also: Neurobagel graph data files. Looking at the docs and the Neurobagel examples, it looks like they generate this as JSON-LD (e.g. example_synthetic.jsonld and example_synthetic_pheno-bids.jsonld). The Python-based bagel-cli can be used to generate graph-ready data (see https://github.com/neurobagel/bagel-cli/blob/main/bagel/cli.py#L79-L85).
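For orientation, a graph-ready file of this kind might look roughly like the following. This is an illustrative sketch, not copied from Neurobagel's examples: term names such as hasSession and hasAcquisition do appear in bagel-cli output, but the "nb:" namespace prefix and the exact nesting here are assumptions.

```python
# Illustrative sketch of a Neurobagel-style JSON-LD structure: a dataset
# containing subjects, which in turn have sessions with acquisitions.
# The "nb:" namespace and the exact nesting are assumptions.
import json

graph_ready = {
    "@context": {"nb": "http://neurobagel.org/vocab/"},  # assumed namespace
    "@type": "nb:Dataset",
    "nb:hasLabel": "example-dataset",
    "nb:hasSamples": [
        {
            "@type": "nb:Subject",
            "nb:hasLabel": "sub-01",
            "nb:hasSession": [
                {
                    "@type": "nb:Session",
                    "nb:hasLabel": "ses-01",
                    "nb:hasAcquisition": [{"@type": "nb:Acquisition"}],
                }
            ],
        }
    ],
}

print(json.dumps(graph_ready, indent=2))
```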
bagel-cli has a Pydantic implementation of the data dictionary schema, as well as a somewhat implicit schema for datasets/subjects/samples/sessions/etc. (I say implicit because I couldn't find it covered explicitly in the docs, but I could be wrong). It uses both of these models/schemas to transform the provided participants TSV file and data dictionary into graph-ready data. The second schema is where the JSON-LD terms like hasSession, hasAcquisition, etc. originate. So if we want our pipeline to end up with Neurobagel-graph-ready data, our schema also needs to model these classes/terms and their relationships to other classes in our schema. If we don't do this, we would only need to model the data dictionary terms and the basic columns of Neurobagel's TSV files (participant_id, session_id, etc.) in order to generate the equivalent of Neurobagel TSV files, and then we would need to depend on bagel-cli to convert these into graph-ready files.
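The two options above can be sketched together. Below, stdlib dataclasses stand in for bagel-cli's actual Pydantic models (which are not reproduced here): the class hierarchy mirrors the implicit dataset/subject/session schema, and the helper shows that the fallback option only requires emitting the basic TSV columns that bagel-cli consumes. The class and field names are assumptions for illustration.

```python
# Hedged sketch: stdlib dataclasses standing in for bagel-cli's Pydantic
# models of the dataset/subject/session hierarchy, plus a helper that
# emits the minimal participants TSV (participant_id, session_id) the
# fallback option would hand to bagel-cli. Names are illustrative.
import csv
import io
from dataclasses import dataclass, field

@dataclass
class Session:
    session_id: str

@dataclass
class Subject:
    participant_id: str
    sessions: list = field(default_factory=list)

@dataclass
class Dataset:
    name: str
    subjects: list = field(default_factory=list)

def to_pheno_tsv(dataset):
    """Write one TSV row per (subject, session) pair."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["participant_id", "session_id"])
    for sub in dataset.subjects:
        for ses in sub.sessions:
            writer.writerow([sub.participant_id, ses.session_id])
    return buf.getvalue()

ds = Dataset(
    name="example",
    subjects=[Subject("sub-01", [Session("ses-01"), Session("ses-02")])],
)
print(to_pheno_tsv(ds))
```

Modeling the hierarchy ourselves (first option) keeps the whole pipeline in our hands; emitting only the TSV (second option) is less work but makes bagel-cli a hard dependency of the pipeline.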