Can the Neurobagel data structure and query interface be customized (or how complicated would it be to do so?) #5
orinoco/tools#5
A question that has come up in discussion is:
The data dictionary defines the semantic annotations of the columns in the Neurobagel TSV file, so my understanding is that we could technically include arbitrary columns and annotations as long as we stick to the data dictionary specification (i.e. only categorical columns, continuous columns, or identifier columns). What I am not sure about is the nature of the identifier columns. From my understanding of the docs about the Neurobagel TSV file, rows are equivalent to "participant-sessions", i.e. there are only two identifier columns (`Identifies: participant` and `Identifies: session`). Is this a hard requirement for `bagel-cli` and the query tool? Or can we include an arbitrary number of identifier columns (a single one, or many)? If that is possible, how will the query interface deal with it: automatically, or will it need development work to handle the changes? I assume that e.g. `Identifies: participant` has some internal mapping used in the process of generating graph-ready data, so if we e.g. say `Identifies: sample` or `Identifies: cuteLittlePuppy`, the process will fail?

As noted at the end of this comment https://hub.datalad.org/datalink/org/issues/2#issue-21, my understanding is that Neurobagel has its own internal schema for subjects, sessions, images, etc., which I assume follows BIDS to a major extent. I understand that `bagel-cli` can be used to generate phenotypic-only graph-ready data, i.e. a BIDS dataset does not have to accompany the process. But what happens if we have an accompanying scientific dataset that does not conform to BIDS, and we still want to make some or all of its content findable in a Neurobagel node via the query interface? Examples would be DNA sequencing or flow cytometry data. Some aspects might map onto the "TSV-file/data-dictionary" paradigm as new columns, but others might not.

So in summary: will the Neurobagel components be able to deal with this? If not out of the box, how complicated would it be to customize them? Or would it not be customizable at all?
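To make the identifier question concrete, here is a minimal sketch of the situation I have in mind. The dictionary layout (an `Annotations` block with an `Identifies` key), the `sample_id` column, the `ASSUMED_SUPPORTED` set, and the helper `unsupported_identifiers` are all my own assumptions for illustration. They are not taken from the Neurobagel schema or from `bagel-cli`, and the check only mimics the kind of validation I suspect might happen internally.

```python
# Hypothetical illustration only: this layout is an assumption about what an
# annotated data dictionary might look like, not a verified Neurobagel schema.
ASSUMED_SUPPORTED = {"participant", "session"}

data_dictionary = {
    "participant_id": {"Annotations": {"Identifies": "participant"}},
    "session_id": {"Annotations": {"Identifies": "session"}},
    "sample_id": {"Annotations": {"Identifies": "sample"}},  # the custom case
    "age": {"Annotations": {}},  # not an identifier column
}


def unsupported_identifiers(dictionary, supported=ASSUMED_SUPPORTED):
    """Return {column: identifier} for identifier columns outside `supported`."""
    found = {}
    for column, entry in dictionary.items():
        identifies = entry.get("Annotations", {}).get("Identifies")
        if identifies is not None and identifies not in supported:
            found[column] = identifies
    return found


print(unsupported_identifiers(data_dictionary))
```

If the tools do something like this, the `sample_id` column would be the one that gets flagged, and the question is whether that set of supported identifiers is fixed or configurable.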
Note: issue repeated here: https://github.com/neurobagel/query-tool/issues/307