annotate.inm7.de-data

Stephan Heunis 043ac08931 v1 of the script to convert participants.tsv tables into flat-data		2025-07-06 21:34:29 +02:00
convert_bids.py
meta.json
README.md
requirements.txt

README.md

What's this?

A script, convert_bids.py, that can be used to convert participants.tsv files to the flat-data schema.

Prerequisites

Clone the repo:

git clone https://hub.psychoinformatics.de/inm7/annotate.inm7.de-data.git
cd annotate.inm7.de-data/tools

Create a virtual environment and install requirements:

python -m venv ~/my_env
source ~/my_env/bin/activate
pip install -r requirements.txt

Inputs

The script needs to be pointed to the INM-7 superdataset.

You have to:

clone it
install all subdatasets: datalad get . -r -n
get all participants.tsv files for the datasets that you want to convert
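
Before running the conversion, it can help to check which participants.tsv files are actually present. The following is a minimal sketch, not part of this repository: the function name find_participants_files is hypothetical, and it assumes that annexed files whose content has not yet been retrieved appear as dangling symlinks (the usual git-annex behaviour outside adjusted branches).

```python
import os
from pathlib import Path


def find_participants_files(super_root):
    """Return (path, available) pairs for each participants.tsv under super_root.

    Hypothetical helper: 'available' is False when the file is a dangling
    symlink, which is assumed to mean the annexed content was not retrieved.
    """
    results = []
    for dirpath, dirnames, filenames in os.walk(super_root):
        # Prune git internals in place so os.walk does not descend into them;
        # sorting keeps the traversal order deterministic.
        dirnames[:] = sorted(d for d in dirnames if d != ".git")
        if "participants.tsv" in filenames:
            path = Path(dirpath) / "participants.tsv"
            # Path.exists() follows symlinks, so a dangling link reports False.
            results.append((path, path.exists()))
    return results
```

Calling find_participants_files("<path-to-superdataset>") and looking for entries marked False shows which files still need a datalad get.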

The script also needs a meta.json input file, which supplies the data required for conversion to the flat-data schema. The top-level object in meta.json has a datasets key whose value is another object; populate it with one entry per dataset to be converted. Example structure:

{
    "datasets": {
        "<dataset-shortname>": {
            "path": "<path-to-bids-dataset-relative-to-super-root>",
            "name": "<human-readable-name-of-dataset>",
            "description": "<human-readable-description-of-dataset>",
            "dimensions": {
                "Age": {
                    "column": "<name-of-age-column-in-participants-table>",
                    "unit": "<year|month|day>"
                },
                "Sex": {
                    "column": "<name-of-sex-column-in-participants-table>",
                    "map": {
                        "F": "female",
                        "M": "male",
                        "<optional-different-F-level>": "female",
                        "<optional-different-M-level>": "male"
                    }
                }
            }
        },
        ...   
    },
    ...
}
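
To illustrate what the dimensions entries express, here is a rough sketch of how one row of a participants table could be translated with them. It is not the logic of convert_bids.py: the function apply_dimensions and the shape of the returned dictionary are assumptions made only for illustration, as is treating "n/a" (the BIDS convention for missing values) as absent.

```python
def apply_dimensions(row, dimensions):
    """Translate one participants.tsv row using a 'dimensions' object.

    Hypothetical helper; the output layout is illustrative only and is not
    the flat-data schema.
    """
    result = {}

    age = dimensions.get("Age")
    if age is not None:
        raw_age = row.get(age["column"])
        # Skip missing values instead of failing on float("n/a").
        if raw_age not in (None, "", "n/a"):
            result["Age"] = {"value": float(raw_age), "unit": age["unit"]}

    sex = dimensions.get("Sex")
    if sex is not None:
        raw_sex = row.get(sex["column"])
        # Levels that are absent from the map are left out rather than guessed.
        if raw_sex in sex["map"]:
            result["Sex"] = sex["map"][raw_sex]

    return result
```

The optional extra levels in the map exist for tables that encode sex differently, for example lowercase "f" and "m"; each such level is simply another key pointing at "female" or "male".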

The meta.json file included in this repository has only an empty object as the value of the datasets key, so it must be populated before the script can convert anything.
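
Entries can be added by hand, or with a few lines of Python. The sketch below is one possible way to do it and is not shipped with this repository; the helper name add_dataset is hypothetical. It keeps any other top-level keys already present in the file.

```python
import json
from pathlib import Path


def add_dataset(meta_path, shortname, path, name, description, dimensions):
    """Add or replace one entry under the 'datasets' key of a meta.json file.

    Hypothetical helper; the field names follow the example structure above.
    """
    meta_file = Path(meta_path)
    meta = json.loads(meta_file.read_text(encoding="utf-8"))
    # setdefault leaves existing dataset entries and other top-level keys alone.
    meta.setdefault("datasets", {})[shortname] = {
        "path": path,
        "name": name,
        "description": description,
        "dimensions": dimensions,
    }
    meta_file.write_text(json.dumps(meta, indent=4) + "\n", encoding="utf-8")
    return meta
```

The dimensions argument is a plain dictionary with the Age and Sex objects described in the example structure.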

Usage

This is the script help:

>> python convert_bids.py -h

usage: convert_bids.py [-h] [--namespace NAMESPACE] [--output OUTPUT] [--post] [--url URL] [--summary] dataset_path metadata_path

positional arguments:
  dataset_path          Path to INM7 superdataset
  metadata_path         Path to meta.json helper file

options:
  -h, --help            show this help message and exit
  --namespace NAMESPACE
                        Main namespace URL to be used for PIDs; defaults to 'https://inm7.de/ns/datamgt/'
  --output OUTPUT       Output file name, e.g. 'output.json'; file will be created in the `tools` directory next to this script; prints output to 'stdout'
                        by default
  --post                In addition to data conversion, also POST data to the backend; a base URL should be provided; the X_DUMPTHINGS_TOKEN token, if
                        required, should be saved to a '.env' file before running the script
  --url URL             Base URL for the backend
  --summary             Print a summary of the transformed metadata

Example:

python convert_bids.py --output flattened_bids.json --summary <path-to-superdataset> <path-to-meta-file>

To convert the data and also POST the results to a dumpthings backend, pass --post together with --url <base-url>. If the backend requires a token, first save X_DUMPTHINGS_TOKEN to a .env file in the same directory as the script.
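
A quick pre-flight check can confirm that the token is in place before posting. The sketch below assumes the .env file uses plain KEY=VALUE lines, for example X_DUMPTHINGS_TOKEN=<token>; this README does not specify how the script itself reads the file, and the helper names read_env_file and has_token are hypothetical.

```python
from pathlib import Path


def read_env_file(env_path):
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are ignored.

    Assumption: the .env file follows the common KEY=VALUE convention.
    """
    values = {}
    for line in Path(env_path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def has_token(env_path, key="X_DUMPTHINGS_TOKEN"):
    """Return True if env_path exists and defines a non-empty value for key."""
    path = Path(env_path)
    return path.is_file() and bool(read_env_file(path).get(key))
```

Running has_token(".env") from the tools directory returns False when the file is missing or the token value is empty.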