adding author and contact to flat-data #87

Closed
adina wants to merge 27 commits from author into main
Owner

This PR is an effort towards inm7/annotate.inm7.de-data#24.
It adds a multi-valued slot "authors", constrained to be Persons or Organizations, as well as a "contact" slot to the Dataset class in flat-data. Contact corresponds to "access_request_contact".

A question for @jsheunis: Given that "contact" needs to be a "Person", and a "Person" (from flat-base) has an email, is the "access_request_url" taken care of with this? Or is this connection (to their email) somehow lost in the flat schemas?

add authors to slots in Dataset
All checks were successful
Codespell / Check for spelling errors (pull_request) Successful in 25s
Model checks / lint (pull_request) Successful in 1m53s
Validate examples and verify unmodified conversion / lint (pull_request) Successful in 3m18s
6e365c2f1b
make range of author Person
All checks were successful
Codespell / Check for spelling errors (pull_request) Successful in 20s
Model checks / lint (pull_request) Successful in 1m43s
Validate examples and verify unmodified conversion / lint (pull_request) Successful in 2m50s
7c74c9ad4e
but it actually should be something like

    any_of:
      - range: Person
      - range: Organization
TMP: I don't think I need dlsocial
All checks were successful
Codespell / Check for spelling errors (pull_request) Successful in 20s
Model checks / lint (pull_request) Successful in 1m49s
Validate examples and verify unmodified conversion / lint (pull_request) Successful in 2m53s
369d60ee30
authors can be Persons or Orgs
All checks were successful
Codespell / Check for spelling errors (pull_request) Successful in 20s
Model checks / lint (pull_request) Successful in 1m45s
Validate examples and verify unmodified conversion / lint (pull_request) Successful in 2m54s
ee6202ff13
add contact to Dataset slots
All checks were successful
Codespell / Check for spelling errors (pull_request) Successful in 20s
Model checks / lint (pull_request) Successful in 1m49s
Validate examples and verify unmodified conversion / lint (pull_request) Successful in 3m13s
a3aff4d5a3
adina changed title from WIP: adding author to flat-data to adding author and contact to flat-data 2025-07-24 18:01:20 +00:00
extend example
All checks were successful
Codespell / Check for spelling errors (pull_request) Successful in 22s
Model checks / lint (pull_request) Successful in 1m39s
Validate examples and verify unmodified conversion / lint (pull_request) Successful in 2m52s
04575cf5f2
msz left a comment
Owner

I'm trying to analyze what was needed to add the author and all changes to unreleased.yaml seem logical, but I am not sure about the example (shouldn't the author/Person have properties, instead of just a string value?), see inline comment.

Validating the example on its own (hatch run check:linkml validate --config src/flat-data/unreleased/validation/Dataset.valid.cfg.yaml) yields "No issues found".

Running hatch run check:examples also passes validation, but the conversion step that runs just before it fails:

Converting src/flat-data/unreleased/examples/Dataset-1.yaml
(...)
ValueError: No such class: "None"
(...)
Validate src/flat-data/unreleased/validation/Dataset.valid.cfg.yaml
No issues found

I am trying to change the example and validate it with:

hatch run check:linkml convert -o ds.ttl -s src/flat-data/unreleased.yaml --target-class-from-path --infer src/flat-data/unreleased/examples/Dataset-1.yaml

However, I tried both this:

authors:
  - pid: inm7:persons/jdoe
    first_name: Jane
    last_name: Doe

and this:

authors:
  - inm7:persons/jdoe:
      first_name: Jane
      last_name: Doe

with and without schema_type: inm7fd:Person (got that idea from Study-1.yaml), and also without the pid, and I keep getting TypeError: vars() argument must have __dict__ attribute.
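
For readers unfamiliar with the message: Python raises this TypeError whenever `vars()` is called on an object that has no `__dict__`, such as a plain dict or a string. The sketch below only reproduces the message in isolation. Whether this is what happens inside `linkml convert` when it meets an inlined mapping is an assumption, and `Person` and `describe` are hypothetical stand-ins, not linkml or inm7 code.

```python
# Illustration only: reproduces the error message in isolation.
# Nothing here is taken from linkml; Person and describe() are hypothetical.
from dataclasses import dataclass


@dataclass
class Person:
    """Stand-in for a generated class; not the real flat-base Person."""
    pid: str
    first_name: str = ""
    last_name: str = ""


def describe(obj) -> dict:
    """Return a copy of the object's attribute dict, as vars() exposes it."""
    return dict(vars(obj))


# A class instance has a __dict__, so vars() works.
print(describe(Person(pid="inm7:persons/jdoe", first_name="Jane", last_name="Doe")))

# A plain dict (what a YAML loader yields for a mapping) or a string has none.
for raw in ({"pid": "inm7:persons/jdoe"}, "Jane Doe"):
    try:
        describe(raw)
    except TypeError as err:
        print(type(raw).__name__, "->", err)
```

If that assumption holds, the error would point at data arriving as raw YAML structures where a class instance was expected, which fits the later conclusion to reference people by CURIE instead of inlining them.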

@ -0,0 +3,4 @@
display_label: demo
short_name: demo
description: A very cool dataset
authors: [Jane Doe]
Owner

Can it work like this? The author should be a Person, I believe inm7fb:Person specifically. As such, they would need pid (mandatory?), and optional family_name, given_name, etc. properties. There is no full name property, and I don't think linkml would be clever enough to cast the string value, Jane Doe into one of the properties?

Author
Owner

Honestly, I don't know. My brain is in knots trying to figure out how this works, and I'm committing and pushing in hopes that someone sees stupid mistakes and I learn how it's actually done.
Specifically with the examples I'm unsure how it works. I haven't figured out how to validate them with the Makefile, so I turned to linkml directly (thanks for the tip, @jsheunis), using

linkml validate -s src/flat-data/unreleased.yaml --config src/flat-data/unreleased/validation/Dataset.valid.cfg.yaml src/flat-data/unreleased/examples/Dataset-1.yaml

How would you do it?

Owner

As I wrote, linkml validate yields "No issues found". However, in my poking, I learned to use linkml convert as another (stricter?) test, to see what comes out as ttl (I find it informative). And while linkml convert works for the Study-1.yaml example, it does not for Dataset-1.yaml.

Regardless of validation method - I would expect the author to be a Person object in the end. But can we define the Person in the same yaml? Should we define the person elsewhere and only include the PID? Are these things allowed in flat-schemas? What would shacl-vue do? TBH, I don't know either.

Owner

We have since reached a conclusion: the person should be a class with slots, but in the example the person should be declared as CURIE (pid) to avoid inlining.

As far as I understand, we do not intend to perform inlining with flat models because shacl-vue also works with CURIEs, and because inlining can open a world of pain in linkml.

The person can still be "defined" in the same file via the relations slot of the top-level class (in this case dataset), where we can declare its schema_type and relevant slots. This is the practice elsewhere in this project. However, even then, linkml would not check whether the CURIE identifies an object that matches the range (although we would expect it to do so). We decided not to worry about it too much.
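
The check that linkml does not perform could be approximated outside linkml. The sketch below is one way to do it, under stated assumptions: the record is already loaded into a dict, `relations` is keyed by pid (CURIE), and `inm7fd:Organization` exists alongside `inm7fd:Person`. The function name and the record layout are hypothetical and not part of this project.

```python
# Sketch of a post-hoc reference check; not part of linkml or this repository.
# Assumed layout: `relations` maps a pid (CURIE) to an object with `schema_type`.
ALLOWED_AUTHOR_TYPES = {"inm7fd:Person", "inm7fd:Organization"}  # assumed names


def unresolved_authors(dataset: dict) -> list[str]:
    """List author CURIEs that are undeclared or declared with a wrong type."""
    relations = dataset.get("relations", {})
    problems = []
    for curie in dataset.get("authors", []):
        target = relations.get(curie)
        if target is None:
            problems.append(f"{curie}: not declared in relations")
        elif target.get("schema_type") not in ALLOWED_AUTHOR_TYPES:
            problems.append(
                f"{curie}: schema_type {target.get('schema_type')!r} not allowed"
            )
    return problems


dataset = {
    "pid": "inm7:dataset/demo",
    "authors": ["inm7:users/jane-doe", "inm7:users/unknown"],
    "relations": {
        "inm7:users/jane-doe": {"schema_type": "inm7fd:Person", "given_name": "Jane"},
    },
}
print(unresolved_authors(dataset))
```

A CURIE that is defined elsewhere (not in the same file) would be reported as undeclared here, so such a check would only make sense for self-contained submissions.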

msz marked this conversation as resolved
Owner

I made some changes locally. I changed the flat-data/unreleased schema and also the Dataset-1.yaml example file. Here's the diff:

diff --git a/src/flat-data/unreleased.yaml b/src/flat-data/unreleased.yaml
index 9e12837..f74acf6 100644
--- a/src/flat-data/unreleased.yaml
+++ b/src/flat-data/unreleased.yaml
@@ -77,9 +77,7 @@ slots:
     description: >-
       An entity responsible for making the resource.
     multivalued: true
-    any_of:
-      - range: Person
-      - range: Organization
+    range: Person
     exact_mappings:
       - dcterms:creator
 
@@ -282,6 +280,10 @@ classes:
         annotations:
           sh:order: 7
       authors:
+        recommended: true
+        any_of:
+          - range: Person
+          - range: Organization
         annotations:
           sh:order: 8
       contact:
diff --git a/src/flat-data/unreleased/examples/Dataset-1.json b/src/flat-data/unreleased/examples/Dataset-1.json
index 046014e..fd00a01 100644
--- a/src/flat-data/unreleased/examples/Dataset-1.json
+++ b/src/flat-data/unreleased/examples/Dataset-1.json
@@ -2,13 +2,13 @@
   "pid": "inm7:dataset/demo",
   "description": "A very cool dataset",
   "schema_type": "inm7fd:Dataset",
+  "about": "https://my-awesome-project-homepage.com",
   "authors": [
-    "Jane Doe"
+    "inm7:users/jane-doe"
   ],
-  "contact": "John Doe",
+  "contact": "inm7:users/john-doe",
   "name": "Demo Dataset",
   "short_name": "demo",
   "display_label": "demo",
-  "@type": "Dataset",
-  "about": "https://my-awesome-project-website.com"
+  "@type": "Dataset"
 }
diff --git a/src/flat-data/unreleased/examples/Dataset-1.yaml b/src/flat-data/unreleased/examples/Dataset-1.yaml
index d82dc76..b101fd6 100644
--- a/src/flat-data/unreleased/examples/Dataset-1.yaml
+++ b/src/flat-data/unreleased/examples/Dataset-1.yaml
@@ -3,6 +3,7 @@ name: Demo Dataset
 display_label: demo
 short_name: demo
 description: A very cool dataset
-authors: [Jane Doe]
-contact: John Doe
+authors:
+  - inm7:users/jane-doe
+contact: inm7:users/john-doe
 about: https://my-awesome-project-homepage.com

Then I ran three make commands:

>> make checkmodel/flat-data/unreleased

[Check src/flat-data/unreleased.yaml]
Run linter
✓ No problems found
Generate a JSON-LD context
Generate JSON schema
Generate OWL
Generate Python classes

all good. and then:

>> make convertexamples/flat-data/unreleased

# loop over all examples, skip the schema file itself
for ex in src/flat-data/unreleased.yaml src/flat-data/unreleased/examples/*.yaml; do \
		[ "$ex" = "src/flat-data/unreleased.yaml" ] && continue; \
		echo "Converting $ex" ; \
		for outf in json rdf; do \
			linkml-convert \
				-s "src/flat-data/unreleased.yaml" \
				--target-class-from-path \
				--infer \
				-t "$outf" \
				"$ex" \
				> ${ex%.yaml}.${outf}.tmp && \
			mv ${ex%.yaml}.${outf}.tmp ${ex%.yaml}.${outf} ; \
		done \
	done
Converting src/flat-data/unreleased/examples/Dataset-1.yaml
Converting src/flat-data/unreleased/examples/Study-1.yaml
diff --git a/src/flat-data/unreleased/examples/Dataset-1.json b/src/flat-data/unreleased/examples/Dataset-1.json
index 046014e..fd00a01 100644
--- a/src/flat-data/unreleased/examples/Dataset-1.json
+++ b/src/flat-data/unreleased/examples/Dataset-1.json
@@ -2,13 +2,13 @@
   "pid": "inm7:dataset/demo",
   "description": "A very cool dataset",
   "schema_type": "inm7fd:Dataset",
+  "about": "https://my-awesome-project-homepage.com",
   "authors": [
-    "Jane Doe"
+    "inm7:users/jane-doe"
   ],
-  "contact": "John Doe",
+  "contact": "inm7:users/john-doe",
   "name": "Demo Dataset",
   "short_name": "demo",
   "display_label": "demo",
-  "@type": "Dataset",
-  "about": "https://my-awesome-project-website.com"
+  "@type": "Dataset"
 }
diff --git a/src/flat-data/unreleased/examples/Dataset-1.yaml b/src/flat-data/unreleased/examples/Dataset-1.yaml
index d82dc76..b101fd6 100644
--- a/src/flat-data/unreleased/examples/Dataset-1.yaml
+++ b/src/flat-data/unreleased/examples/Dataset-1.yaml
@@ -3,6 +3,7 @@ name: Demo Dataset
 display_label: demo
 short_name: demo
 description: A very cool dataset
-authors: [Jane Doe]
-contact: John Doe
+authors:
+  - inm7:users/jane-doe
+contact: inm7:users/john-doe
 about: https://my-awesome-project-homepage.com
-n ERROR: Unexpected modification of example output.
Inspect and commit changes shown above!
make: *** [convertexamples/flat-data/unreleased] Error 22

This command also produced the changes to the Dataset-1.json file that you see in the diff above. The "ERROR" only means that the JSON file differed before and after running the command. That is expected: the previously committed file was outdated because it was still based on the older schema/data.
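
The idea behind this "unmodified conversion" check can be sketched as follows. How the Makefile actually detects the change is not shown in this thread (the output suggests a `git diff`), so the hashing approach, the function names, and the fake conversion step below are all assumptions made for illustration.

```python
# Sketch of a "regenerate and fail if outputs changed" check.
# The hashing approach and all names are assumptions, not the Makefile's logic.
import hashlib
import tempfile
from pathlib import Path


def digest(path: Path) -> str:
    """Content hash of a file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_outputs(outputs, regenerate) -> list:
    """Run regenerate() and return the outputs whose content changed."""
    before = {p: digest(p) for p in outputs}
    regenerate()
    return [p for p in outputs if digest(p) != before[p]]


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "Dataset-1.json"
    out.write_text('{"authors": ["Jane Doe"]}\n')  # stale committed output

    def fake_convert():
        # Stand-in for the real conversion of the updated example.
        out.write_text('{"authors": ["inm7:users/jane-doe"]}\n')

    changed = changed_outputs([out], fake_convert)
    print([p.name for p in changed])
```

Once the regenerated outputs are committed, a second run changes nothing and the check passes.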

Then:

>> make checkvalidation/flat-data/unreleased

/Library/Developer/CommandLineTools/usr/bin/make checkvalid/flat-data/unreleased checkinvalid/flat-data/unreleased
Validate src/flat-data/unreleased/validation/Dataset.valid.cfg.yaml
No issues found
Validate src/flat-data/unreleased/validation/Study.valid.cfg.yaml
[WARN] [src/flat-data/unreleased/examples/Study-1.yaml/0] Slot 'display_label' is recommended on class 'Study' in /
(In)validate src/flat-data/unreleased/validation/*.invalid.cfg.yaml
Usage: linkml-validate [OPTIONS] [DATA_SOURCES]...
Try 'linkml-validate --help' for help.

Error: Invalid value for '--config': File 'src/flat-data/unreleased/validation/*.invalid.cfg.yaml' does not exist.

The warning is expected; and the error is also expected, it has nothing to do with the model/data, just with the fact that there is no *.invalid.cfg.yaml to test.
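
The error arises because a POSIX shell leaves a glob pattern that matches nothing as a literal string, which is then handed to `linkml-validate` as if it were a file name. A wrapper that collects the config files first would simply skip the step. The sketch below assumes the Makefile relies on shell globbing; the helper name and the temporary directory are hypothetical.

```python
# Sketch: collect config files first, so an empty match means "nothing to do".
# config_files() is a hypothetical helper, not part of the project's Makefile.
import glob
import tempfile
from pathlib import Path


def config_files(validation_dir: Path, kind: str) -> list[str]:
    """Return sorted config files of one kind ('valid' or 'invalid')."""
    return sorted(glob.glob(str(validation_dir / f"*.{kind}.cfg.yaml")))


with tempfile.TemporaryDirectory() as tmp:
    vdir = Path(tmp)
    (vdir / "Dataset.valid.cfg.yaml").touch()
    (vdir / "Study.valid.cfg.yaml").touch()

    valid = config_files(vdir, "valid")
    invalid = config_files(vdir, "invalid")
    print(len(valid), "valid configs;", len(invalid), "invalid configs")

    # With no matches the list is empty, so the loop body never runs and no
    # literal "*.invalid.cfg.yaml" is ever passed to a validator.
    for cfg in invalid:
        print("would (in)validate", cfg)
```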

So all in all this could work. The only drawback is that we can't specifically include inlining / inlining as list for text-based data submission purposes, because that would break the current shacl-vue-based paradigm where we specifically want a URI as the value for an author. That is, IIUC...

Fix Dataset examples
All checks were successful
Codespell / Check for spelling errors (pull_request) Successful in 26s
Model checks / lint (pull_request) Successful in 1m37s
Validate examples and verify unmodified conversion / lint (pull_request) Successful in 2m57s
27804f3470
Author
Owner

Thanks much @msz and @jsheunis! I have added your patch (under your committer ID).

Author
Owner

@mih before this PR gets any longer, would you take a look at it, too, please?

Author
Owner

Notes from impromptu meeting in the office hour:

  • access_request_url is not an email, it's an alternative to access_request_contact (which is a person with an email).
  • We keep contact because it is generally useful.
  • We further implement something like https://concepts.datalad.org/s/resources/unreleased/AccessMethod/ but do not necessarily import it (because it is quite large). Maybe make it an enum with relevant access methods for flat-data use cases. Check out how the free-text form for "gender" in the DFG collection is done.
it was a left-over from copy-pasting
Add PersonalRequest access method
All checks were successful
Codespell / Check for spelling errors (pull_request) Successful in 20s
Model checks / lint (pull_request) Successful in 1m37s
Validate examples and verify unmodified conversion / lint (pull_request) Successful in 2m57s
74b9939a83
copy two more access methods from edistribution
Some checks failed
Codespell / Check for spelling errors (pull_request) Successful in 20s
Validate examples and verify unmodified conversion / lint (pull_request) Failing after 16m6s
Model checks / lint (pull_request) Failing after 16m8s
ab1329b566
Author
Owner

I added AccessMethod from datalad-concepts just now. I looked through the ABCD-J catalog, and the only means of "access" I was able to distinguish were "data is available right away" and "email this particular person". The latter I would translate to "PersonalRequest"; the former does not sound like it needs anything. I nevertheless added "direct download" and "access through landing page" to the flat data schema (because I thought they make sense for flat-data use cases other than the catalog).

Two questions to @jsheunis:

  • did I miss access methods the catalog distinguishes?
  • does the case of "data is freely available" need a dedicated access method?
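
To make the enum idea from the office-hour notes concrete, here is a minimal sketch. Only "PersonalRequest" is a name used in this thread; the other two member values are placeholders for "direct download" and "access through landing page", and the mapping function is a hypothetical illustration of how a catalog record with a contact could translate.

```python
# Sketch of an access-method enum; only "PersonalRequest" is named in the thread.
from enum import Enum


class AccessMethod(Enum):
    PERSONAL_REQUEST = "PersonalRequest"
    DIRECT_DOWNLOAD = "DirectDownload"  # placeholder name (assumption)
    LANDING_PAGE = "LandingPage"        # placeholder name (assumption)


def access_method_for(record: dict):
    """Map a catalog-style record to an access method, or None if unknown.

    Only the contact case is mapped, because the thread states that
    "email this particular person" translates to PersonalRequest.
    """
    if record.get("access_request_contact"):
        return AccessMethod.PERSONAL_REQUEST
    return None


print(access_method_for({"access_request_contact": "inm7:users/john-doe"}))
print(access_method_for({}))
```

Records without a contact deliberately map to None here, since the thread leaves open whether freely available data needs an access method of its own.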
add access method slot to Dataset class
Some checks failed
Codespell / Check for spelling errors (pull_request) Failing after 1m2s
Validate examples and verify unmodified conversion / lint (pull_request) Failing after 11m46s
Model checks / lint (pull_request) Failing after 11m48s
c6f90e869e
add landing page access method to dataset example
Some checks failed
Codespell / Check for spelling errors (pull_request) Successful in 20s
Model checks / lint (pull_request) Successful in 1m43s
Validate examples and verify unmodified conversion / lint (pull_request) Failing after 2m26s
0e0974f97e
Owner

did I miss access methods the catalog distinguishes?

The catalog also has access_request_url which basically translates to "follow this link and read what the site says in order to determine next steps for requesting/gaining access". I am not sure how this would translate to an AccessMethod though, because ideally the entered access method would be more specific. Perhaps the access_request_url just becomes redundant, or it could be used for cases where a particular AccessMethod actually has a property that is similar to access_request_url, e.g. something like an ElectronicForm access method, i.e. "complete this online form to request access".

does the case of "data is freely available" need a dedicated access method?

I think those are related but different concepts: whether data is freely available (just a general statement about the data) and how data is freely available (i.e. AccessMethod). If it is freely available via direct http-based download, then the access method would be exactly that.

Author
Owner

Some thoughts on "License": datalad concepts' Distribution has slots and classes for licenses. But I believe they are very "big". In datalad concepts, the license slot has the range "LicenseDocument", which is an entity. I believe we wouldn't want to copy this in flat data because it is too heavy. At the moment, I have it minimally constrained to "string"...

add a license slot to Dataset
Some checks failed
Codespell / Check for spelling errors (pull_request) Successful in 19s
Model checks / lint (pull_request) Successful in 1m39s
Validate examples and verify unmodified conversion / lint (pull_request) Failing after 2m20s
4fd115c277
commit example conversion
Some checks failed
Codespell / Check for spelling errors (pull_request) Successful in 21s
Model checks / lint (pull_request) Successful in 1m38s
Validate examples and verify unmodified conversion / lint (pull_request) Failing after 17m2s
a890e4caee
Owner

@adina wrote in #87 (comment) (https://hub.psychoinformatics.de/inm7/inm7-concepts/pulls/87#issuecomment-4465):

Some thoughts on "License": datalad concepts' Distribution has slots and classes for licenses. But I believe they are very "big". In datalad concepts, the license slot has the range "LicenseDocument", which is an entity. I believe we wouldn't want to copy this in flat data because it is too heavy. At the moment, I have it minimally constrained to "string"...

In Catalog though, the license has name and url (schema definition: https://github.com/datalad/datalad-catalog/blob/c911350089617e47fd480b443b9869a830e5a1f2/datalad_catalog/catalog/schema/jsonschema_dataset.json#L73-L86) - if both are available, the catalog shows the license name as a link to the url. I wonder if that warrants having a dedicated class for a license with matching slots.

Maybe name (name for catalog, name for concepts Entity/Thing), url (url for catalog, identifier for concepts Entity), license-text (the only specific slot LicenseDocument has, and the most optional in this context)?
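
A dedicated license class along these lines could look like the sketch below. The three attributes follow the proposal above (`license_text` stands in for `license-text`, which is not a valid Python identifier). The `display` method mirrors the described catalog behaviour when both name and url are present; the fallback when only one is present is an assumption.

```python
# Sketch of a lightweight license class; not the catalog's or the schema's code.
from dataclasses import dataclass
from html import escape
from typing import Optional


@dataclass
class License:
    name: Optional[str] = None
    url: Optional[str] = None
    license_text: Optional[str] = None  # most optional of the three

    def display(self) -> str:
        """Render the name as a link if both name and url are available."""
        if self.name and self.url:
            return f'<a href="{escape(self.url, quote=True)}">{escape(self.name)}</a>'
        # Assumed fallback: show whichever single value is present.
        return escape(self.name or self.url or "")


cc_by = License(name="CC BY 4.0", url="https://creativecommons.org/licenses/by/4.0/")
print(cc_by.display())
print(License(name="Custom terms").display())
```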
Author
Owner

this PR has gotten stale - my apologies I didn't follow through with it.
It reminds me that there is still an ongoing todo to represent datalad-catalog metadata in our schema to be able to "translate" datasets from the catalog (e.g., from ABCD-J) to a knowledge pool. This PR clearly didn't accomplish that - I'll make sure to cross-link the original issue in this repo again.

adina closed this pull request 2026-04-24 11:59:30 +00:00
