adding author and contact to flat-data #87

Closed
adina wants to merge 27 commits from author into main
4 changed files with 162 additions and 0 deletions

@@ -25,6 +25,7 @@ prefixes:
  dash: http://datashapes.org/dash#
  dcterms: http://purl.org/dc/terms/
  dlidentifiers: https://concepts.datalad.org/s/identifiers/unreleased/
  dlprops: https://concepts.datalad.org/s/properties/unreleased/
  dlschemas: https://concepts.datalad.org/s/
  dlthings: https://concepts.datalad.org/s/things/v1/
  dltypes: https://concepts.datalad.org/s/types/unreleased/
@@ -65,6 +66,26 @@ imports:
  - inm7schemas:flat-base/unreleased
slots:
  about:
    description: >-
      A relation of an information artifact to the subject, such as a URL
      identifying a home page.
    range: string
  access_methods:
    description: >-
      (Alternative) means to gain access to the subject.
    multivalued: true
    range: AccessMethod
  authors:
    description: >-
      An entity responsible for making the resource.
    multivalued: true
    range: Person
    exact_mappings:
      - dcterms:creator
  byte_size:
    description: >-
      The size of the subject in bytes.
@@ -83,6 +104,13 @@ slots:
    exact_mappings:
      - spdx:checksum
  contact:
    description: >-
      Whom to address for access requests.
    range: Person
    broad_mappings:
      - dcterms:mediator
  derived_from:
    title: Derived from
    description: >-
@@ -102,6 +130,17 @@ slots:
    broad_mappings:
      - sio:SIO_000426
  download_urls:
    description: >-
      URL that gives direct access to the subject in the form of a downloadable file in a given format.
    range: uri
    multivalued: true
    exact_mappings:
      - dcat:downloadURL
    related_mappings:
      - dcat:accessURL
      - dcat:landingPage
  factors:
    title: Influencing factors
    description: >-
@@ -151,6 +190,13 @@ slots:
    range: Instrument
    multivalued: true
  license:
    description: A legal document under which the resource is made available.
    range: string
    exact_mappings:
      - dcterms:license
      - dcat:license
  media_type:
    description: >-
      The media type of a distribution as defined by IANA
@@ -210,6 +256,16 @@ slots:
classes:
  AccessMethod:
    class_uri: inm7fd:AccessMethod
    description: >-
      An approach or procedure to gain access to the subject.
    comments:
      - This is merely a base class for range declaration. It does not define any slots other than a type designator, because little or no commonalities of properties across access methods are to be expected.
    slots:
      - schema_type
  Dataset:
    class_uri: inm7fd:Dataset
    is_a: Thing
@@ -221,7 +277,12 @@ classes:
      one representation, with differing schematic layouts, formats, and
      serializations.
    slots:
      - about
      - access_methods
      - authors
      - contact
      - dimensions
      - license
      - name
      - part_of
      - primary_source
@@ -253,6 +314,23 @@ classes:
        recommended: true
        annotations:
          sh:order: 7
      authors:
        recommended: true
        any_of:
          - range: Person
          - range: Organization
        annotations:
          sh:order: 8
      contact:
        annotations:
          sh:order: 9
      access_methods:
        annotations:
          sh:order: 10
      license:
        recommended: true
        annotations:
          sh:order: 11
  DataItem:
    class_uri: inm7fd:DataItem
@@ -717,3 +795,29 @@ classes:
        recommended: true
        annotations:
          sh:order: 4
  AccessThroughLandingPage:
    class_uri: inm7fd:AccessThroughLandingPage
    is_a: AccessMethod
    description: >-
      Access to the subject through a web page, or information on a
      web page, that can be navigated to in a Web browser.
    slots:
      - about
  DirectDownload:
    class_uri: inm7fd:DirectDownload
    is_a: AccessMethod
    description: >-
      Direct access to the subject is possible via a download URL.
    slots:
      - download_urls
  PersonalRequest:
    class_uri: inm7fd:PersonalRequest
    is_a: AccessMethod
    description: >-
      The act of personally making a request to get access to the subject
      by following some described procedure.
    slots:
      - description
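
To make the role of the `schema_type` type designator concrete, here is a minimal Python sketch (not part of this PR) of how an `access_methods` entry could be matched against the three `AccessMethod` subclasses. The slot sets are copied from the class definitions in this diff; the function name `check_access_method` and the closed-world treatment of slots are assumptions for illustration only, not how linkml implements validation.

```python
# Slots allowed per access-method class, copied from the schema diff.
# Each class also inherits `schema_type` from the AccessMethod base class.
ALLOWED_SLOTS = {
    "inm7fd:AccessThroughLandingPage": {"schema_type", "about"},
    "inm7fd:DirectDownload": {"schema_type", "download_urls"},
    "inm7fd:PersonalRequest": {"schema_type", "description"},
}


def check_access_method(entry: dict) -> list[str]:
    """Return a list of problems found in one `access_methods` entry.

    Hypothetical helper: it assumes that an entry may only carry the
    slots declared for the class named by its `schema_type`.
    """
    designator = entry.get("schema_type")
    if designator not in ALLOWED_SLOTS:
        return [f"unknown or missing schema_type: {designator!r}"]
    # Any key outside the declared slot set is reported, in sorted order.
    extra = sorted(set(entry) - ALLOWED_SLOTS[designator])
    return [f"slot {name!r} not allowed for {designator}" for name in extra]
```

Called on the three entries of the demo dataset, this sketch should report no problems, while a `PersonalRequest` carrying an `about` key would be flagged.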

@@ -0,0 +1,31 @@
{
  "pid": "inm7:dataset/demo",
  "description": "A very cool dataset",
  "schema_type": "inm7fd:Dataset",
  "about": "https://my-awesome-project-homepage.com",
  "access_methods": [
    {
      "schema_type": "inm7fd:DirectDownload",
      "download_urls": [
        "https://example.org/133a6ad2ca7ba8adb54b95ba204f20cfabfcba98?download"
      ]
    },
    {
      "schema_type": "inm7fd:PersonalRequest",
      "description": "Sacrifice at a full moon on the 5th Wednesday of a month"
    },
    {
      "schema_type": "inm7fd:AccessThroughLandingPage",
      "about": "Go to https://example.com and click that"
    }
  ],
  "authors": [
    "inm7:users/jane-doe"
  ],
  "contact": "inm7:users/john-doe",
  "license": "CC-BY-4.0",
  "name": "Demo Dataset",
  "short_name": "demo",
  "display_label": "demo",
  "@type": "Dataset"
}

@@ -0,0 +1,18 @@
pid: inm7:dataset/demo
name: Demo Dataset
display_label: demo
short_name: demo
description: A very cool dataset
authors:
msz marked this conversation as resolved Outdated

Can it work like this? The author should be a Person, I believe `inm7fb:Person` specifically. As such, they would need a `pid` (mandatory?) and optional `family_name`, `given_name`, etc. properties. There is no full-name property, and I don't think linkml would be clever enough to cast the string value `Jane Doe` into one of the properties.

Honestly, I don't know. My brain is in knots trying to figure out how this works, and I'm committing and pushing in the hope that someone spots stupid mistakes and I learn how it's actually done.
I'm specifically unsure how the examples work. I haven't figured out how to validate them with the Makefile, so I turned to linkml directly (thanks for the tip, @jsheunis), using

linkml validate -s src/flat-data/unreleased.yaml --config src/flat-data/unreleased/validation/Dataset.valid.cfg.yaml src/flat-data/unreleased/examples/Dataset-1.yaml

How would you do it?

As I wrote, `linkml validate` yields "No issues found". However, while poking around, I learned to use `linkml convert` as another (stricter?) test, to see what comes out as ttl (I find it informative). And while `linkml convert` works for the `Study-1.yaml` example, it does not for `Dataset-1.yaml`.

Regardless of the validation method, I would expect the author to be a Person object in the end. But can we define the Person in the same yaml? Should we define the person elsewhere and only include the PID? Are these things allowed in flat-schemas? What would shacl-vue do? TBH, I don't know either.

We have since reached a conclusion: the person should be a class with slots, but in the example the person should be declared as a CURIE (pid) to avoid inlining.

As far as I understand, we do not intend to perform inlining with flat models, because shacl-vue also works with CURIEs and because inlining can open a world of pain in linkml.

The person can still be "defined" in the same file via the `relations` slot of the top-level class (in this case the dataset), where we can declare its `schema_type` and relevant slots. This is the practice elsewhere in this project. However, even then, linkml would not check whether the CURIE identifies an object that matches the range (although we would expect it to do so). We decided not to worry about it too much.
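
For anyone who does want the check that linkml skips, a rough sketch in plain Python could look like the following. It is not part of this PR or of linkml: `check_references`, the `known` lookup table (pid to `schema_type`), and the `inm7fb:Organization` designator are hypothetical, and `inm7fb:Person` is taken from the guess earlier in this thread. The `expected` mapping mirrors the ranges in the diff (`authors` may be a Person or an Organization, `contact` a Person).

```python
def check_references(record: dict, known: dict, expected: dict) -> list[str]:
    """Check that CURIE-valued slots point at records of an expected class.

    record:   the flat record, e.g. the Dataset example
    known:    hypothetical lookup table mapping a pid to its schema_type
    expected: slot name -> set of acceptable schema_type values
    """
    problems = []
    for slot, allowed in expected.items():
        value = record.get(slot, [])
        # Treat single-valued and multivalued slots alike.
        curies = value if isinstance(value, list) else [value]
        for curie in curies:
            found = known.get(curie)
            if found is None:
                problems.append(f"{slot}: {curie} is not defined anywhere")
            elif found not in allowed:
                problems.append(
                    f"{slot}: {curie} is a {found}, expected one of {sorted(allowed)}"
                )
    return problems
```

The `known` table could be filled from whatever is declared under `relations`, or from a separate registry of records; the sketch deliberately leaves that choice open.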

  - inm7:users/jane-doe
contact: inm7:users/john-doe
about: https://my-awesome-project-homepage.com
access_methods:
  - schema_type: inm7fd:DirectDownload
    download_urls:
      - https://example.org/133a6ad2ca7ba8adb54b95ba204f20cfabfcba98?download
  - schema_type: inm7fd:PersonalRequest
    description: Sacrifice at a full moon on the 5th Wednesday of a month
  - schema_type: inm7fd:AccessThroughLandingPage
    about: Go to https://example.com and click that
license: CC-BY-4.0

@@ -0,0 +1,9 @@
schema: src/flat-data/unreleased.yaml
target_class: Dataset
data_sources:
  - src/flat-data/unreleased/examples/Dataset-1.yaml
plugins:
  JsonschemaValidationPlugin:
    closed: true
    include_range_class_descendants: false
  RecommendedSlotsPlugin:
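
As a rough illustration of what the two plugins in this config are meant to catch, the following Python sketch mimics their effect on a top-level Dataset record. It is an approximation under stated assumptions, not linkml's implementation: `check_dataset` is a hypothetical helper, `DATASET_SLOTS` is only the subset of slots visible in this PR and its example (the real class has more), and the ERROR/WARN split reflects my understanding that `closed: true` rejects undeclared properties while missing recommended slots are reported as warnings.

```python
# Illustrative subset of Dataset slots, taken from the diff and the demo
# example; the real class declares more slots than are visible here.
DATASET_SLOTS = {
    "pid", "schema_type", "name", "short_name", "display_label",
    "description", "about", "access_methods", "authors", "contact", "license",
}
# Slots marked `recommended: true` for Dataset in this PR.
RECOMMENDED = {"authors", "license"}


def check_dataset(record: dict) -> list[tuple[str, str]]:
    """Return (severity, message) pairs for one Dataset record."""
    # Closed-world check: every key must be a declared slot.
    findings = [
        ("ERROR", f"unexpected slot {key!r}")
        for key in sorted(set(record) - DATASET_SLOTS)
    ]
    # Recommended slots are optional, so their absence is only a warning.
    findings += [
        ("WARN", f"recommended slot {key!r} is missing")
        for key in sorted(RECOMMENDED - set(record))
    ]
    return findings
```

Under these assumptions the demo dataset, which provides both `authors` and `license`, would produce no findings.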