adding author and contact to flat-data #87
Reference: inm7/inm7-concepts!87
This PR is an effort towards inm7/annotate.inm7.de-data#24.
It adds two slots to the `Dataset` class in flat-data:
- a multivalued slot `authors`, whose values are constrained to be Persons or Organizations;
- a `contact` slot, which corresponds to `access_request_contact`.

A question for @jsheunis: given that `contact` needs to be a `Person`, and a `Person` (from flat-base) has an email, is `access_request_url` taken care of with this? Or is this connection (to their email) somehow lost in the flat schemas?
But it actually should be something like `any_of: [{range: Person}, {range: Organization}]`.
Title changed from "WIP: adding author to flat-data" to "adding author and contact to flat-data".
Cross-reference: `datalad-catalog` entities not covered by the flat schemas #24
I'm trying to analyze what was needed to add the author. All changes to `unreleased.yaml` seem logical, but I am not sure about the example (shouldn't the author/Person have properties, instead of just a string value?); see inline comment.
Validating the example on its own (`hatch run check:linkml validate --config src/flat-data/unreleased/validation/Dataset.valid.cfg.yaml`) yields "No issues found".
Running `hatch run check:examples` also succeeds at validation, but fails at conversion just before:
I am trying to change the example and validate it with:
However, I tried both this:
and this:
with and without `schema_type: inm7fd:Person` (I got that idea from `Study-1.yaml`), and also without the `pid`, and I keep getting `TypeError: vars() argument must have __dict__ attribute`.
Inline comment on the example hunk `@@ -0,0 +3,4 @@`:
`display_label: demo`
`short_name: demo`
`description: A very cool dataset`
`authors: [Jane Doe]`
Can it work like this? The author should be a Person, I believe `inm7fb:Person` specifically. As such, they would need a `pid` (mandatory?) and optional `family_name`, `given_name`, etc. properties. There is no full-name property, and I don't think linkml would be clever enough to cast the string value `Jane Doe` into one of those properties.
Honestly, I don't know. My brain is in knots trying to figure out how this works. I'm committing and pushing in the hope that someone spots my stupid mistakes and I learn how it's actually done.
Specifically with the examples I'm unsure how it works. I haven't figured out how to validate them with the Makefile, so I turned to linkml directly (thanks for the tip, @jsheunis), using
How would you do it?
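The reviewer's point can be made concrete with a small Python analogue of the ranges involved. This is not the LinkML schema, and it is not how linkml validates.

The sketch makes these assumptions:
- A Person has a `pid` (treated as mandatory) plus optional `given_name`, `family_name` and `email`, as described in this thread.
- The Organization slots and the `ex:` CURIEs are hypothetical.
- Objects are inlined here only to show the ranges.

A bare string like `Jane Doe` satisfies neither class, and nothing splits it into name parts automatically.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Person:
    # pid treated as mandatory here; the thread leaves this open ("mandatory?")
    pid: str
    given_name: Optional[str] = None
    family_name: Optional[str] = None
    email: Optional[str] = None


@dataclass
class Organization:
    # Hypothetical minimal slots, for illustration only
    pid: str
    name: Optional[str] = None


@dataclass
class Dataset:
    pid: str
    authors: List[Union[Person, Organization]] = field(default_factory=list)
    contact: Optional[Person] = None

    def __post_init__(self) -> None:
        # Roughly what an any_of over Person/Organization demands of each author
        for author in self.authors:
            if not isinstance(author, (Person, Organization)):
                raise TypeError(
                    f"author {author!r} is neither a Person nor an Organization"
                )
        # contact is restricted to Person, whose email is then reachable
        if self.contact is not None and not isinstance(self.contact, Person):
            raise TypeError("contact must be a Person")


jane = Person(pid="ex:jane", given_name="Jane", family_name="Doe",
              email="jane@example.org")
ds = Dataset(pid="ex:demo", authors=[jane], contact=jane)
print(ds.contact.email)  # jane@example.org

try:
    Dataset(pid="ex:demo", authors=["Jane Doe"])  # like `authors: [Jane Doe]`
except TypeError as err:
    print(err)  # author 'Jane Doe' is neither a Person nor an Organization
```

The design point: the `contact` email is only reachable because `contact` is typed as a Person rather than a string.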
As I wrote, `linkml validate` yields "No issues found". However, while poking around, I learned to use `linkml convert` as another (stricter?) test, to see what comes out as TTL (I find it informative). And while `linkml convert` works for the `Study-1.yaml` example, it does not work for `Dataset-1.yaml`.
Regardless of the validation method, I would expect the author to be a Person object in the end. But can we define the Person in the same YAML file? Should we define the person elsewhere and only include the PID? Are these things allowed in flat schemas? What would shacl-vue do? TBH, I don't know either.
We have since reached a conclusion: the person should be a class with slots, but in the example the person should be declared as a CURIE (its `pid`) to avoid inlining.
As far as I understand, we do not intend to perform inlining with flat models, because shacl-vue also works with CURIEs and because inlining can open a world of pain in linkml.
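The CURIE-based pattern and a cross-check can be sketched in plain Python. The check covers what, per the next comment, linkml does not verify: that a CURIE points at an object of the expected range.

The sketch makes these assumptions:
- The example has been parsed into a plain dict, e.g. by a YAML loader.
- `authors` holds CURIE strings.
- `relations` maps each CURIE to a record with a `schema_type`. This is a hypothetical shape; the real `relations` slot in the flat schemas may be structured differently.
- The `ex:` prefix, the `inm7fb:Organization` type and the helper name `unresolved_authors` are illustrative only.

```python
import re

# Rough CURIE check: "prefix:reference" with no whitespace (an approximation;
# full URLs would also pass)
CURIE_RE = re.compile(r"^[A-Za-z_][\w.-]*:\S+$")

# Assumed schema_type values allowed for authors (prefixes are illustrative)
AUTHOR_TYPES = {"inm7fb:Person", "inm7fb:Organization"}


def unresolved_authors(record: dict, allowed_types=AUTHOR_TYPES) -> list:
    """Return (author, reason) pairs for authors that are not well-typed references."""
    relations = record.get("relations", {})
    problems = []
    for author in record.get("authors", []):
        if not isinstance(author, str) or not CURIE_RE.match(author):
            problems.append((author, "not a CURIE"))
        elif author not in relations:
            problems.append((author, "not described in relations"))
        elif relations[author].get("schema_type") not in allowed_types:
            problems.append((author, "unexpected schema_type"))
    return problems


example = {
    "pid": "ex:demo",
    "authors": ["ex:jane", "ex:bob", "Jane Doe"],
    "relations": {
        "ex:jane": {"schema_type": "inm7fb:Person",
                    "given_name": "Jane", "family_name": "Doe"},
        "ex:bob": {"schema_type": "inm7fb:Dataset"},
    },
}
for curie, reason in unresolved_authors(example):
    print(curie, "->", reason)
```

Running it prints `ex:bob -> unexpected schema_type` and then `Jane Doe -> not a CURIE`. `ex:jane` resolves to a Person and passes.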
The person can still be "defined" in the same file via the `relations` slot of the top-level class (in this case, the dataset), where we can declare its `schema_type` and relevant slots. This is the practice elsewhere in this project. However, even then, linkml would not check whether the CURIE identifies an object that matches the range (although we would expect it to). We decided not to worry about it too much.
I made some changes locally: I changed the `flat-data/unreleased` schema and also the `Dataset-1.yaml` example file. Here's the diff:
Then I ran three `make` commands:
All good. And then:
This command also produced the diff of the `Dataset-1.json` file that you see in the diff above. The "ERROR" is reported only because the JSON file differed before and after running the command. This is expected: beforehand, the file was wrong because it was still based on the older schema/data.
Then:
The warning is expected, and the error is also expected. It has nothing to do with the model/data, only with the fact that there is no `*.invalid.cfg.yaml` to test.
So all in all, this could work. The only drawback is that we can't use inlining (or inlining as a list) for text-based data submission. That would break the current shacl-vue-based paradigm, where we specifically want a URI as the value for an `author`. That is, IIUC...
Thanks much @msz and @jsheunis! I have added your patch (under your committer ID).
@mih before this PR gets any longer, would you take a look at it, too, please?
Notes from impromptu meeting in the office hour:
I just added `AccessMethod` from datalad-concepts. I looked through the ABCD-J catalog, and the only means of "access" I was able to distinguish were "data is available right away" and "email this particular person". I would translate the latter to `PersonalRequest`; the former does not sound like it needs anything. I nevertheless added "direct download" and "access through landing page" to the flat-data schema, because I thought they make sense for flat-data use cases other than the catalog.
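The translation described above can be sketched as a small Python function. The sketch makes these assumptions:
- A catalog record is a dict with an optional `access_request_contact`.
- The enum names are hypothetical stand-ins for `PersonalRequest`, "direct download" and "access through landing page". They are not the actual datalad-concepts or flat-data identifiers.
- `access_request_url` is deliberately left unmapped, since how to translate it is still an open question.

```python
from enum import Enum


class AccessMethod(Enum):
    # Hypothetical names; the real schema's class/enum names may differ
    PERSONAL_REQUEST = "PersonalRequest"
    DIRECT_DOWNLOAD = "DirectDownload"
    LANDING_PAGE = "LandingPage"


def access_methods_from_catalog(record: dict) -> list:
    """Derive access methods from a datalad-catalog-like record (assumed shape)."""
    methods = []
    # "Email this particular person" -> a personal request
    if record.get("access_request_contact"):
        methods.append(AccessMethod.PERSONAL_REQUEST)
    # Data that is available right away yields no access method here, and
    # access_request_url is intentionally not mapped (still an open question).
    return methods
```

For example, `access_methods_from_catalog({"access_request_contact": {"email": "a@example.org"}})` returns a list containing only `AccessMethod.PERSONAL_REQUEST`. A record without a contact returns an empty list.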
Two questions for @jsheunis:
The catalog also has `access_request_url`, which basically translates to "follow this link and read what the site says to determine the next steps for requesting/gaining access". I am not sure how this would translate to an `AccessMethod`, though, because ideally the entered access method would be more specific. Perhaps `access_request_url` just becomes redundant. Alternatively, it could be used where a particular `AccessMethod` has a property similar to `access_request_url`, e.g. an `ElectronicForm` access method ("complete this online form to request access").
I think those are related but different concepts: whether data is freely available (a general statement about the data) and how it is made available (i.e. `AccessMethod`). If it is freely available via direct HTTP-based download, then the access method would be exactly that.
Some thoughts on "License": datalad-concepts' `Distribution` has slots and classes for licenses, but I believe they are very "big". In datalad-concepts, the `license` slot has range `LicenseDocument`, which is an entity. I believe we wouldn't want to copy this into flat-data because it is too heavy. At the moment, I have it minimally constrained to "string"...
@adina wrote in #87 (comment):
In Catalog though, the license has name and url (schema definition) - if both are available, the catalog shows the license name as a link to the url. I wonder if that warrants having a dedicated class for a license with matching slots.
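The catalog behaviour quoted above can be sketched with a small class holding `name` and `url`, plus an optional `license_text`. `license_text` stands in for the `license-text` slot floated in the reply.

The sketch makes these assumptions:
- The class name `License` is hypothetical.
- The Markdown link output is illustrative.
- The fallbacks used when only one field is set are made up here. This is not datalad-catalog's actual rendering code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class License:
    name: Optional[str] = None
    url: Optional[str] = None
    license_text: Optional[str] = None  # the most optional slot in this context

    def display(self) -> str:
        """Render the license name as a link when both name and url are available."""
        if self.name and self.url:
            return f"[{self.name}]({self.url})"
        # Assumed fallbacks: show whichever field exists, else nothing
        return self.name or self.url or ""


print(License(name="CC-BY-4.0",
              url="https://creativecommons.org/licenses/by/4.0/").display())
# [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/)
```

With a plain `string` range, only one of the two values can be stored, so this name-as-link display could not be reproduced without a dedicated class or a second slot.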
Maybe `name` (`name` in the catalog; `name` in the concepts `Entity`/`Thing`), `url` (`url` in the catalog; `identifier` in the concepts `Entity`), and `license-text` (the only specific slot `LicenseDocument` has, and the most optional in this context)?
This PR has gotten stale; my apologies that I didn't follow through with it.
It reminds me that there is still an ongoing todo to represent datalad-catalog metadata in our schema to be able to "translate" datasets from the catalog (e.g., from ABCD-J) to a knowledge pool. This PR clearly didn't accomplish that - I'll make sure to cross-link the original issue in this repo again.
Pull request closed