datalad-tabby/datalad_tabby/io/tests
Michael Hanke 2c767b66eb Support sheet convention declaration
This extends the implementation by recognizing an optional sheet name
suffix (`@` delimiter) to indicate that the sheet adopts a particular
convention. For example, instead of `dataset.tsv`, a file name can now
be `dataset@prj462.tsv`, where `prj462` is an arbitrary label for a
particular convention.
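
The naming rule can be illustrated with a minimal sketch. The helper
`split_sheet_name` is hypothetical (not part of the package API), and it
assumes a file name with a single extension such as `.tsv`:

```python
from pathlib import Path
from typing import Optional, Tuple


def split_sheet_name(filename: str) -> Tuple[str, Optional[str]]:
    """Split a sheet file name into (sheet, convention label or None)."""
    # Drop the (single) file extension, e.g. ".tsv".
    stem = Path(filename).stem
    # Everything after the first "@" is treated as the convention label.
    sheet, _, convention = stem.partition("@")
    # A missing or empty label means no convention was declared.
    return sheet, (convention or None)
```

Under these assumptions, `dataset@prj462.tsv` splits into the sheet
`dataset` and the label `prj462`, while `dataset.tsv` yields no label.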

The convention label is a regular part of the sheet name, hence its
syntax rules apply.

Individual sheets in a record need not use a single convention, but
may mix and match from any available conventions.

If a convention label is declared for a sheet, the convention
specification serves as a fallback (default provider) for any sheet
component (context, overrides, JSON data, or even a sheet itself). This
means that if a record does not contain a given sheet component, the
convention specification is checked for a matching component, which is
then used if it exists.
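
The fallback order described here could be sketched roughly as follows.
`resolve_component` is a hypothetical helper, and the assumption that a
convention stores a component under the same file name as the record
would is made for illustration only:

```python
from pathlib import Path
from typing import Optional


def resolve_component(
    filename: str, record_dir: Path, convention_dir: Optional[Path]
) -> Optional[Path]:
    """Return the path of a sheet component, preferring the record itself."""
    in_record = record_dir / filename
    if in_record.exists():
        # The record ships its own component, so it takes precedence.
        return in_record
    if convention_dir is not None:
        # Assumption: the convention uses the same file name as the record.
        in_convention = convention_dir / filename
        if in_convention.exists():
            return in_convention
    # Neither the record nor the convention provides this component.
    return None
```

A component found in the record always wins; the convention is only
consulted when the record has nothing to offer.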

This feature enables rather minimal tabby records. They can adopt
semantics and override logic from a (set of) convention(s), by simply
declaring one in the sheet name(s).

A demo is provided in an included test. A new convention `tby-sd1` is
added to the sources and thus ships with the package as a first
"standard" convention.

`tby-` is an arbitrary prefix that is adopted for this and all future
conventions that are included in the package (to minimize name conflicts
with user-provided conventions).

`sd` is short for "scientific dataset". It is radically abbreviated to
keep the convention label short (good for display when sheets are edited
manually).

`1` is a version label for the convention. It needs to be incremented
when backward incompatible changes to the convention are made.

In the source tree conventions are stored at
`datalad_tabby/io/conventions/<label>/`. Underneath this directory the
convention components are stored like tabby records in "singledir"
layout.

This storage convention (where the version is included in the directory
name) ensures that there are no naming conflicts when versions are
incremented, and previous specifications can remain available
indefinitely.

Users can provide additional lookup paths for conventions (see
`load_tabby(..., convention_paths=[])`). User-provided locations are
always considered first. If a convention is defined in multiple
locations, the first matching location is used.
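
A minimal sketch of this lookup order, under the assumption that each
lookup path contains one directory per convention label.
`find_convention` and its parameters are hypothetical names, not the
package's actual implementation:

```python
from pathlib import Path
from typing import Iterable, Optional


def find_convention(
    label: str, user_paths: Iterable[Path], builtin_path: Path
) -> Optional[Path]:
    """Return the first directory named `label`, checking user paths first."""
    # User-provided locations come before the location shipped with the package.
    for base in [*user_paths, builtin_path]:
        candidate = Path(base) / label
        if candidate.is_dir():
            # First match wins; later locations are not inspected.
            return candidate
    return None
```

With this ordering, a user-provided directory named `tby-sd1` would
shadow a shipped convention of the same label.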

At this point, the specification of additional convention locations
is not integrated into the DataLad `tabby_load()` command. A future
update will integrate it with the DataLad configuration mechanism.

Demo:

`dataset@tby-sd1.tsv`

```
name	myds
```

`authors@tby-sd1.tsv`

```
name	email	orcid	affiliation
Josiah Carberry	jc@example.com	0000-0002-1825-0097	Brown University
```

`funding@tby-sd1.tsv`

```
funder	grant_id	title
DFG	SFB000-INF	Short but ambitious project
```

yields the following JSON-LD record on load:

```json

{
  "author": [
    {
      "name": "Josiah Carberry",
      "email": "jc@example.com",
      "orcid": "0000-0002-1825-0097",
      "affiliation": "Brown University",
      "@id": "https://orcid.org/0000-0002-1825-0097",
      "@type": "schema:Person",
      "@context": {
        "bibo": "https://purl.org/ontology/bibo/",
        "obo": "https://purl.obolibrary.org/",
        "schema": "https://schema.org",
        "affiliation": "schema:affiliation",
        "email": "schema:email",
        "name": "schema:name",
        "orcid": "obo:IAO_0000708"
      }
    }
  ],
  "funding": [
    {
      "funder": "DFG",
      "grant_id": "SFB000-INF",
      "title": "Short but ambitious project",
      "@type": "schema:Grant",
      "@context": {
        "schema": "https://schema.org",
        "funder": "schema:funder",
        "grant_id": "schema:identifier",
        "title": "schema:title"
      }
    }
  ],
  "name": "myds",
  "@context": {
    "bibo": "https://purl.org/ontology/bibo/",
    "obo": "https://purl.obolibrary.org/",
    "schema": "https://schema.org",
    "author": "schema:author",
    "citation": "schema:citation",
    "description": "schema:description",
    "doi": "bibo:doi",
    "funding": "schema:funding",
    "homepage": "schema:mainEntityOfPage",
    "identifier": "schema:identifier",
    "keyword": "schema:keywords",
    "last-updated": "schema:dateModified",
    "license": "schema:license",
    "name": "schema:name",
    "title": "schema:title",
    "version": "schema:version"
  }
}
```

which compacts to

```json
{
  "schema:author": {
    "@id": "https://orcid.org/0000-0002-1825-0097",
    "@type": "schema:Person",
    "https://purl.obolibrary.org/IAO_0000708": "0000-0002-1825-0097",
    "schema:affiliation": "Brown University",
    "schema:email": "jc@example.com",
    "schema:name": "Josiah Carberry"
  },
  "schema:funding": {
    "@type": "schema:Grant",
    "schema:funder": "DFG",
    "schema:identifier": "SFB000-INF",
    "schema:title": "Short but ambitious project"
  },
  "schema:name": "myds"
}
```

Closes #86
2023-07-26 10:03:12 +02:00
__init__.py Move fixtures to a dedicated place 2023-07-17 10:05:06 +02:00
test_conventions.py Support sheet convention declaration 2023-07-26 10:03:12 +02:00
test_imports.py Tests for optional sheet linkage 2023-07-25 16:08:33 +02:00
test_jsondata.py Support optional sheet imports/linkage 2023-07-24 08:49:08 +02:00
test_load.py Support 'single-record-per-directory' format 2023-07-19 21:46:52 +02:00
test_overrides.py Support single format "sheets" as JSON data 2023-07-22 09:48:53 +02:00
test_tsv2xlsx.py Restructure io module 2023-07-19 10:53:34 +02:00