This adds a workflow which runs the publication enrichment via doi.org.
Because the information at doi.org changes very rarely, and we do not
(yet) have a way to mark a record as "complete / needs no enrichment",
the workflow currently only has a `workflow_dispatch` trigger.
Two optional inputs can be specified when dispatching the workflow: a
comma-separated list of PIDs and an inbox label. Either one limits
processing to a subset of records. If neither is given, all publication
records are processed.
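The handling of the PID list can be sketched in isolation. This is a minimal sketch, assuming the comma-separated format given in the input description of the workflow; `split_pids` is a hypothetical helper used only for illustration, and the PID values are made up.

```bash
#!/usr/bin/env bash

# Hypothetical helper: split a comma-separated PID list and report each entry.
split_pids() {
    local pids="$1"
    local pid
    local -a pid_array
    # Setting IFS only for this read splits the input on commas.
    # The here-string is quoted, so an empty value is still valid syntax.
    IFS=',' read -ra pid_array <<< "$pids"
    for pid in "${pid_array[@]}"; do
        echo "processing $pid"
    done
}

# Made-up PIDs, for illustration only.
split_pids "abc:1,abc:2"
```

With an empty list the loop body never runs, which is why the workflow checks for a non-empty value first and otherwise falls back to processing all records of the publication class.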
Properties which can change based on the pool / data model (API URL,
collection name, class names) are kept as env variables to make tweaks
easier.
In the last step ("Process records"), the inputs are exported to
environment variables to avoid problems when the runner substitutes
them directly into the script (e.g. when no PIDs were provided, `<<<`
was followed by the end of the line, which is a syntax error). To supply
the optional `--incoming <label>` argument to `dtc get-records`,
parameter expansion is used: `${parameter:+word}` expands to nothing if
parameter is null or unset, and to the expansion of word otherwise.
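The effect of the `${parameter:+word}` expansion can be checked without running the workflow. This is a minimal sketch: `build_args` is a hypothetical helper that only echoes the arguments, and the label `review` is a made-up value.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the argument list that would be passed on.
build_args() {
    local INBOX_LABEL="$1"
    # ${INBOX_LABEL:+...} expands to nothing when INBOX_LABEL is unset or
    # empty, and to "--incoming <label>" otherwise.
    echo get-records ${INBOX_LABEL:+--incoming "$INBOX_LABEL"}
}

build_args ""        # prints: get-records
build_args "review"  # prints: get-records --incoming review
```

Because the whole option is produced by a single expansion, no separate `if` branch is needed to decide whether `--incoming` is passed.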
64 lines
2.3 KiB
YAML
name: Enrich publications via doi.org

on:
  workflow_dispatch:
    inputs:
      pids:
        description: "Limit to these PIDs (comma-separated)"
        required: false
        default: ''
        type: string
      inbox:
        description: "Limit to inbox with this label"
        required: false
        default: ''
        type: string

env:
  DTC_TOKEN: ${{ secrets.POOLTOKEN }}
  DUMPTHINGS_APIURL: https://pool.psychoinformatics.de/api
  DUMPTHINGS_COLLECTION: public
  PERSON_CLASS: XYZPerson
  PUBLICATION_CLASS: XYZPublication
  RULE_CLASS: Rule

jobs:
  enrich-publications:
    name: Enrich publications
    runs-on: debian-latest
    defaults:
      run:
        shell: bash
    steps:
      - name: Install uv
        uses: astral-sh/setup-uv@v6
      - name: Install metadata tools
        run: |
          uv tool install https://hub.psychoinformatics.de/orinoco/query-things.git \
            --with-executables-from dump-things-pyclient
      - name: Fetch script
        run: |
          wget https://hub.psychoinformatics.de/orinoco/knowledge-enrichment/raw/branch/main/.forgejo/tools/enrich-via-doi.py
      - name: Pre-fetch data
        run: |
          mkdir .cache
          dtc get-records $DUMPTHINGS_APIURL public -C $PERSON_CLASS > .cache/Person.jsonl
          dtc get-records $DUMPTHINGS_APIURL public -C $RULE_CLASS > .cache/Rule.jsonl
      - name: Process records
        run: |
          export INBOX_LABEL=${{ inputs.inbox }}
          export PIDS=${{ inputs.pids }}
          if [ -n "$PIDS" ]
          then
            IFS=',' read -ra PID_ARRAY <<< $PIDS
            for pid in ${PID_ARRAY[@]}
            do
              dtc get-records $DUMPTHINGS_APIURL $DUMPTHINGS_COLLECTION --pid $pid ${INBOX_LABEL:+--incoming $INBOX_LABEL} |
              uv run enrich-via-doi.py --persons .cache/Person.jsonl --rules .cache/Rule.jsonl - - |
              dtc post-records $DUMPTHINGS_APIURL $DUMPTHINGS_COLLECTION $PUBLICATION_CLASS
            done
          else
            dtc get-records $DUMPTHINGS_APIURL $DUMPTHINGS_COLLECTION --class $PUBLICATION_CLASS ${INBOX_LABEL:+--incoming $INBOX_LABEL} |
            uv run enrich-via-doi.py --persons .cache/Person.jsonl --rules .cache/Rule.jsonl - - |
            dtc post-records $DUMPTHINGS_APIURL $DUMPTHINGS_COLLECTION $PUBLICATION_CLASS
          fi