Fix checksum encoding issue. Proposal: do not use explicit hexBinary-type declaration in submitted records #351

Open
opened 2026-06-03 08:29:41 +00:00 by cmo · 0 comments
Owner

Currently, the RDFlib-LinkML code in dumpthings renders the `notation` of a `dlidentifiers:Checksum` that carries an explicit type declaration as the Python string representation of the decoded binary content. For example, `"41"^^xsd:hexBinary` becomes `"b'A'"`. This happens only if the explicit type declaration `^^xsd:hexBinary` is provided.

The underlying issue seems to be that the range of `notation` is `HexBinary`, which defines its URI as `xsd:hexBinary`. AFAICS, the RDFlib-LinkML code gets confused because it sees an `xsd:hexBinary` URI but an instance of a generic `HexBinary` class. It uses the reader registered for the URI to read the content, which yields a byte string, but then uses the generic `HexBinary` instance methods to emit that byte string. Those methods apply `str` to it, resulting in something like `b'A'`.
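
The mismatch can be reproduced in a few lines of plain Python. This is a minimal sketch using only the standard library; `read_hexbinary` and `emit_generic` are hypothetical stand-ins for the URI-registered reader and the generic `HexBinary` emitter described above, not the actual rdflib or LinkML API:

```python
import binascii


# Hypothetical stand-in for the reader registered for the xsd:hexBinary URI:
# it decodes the hex lexical form into raw bytes.
def read_hexbinary(lexical: str) -> bytes:
    return binascii.unhexlify(lexical)


# Hypothetical stand-in for the generic emitter: it just calls str() on the
# value, which for a bytes object yields its Python repr, not hex digits.
def emit_generic(value: object) -> str:
    return str(value)


value = read_hexbinary("41")
print(repr(value))          # b'A'
print(emit_generic(value))  # b'A'
```

Reading and emitting are each reasonable on their own; the bug only appears when a bytes-producing reader is paired with a `str`-based emitter.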

This can be fixed in one of three ways:

1. Not using `HexBinary` as the range of `notation`, but using `xsd:hexBinary` instead.
2. Implementing support for `HexBinary` instances in the Python code generators.
3. Not providing the explicit `^^xsd:hexBinary` type declaration in the TTL record.
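
Option 2 amounts to making the emitting side aware that the reader has already produced `bytes`. A minimal sketch of that idea, assuming the value reaching the emitter is a plain Python `bytes` object; `emit_hexbinary_aware` is hypothetical and not taken from the LinkML code generators:

```python
import binascii


# Hypothetical hexBinary-aware emitter: if the reader already turned the
# literal into bytes, re-encode it as hex digits instead of calling str().
def emit_hexbinary_aware(value: object) -> str:
    if isinstance(value, (bytes, bytearray)):
        # XSD's canonical hexBinary form uses upper-case hex digits.
        return binascii.hexlify(value).decode("ascii").upper()
    # Anything else (e.g. a plain string literal) is emitted unchanged.
    return str(value)


print(emit_hexbinary_aware(bytes.fromhex("41")))  # 41
print(emit_hexbinary_aware("plain string"))       # plain string
```

This emits the canonical upper-case form, so a checksum submitted in lower case (e.g. `deadbeef`) would come back as `DEADBEEF`; preserving the original case would require keeping the lexical form around.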

**Proposed solution: 3.** To quickly fix the checksum encoding issue, the explicit hexBinary type declaration should be removed from the checksum in the submitted TTL record. This drops the format check, because the checksum would then be treated as a plain string, but it can be done without re-releasing schemas or adding schema-specific code to LinkML libraries.
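
Since the server would then accept any string, a submitter that wants to keep part of the dropped format check could validate the checksum locally before building the record. A minimal sketch, assuming the check runs in Python on the client side; `is_hexbinary` is a hypothetical helper, not part of dumpthings or LinkML:

```python
import re

# Lexical space of xsd:hexBinary: an even number of hexadecimal digits
# (upper or lower case). Note that the empty string is also valid.
HEXBINARY_RE = re.compile(r"(?:[0-9A-Fa-f]{2})*")


def is_hexbinary(text: str) -> bool:
    """Return True if text is a valid xsd:hexBinary lexical form."""
    return HEXBINARY_RE.fullmatch(text) is not None


print(is_hexbinary("41"))   # True
print(is_hexbinary("4"))    # False
print(is_hexbinary("zz"))   # False
```

For real checksums an additional length check for the specific algorithm (e.g. 64 hex digits for SHA-256) would be stricter than the plain hexBinary rule.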

Reference
orinoco/shacl-vue#351