datalad-tabby/datalad_tabby
Michał Szczepanik c118192587 Add an encoding parameter to io.load_tabby
By default, `Path.open()` uses `locale.getencoding()` when opening the
file for reading. This has caused problems when files saved with
iso-8859-1 encoding (presumably on Windows) were loaded on Linux, where
utf-8 is the default; see #112.

The default behaviour is maintained with `encoding=None`, and any
valid encoding name can be provided as an argument to load_tabby. The
encoding will be used for loading tsv files.
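
To illustrate the failure mode and the fix, here is a minimal sketch.
It assumes a hypothetical helper, `read_tsv_rows`, that is not part of
datalad-tabby and only mirrors the idea of forwarding an `encoding`
argument to `Path.open()`. The sample file content is made up.

```python
import csv
import tempfile
from pathlib import Path


def read_tsv_rows(path, encoding=None):
    """Hypothetical helper (not the datalad-tabby API): read a TSV file."""
    # encoding=None keeps Path.open()'s locale-dependent default
    with Path(path).open(encoding=encoding, newline="") as f:
        return list(csv.reader(f, delimiter="\t"))


with tempfile.TemporaryDirectory() as tmp:
    tsv = Path(tmp) / "authors.tsv"
    # Simulate a file saved with iso-8859-1 encoding
    tsv.write_bytes("name\tJürgen Müller\n".encode("iso-8859-1"))

    # Reading it as utf-8 fails, because 0xFC is not valid UTF-8
    try:
        read_tsv_rows(tsv, encoding="utf-8")
        utf8_failed = False
    except UnicodeDecodeError:
        utf8_failed = True

    # Naming the right encoding explicitly gives the expected text
    rows_latin1 = read_tsv_rows(tsv, encoding="iso-8859-1")
```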

The encoding is stored as an attribute of `_TabbyLoader` rather than
passed as an input to the load functions - since they may end up being
called in a few places (when sheet import statements are found), it
would be too much passing around otherwise.
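
The design choice can be sketched as follows. `SketchLoader` and the
`@include` marker are invented for this illustration and do not
reproduce `_TabbyLoader` or the actual tabby import syntax; the point
is only that nested loads read the encoding from the instance instead
of receiving it as an argument.

```python
import csv
import tempfile
from pathlib import Path


class SketchLoader:
    """Illustrative stand-in for a loader class, not the real _TabbyLoader."""

    def __init__(self, encoding=None):
        # Stored once; every nested load reads it from the instance
        self._encoding = encoding

    def load(self, path):
        path = Path(path)
        rows = []
        with path.open(encoding=self._encoding, newline="") as f:
            for row in csv.reader(f, delimiter="\t"):
                # "@include" is a hypothetical import marker for this sketch
                if len(row) == 2 and row[0] == "@include":
                    rows.extend(self.load(path.parent / row[1]))
                else:
                    rows.append(row)
        return rows


with tempfile.TemporaryDirectory() as tmp:
    main = Path(tmp) / "main.tsv"
    main.write_bytes("title\tDémo\n@include\tauthors.tsv\n".encode("iso-8859-1"))
    authors = Path(tmp) / "authors.tsv"
    authors.write_bytes("author\tJürgen Müller\n".encode("iso-8859-1"))

    # The encoding is given once and also applies to the included file
    loaded = SketchLoader(encoding="iso-8859-1").load(main)
```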

With external libraries it might be possible to guess a file encoding
that produces a correct result based on the file's content, but
success is not guaranteed when there are few non-ascii characters in
the entire file (think: list of authors). Here, we do not attempt to
guess, instead expecting the user to know the encoding they need to
use.
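
A small sketch of why guessing is unreliable, using only Python's
built-in codecs (the sample strings are made up): ascii-only bytes
decode identically under both encodings, and decoding as iso-8859-1
never raises, so a wrong guess can pass silently.

```python
# ASCII-only content gives no evidence about the encoding at all
ascii_bytes = b"name\tJane Doe\n"
same_result = ascii_bytes.decode("utf-8") == ascii_bytes.decode("iso-8859-1")

# utf-8 bytes decoded as iso-8859-1 do not fail, they just come out wrong
wrong = "Müller".encode("utf-8").decode("iso-8859-1")
```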

Ref:
https://docs.python.org/3/library/pathlib.html#pathlib.Path.open
https://docs.python.org/3/library/functions.html#open
2023-11-21 16:52:58 +01:00
io Add an encoding parameter to io.load_tabby 2023-11-21 16:52:58 +01:00
tests Fix broken schema.org IRI prefix definition (missed trailing slash) 2023-07-28 21:17:38 +02:00
__init__.py Simplist initial implementation of a tabby_load() command 2023-07-07 16:28:14 +02:00
_version.py Update extension basics for use as datalad-tabby: 2023-06-16 11:14:42 +02:00
conftest.py Minimalistic tabby metadata extractor with datalad-metalad 2023-07-20 09:36:27 +02:00
extractor.py tabby extractor also reports dscollection records 2023-07-20 14:50:54 +02:00
load.py Support tabby-load --compact @context 2023-07-18 07:41:20 +02:00