datalad-tabby/datalad_tabby/io
Michał Szczepanik c118192587 Add an encoding parameter to io.load_tabby
By default, `Path.open()` uses `locale.getencoding()` when opening a
file for reading. This caused problems when loading files saved
(presumably on Windows) with iso-8859-1 encoding on Linux (where
utf-8 is the default); see #112.
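
The failure mode can be reproduced with the standard library alone. The
following is a minimal sketch, not code from this repository: the helper
names `read_text_with` and `demo` and the file name `authors.tsv` are
made up for illustration.

```python
import tempfile
from pathlib import Path


def read_text_with(path: Path, encoding=None) -> str:
    # encoding=None lets Python fall back to the locale default,
    # which is what Path.open() does when no encoding is given.
    with path.open(encoding=encoding) as f:
        return f.read()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "authors.tsv"  # hypothetical file name
        # Simulate a file saved with iso-8859-1 encoding.
        path.write_bytes("name\tMüller\n".encode("iso-8859-1"))

        # Reading with the matching encoding works.
        text = read_text_with(path, encoding="iso-8859-1")

        # Reading the same bytes as utf-8 fails: 0xFC ("ü" in
        # iso-8859-1) is not a valid utf-8 start byte.
        try:
            read_text_with(path, encoding="utf-8")
            utf8_failed = False
        except UnicodeDecodeError:
            utf8_failed = True
    return text, utf8_failed
```

On a system whose locale default is utf-8, calling `read_text_with(path)`
without an encoding would be expected to fail in the same way as the
explicit `"utf-8"` call.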

The default behaviour is maintained with `encoding=None`, and any
valid encoding name can be provided as an argument to `load_tabby`. The
encoding is used when loading tsv files.

The encoding is stored as an attribute of `_TabbyLoader` rather than
passed as an argument to the load functions. These functions may end up
being called in several places (whenever sheet import statements are
found), so passing the encoding through every call would be cumbersome.
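
The design choice can be illustrated with a small sketch. This is not the
actual `_TabbyLoader`: the class `SketchLoader`, the `IMPORT_MARKER` row
syntax, and the file names are hypothetical and only serve to show how an
attribute set once is reused by nested loads.

```python
import csv
import tempfile
from pathlib import Path

# Made-up marker for this sketch; it is NOT tabby's real import syntax.
IMPORT_MARKER = "@import"


class SketchLoader:
    """Hypothetical stand-in for a loader that follows sheet imports."""

    def __init__(self, encoding=None):
        # Stored once; None keeps the default behaviour of Path.open().
        self._encoding = encoding

    def load(self, src: Path) -> list:
        rows = []
        with src.open(encoding=self._encoding, newline="") as f:
            for row in csv.reader(f, delimiter="\t"):
                if len(row) == 2 and row[0] == IMPORT_MARKER:
                    # The nested load picks up self._encoding without
                    # any extra argument being passed along.
                    rows.extend(self.load(src.parent / row[1]))
                else:
                    rows.append(row)
        return rows


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "authors.tsv").write_bytes(
            "author\tJosé\n".encode("iso-8859-1"))
        (root / "dataset.tsv").write_bytes(
            "title\tÉtude\n@import\tauthors.tsv\n".encode("iso-8859-1"))
        return SketchLoader(encoding="iso-8859-1").load(root / "dataset.tsv")
```

Here `demo()` loads both files with the same encoding even though only
the loader's constructor ever receives it.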

With external libraries it might be possible to guess a file encoding
that produces a correct result based on the file's content, but
success is not guaranteed when the entire file contains only a few
non-ASCII characters (think: list of authors). Here, we do not attempt
to guess, instead expecting the user to know the encoding they need to
use.
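
Why guessing is fragile can be seen without any detection library. The
sketch below uses only `bytes.decode()`; the helper `decodes_as` and the
sample strings are illustrative assumptions, not part of this change.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    # True if the bytes can be decoded without raising an error.
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


ascii_only = b"name\tSmith\n"
latin1_bytes = "name\tMüller\n".encode("iso-8859-1")
utf8_bytes = "name\tMüller\n".encode("utf-8")

# Pure ASCII content decodes identically under both encodings, so it
# carries no evidence about which one was used.
same_text = ascii_only.decode("utf-8") == ascii_only.decode("iso-8859-1")

# A single non-ASCII byte is the only hint in the iso-8859-1 file:
# it makes the utf-8 decode fail.
latin1_rejected_by_utf8 = not decodes_as(latin1_bytes, "utf-8")

# The reverse direction never raises: every byte is valid iso-8859-1,
# so utf-8 bytes decode "successfully" into garbled text.
mojibake = utf8_bytes.decode("iso-8859-1")
```

The last case is the troublesome one for a guesser: decoding succeeds,
but "ü" comes out as two unrelated characters.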

Ref:
https://docs.python.org/3/library/pathlib.html#pathlib.Path.open
https://docs.python.org/3/library/functions.html#open
2023-11-21 16:52:58 +01:00
conventions Fix broken schema.org IRI prefix definition (missed trailing slash) 2023-07-28 21:17:38 +02:00
tests Support sheet convention declaration 2023-07-26 10:03:12 +02:00
__init__.py Restructure io module 2023-07-19 10:53:34 +02:00
load.py Add an encoding parameter to io.load_tabby 2023-11-21 16:52:58 +01:00
load_utils.py Perform document key sanitization for overrides 2023-07-27 09:51:18 +02:00
xlsx.py Restructure io module 2023-07-19 10:53:34 +02:00