WIP: Guess encoding if default does not work #114

Closed

mslw wants to merge 5 commits from encoding into main

Author	SHA1	Message	Date
Michał Szczepanik	f0c44c1818	Make encoding a property of TabbyLoader. Because load functions are used recursively (when load statements are found in a tabby file), passing the encoding parameter around would be too much hassle; use `self._encoding` instead.	2023-11-21 13:50:43 +01:00
Michał Szczepanik	070937a7c2	Add an encoding argument to tabby loader. When an encoding is explicitly specified, it is used. Otherwise, the default encoding used by `Path.open` is tried, and charset_normalizer is used to guess the encoding if that fails.	2023-11-21 12:35:50 +01:00
Michał Szczepanik	8d4b6e1aba	Fix a type annotation	2023-11-21 12:20:21 +01:00
Michał Szczepanik	ef7d778311	Narrow down the try/except. This narrows the try/except to wrap only the loader, not the extend/append, so it is clearer what is being tried.	2023-11-13 19:05:37 +01:00
Michał Szczepanik	71676da64f	Guess encoding if default does not work. If reading a TSV file with the default encoding fails, roll out a cannon (charset-normalizer) and try to guess the encoding to use. By default, `Path.open()` uses `locale.getencoding()` when reading a file, which means that we implicitly use UTF-8, at least on Linux. This fails when reading files with non-ASCII characters prepared on Windows with not-uncommon settings. There is no perfect way to learn the encoding of a plain text file, but existing tools seem to do a good job. This commit refactors the tabby loader, makes it use the guessed encoding (but only after the default fails), and closes #112. See https://charset-normalizer.readthedocs.io	2023-11-13 18:33:52 +01:00
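The fallback order described in commits 070937a7c2 and 71676da64f (explicit encoding, then the default, then a charset-normalizer guess), with the try/except wrapping only the read as in ef7d778311, might look roughly like the minimal sketch below. The helper names `read_text` and `guess_encoding` are hypothetical and not the PR's actual code; `Path.read_text()` is used here because it applies the same locale-dependent default as `Path.open()`, and only `charset_normalizer.from_path(...).best()` is real library API.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional


def guess_encoding(path: Path) -> Optional[str]:
    """Guess the encoding of `path` with charset-normalizer (imported lazily)."""
    # Third-party dependency: pip install charset-normalizer
    from charset_normalizer import from_path

    best = from_path(path).best()  # None when no plausible encoding is found
    return None if best is None else best.encoding


def read_text(
    path: Path,
    encoding: Optional[str] = None,
    guess: Callable[[Path], Optional[str]] = guess_encoding,
) -> str:
    """Read `path`: explicit encoding, else the locale default, else a guess."""
    if encoding is not None:
        # An explicitly requested encoding is used as-is; errors propagate.
        return path.read_text(encoding=encoding)
    try:
        # Only the read is inside the try; encoding=None means the locale default.
        return path.read_text()
    except UnicodeDecodeError:
        guessed = guess(path)
        if guessed is None:
            raise  # nothing better to try: re-raise the original error
        return path.read_text(encoding=guessed)


# Usage: a cp1252-encoded TSV, as a Windows tool might save it.
with tempfile.TemporaryDirectory() as tmp:
    tsv = Path(tmp) / "authors.tsv"
    tsv.write_bytes("name\tcity\nJürgen\tZürich\n".encode("cp1252"))
    # A stub guesser keeps the demo independent of charset-normalizer.
    content = read_text(tsv, guess=lambda p: "cp1252")
    explicit = read_text(tsv, encoding="cp1252")
```

On a UTF-8 locale the default read fails on the byte `0xFC` ("ü" in cp1252), so the guesser is consulted; passing `guess` as a parameter is a choice made for this sketch so the fallback can be exercised without the third-party package.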
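Commit f0c44c1818 stores the encoding on the loader because load functions recurse when a tabby file contains load statements. A minimal sketch of that design choice follows; `ToyLoader`, its `@file` cell convention, and the file names are hypothetical, chosen only for illustration, and do not reproduce tabby's actual load syntax or the PR's code.

```python
import tempfile
from pathlib import Path
from typing import Optional


class ToyLoader:
    """Hypothetical, much-simplified stand-in for the PR's TabbyLoader."""

    def __init__(self, src_dir: Path, encoding: Optional[str] = None) -> None:
        self._src_dir = src_dir
        # Set once here; every recursive load() reads it, so it is never passed around.
        self._encoding = encoding

    def load(self, name: str) -> list:
        """Return the rows of a TSV file, expanding hypothetical '@file' cells recursively."""
        text = (self._src_dir / name).read_text(encoding=self._encoding)
        rows = []
        for line in text.splitlines():
            # Hypothetical convention for this sketch: a cell "@other.tsv" is replaced
            # by the rows loaded from other.tsv (not tabby's actual load syntax).
            rows.append(
                [self.load(cell[1:]) if cell.startswith("@") else cell
                 for cell in line.split("\t")]
            )
        return rows


# Usage: two cp1252 files, one referring to the other.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp)
    (src / "main.tsv").write_text("title\tMüller\nauthors\t@authors.tsv\n", encoding="cp1252")
    (src / "authors.tsv").write_text("Jürgen\tZürich\n", encoding="cp1252")
    result = ToyLoader(src, encoding="cp1252").load("main.tsv")
```

Here `result` is `[["title", "Müller"], ["authors", [["Jürgen", "Zürich"]]]]`. With an `encoding` parameter on every load function instead, each recursive call would have to forward it; keeping it on `self` means a nested file is always read with the same encoding as its parent.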