Contributing
Development setup
git clone https://github.com/remrama/liwca.git
cd liwca
uv pip install -e ".[dev]"
Running tests, linting, and type checking:
uv run pytest tests/ --cov=liwca --cov-report=term-missing
uv run ruff check src/ tests/
uv run ruff format src/ tests/
uv run mypy
Code style
Formatter/linter: Ruff (line length 100, target Python 3.14, rules E, F, I, W, C90, NPY201)
Docstrings: NumPy convention
Formatting: double quotes, LF line endings
Adding a new dictionary
To add a new publicly available dictionary to the registry:
1. Add a fetch function. Edit src/liwca/fetchers.py and add a fetch_<name>() function following the existing pattern. Include a NumPy docstring with a short summary, Returns, Notes (with footnote references to the paper and source), References (numbered citations with DOI links), and Examples.

   For standard .dic or .dicx files:

       def fetch_mydict() -> pd.DataFrame:
           """
           Fetch the my dictionary dictionary.

           Returns
           -------
           :class:`pandas.DataFrame`
               Dictionary with ``"category_a"`` and ``"category_b"`` categories.

           Notes
           -----
           The my dictionary is described in Author et al.\\ [1]_ and publicly
           available on Example Repository\\ [2]_.

           References
           ----------
           .. [1] Author et al., Year. Title of the paper. *Journal Name*
                  doi:`10.xxxx/example <https://doi.org/10.xxxx/example>`__
           .. [2] `https://example.com/download <https://example.com/download>`__

           Examples
           --------
           >>> import liwca
           >>> dx = liwca.fetch_mydict()  # doctest: +SKIP
           """
           return read_dx(_pup.fetch("mydict.dic"))

   For non-standard formats (TSV, Excel, plain text, etc.), add custom parsing inline in the function body and call dx_schema.validate(df) before returning.

2. Add to the registry. Edit src/liwca/data/registry.txt and append a line with the filename, MD5 hash, and download URL:

       mydict.dic md5:<hash> https://example.com/download/mydict.dic

   Compute the MD5 hash of the file (md5sum filename.ext on Linux/macOS, or certutil -hashfile filename.ext MD5 on Windows).

3. Export from the package. Add fetch_<name> to __all__ in src/liwca/fetchers.py and verify it is importable as liwca.fetch_<name>().
Sphinx picks up the new function automatically via autosummary.
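
For the registry step, the hash and the registry line can also be produced from Python instead of md5sum or certutil. The following is a minimal sketch using only the standard library; the helper names md5_of_file and registry_line are hypothetical and not part of liwca, and the line format simply mirrors the filename, md5:<hash>, URL layout described above.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        # Read fixed-size chunks so large dictionary files are not loaded at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def registry_line(path: Path, url: str) -> str:
    """Format one registry entry as '<filename> md5:<hash> <url>' (hypothetical helper)."""
    return f"{path.name} md5:{md5_of_file(path)} {url}"
```

Calling registry_line(Path("mydict.dic"), "https://example.com/download/mydict.dic") on a downloaded copy of the file would return a line that can be pasted into src/liwca/data/registry.txt.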