Getting Started#
Installation#
pip install --upgrade liwca
For development:
git clone https://github.com/remrama/liwca.git
cd liwca
uv pip install -e ".[dev]"
Fetching dictionaries#
Each registered dictionary has a dedicated fetch_* function that downloads
the file on first use and returns it as a DataFrame:
import liwca
dx = liwca.fetch_threat()
See Fetching Dictionaries for all available dictionaries and their options.
Downloaded files are cached locally via Pooch. By default, files are cached
in your OS data directory. You can override this by setting the
LIWCA_DATA_DIR environment variable before importing liwca:
export LIWCA_DATA_DIR=/path/to/my/cache
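The same override can presumably be applied from inside a Python session, provided the variable is set before the import. The following is a minimal sketch, assuming liwca reads LIWCA_DATA_DIR when it is imported, as described above. The cache location is a placeholder, and the import is left commented out so the snippet runs without liwca installed:

```python
import os
import tempfile

# Placeholder cache location for illustration; substitute your own path.
cache_dir = os.path.join(tempfile.gettempdir(), "liwca-cache")

# Set the variable before liwca is imported so the override is picked up
# (assumption: liwca reads LIWCA_DATA_DIR at import time).
os.environ["LIWCA_DATA_DIR"] = cache_dir

# import liwca  # import only after the variable is set
```

Setting the variable in the shell, as shown earlier, is usually simpler; the in-process form may be convenient in notebooks.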
Counting words#
count takes texts and a dictionary DataFrame, and returns a
documents × categories table:
texts = ["The threat of danger loomed over the city", "A calm morning"]
results = liwca.count(texts, dx)
Values are percentages of total words per document by default. See
count for options including raw counts and custom tokenizers.
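To make the "percentage of total words" idea concrete, here is a small standalone sketch of the arithmetic. It is not liwca's implementation: the regex tokenizer, the plain-dict dictionary format, and the helper name count_percentages are all assumptions made for illustration, and it ignores the wildcard entries that LIWC-style dictionaries can contain.

```python
import re


def count_percentages(texts, categories):
    """Return one {category: percent} dict per text.

    `categories` maps a category name to a set of lowercase words
    (a hypothetical stand-in for a dictionary DataFrame).
    """
    rows = []
    for text in texts:
        # Simplistic tokenizer: lowercase runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", text.lower())
        total = len(tokens)
        row = {}
        for name, words in categories.items():
            hits = sum(1 for tok in tokens if tok in words)
            # Percentage of all tokens in the document that match the category.
            row[name] = 100.0 * hits / total if total else 0.0
        rows.append(row)
    return rows


toy_dx = {"threat": {"threat", "danger"}}  # hypothetical toy dictionary
texts = ["The threat of danger loomed over the city", "A calm morning"]
rows = count_percentages(texts, toy_dx)
```

In this toy case the first text has 8 tokens, 2 of which match, giving 25.0 for the threat category; the second text has no matches and scores 0.0.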
DDR (semantic scoring)#
ddr scores texts against dictionary categories using cosine
similarity in embedding space, following the Distributed Dictionary
Representation method (Garten et al., 2018). This captures semantic proximity
even when exact dictionary words are absent from the text.
Pass a gensim model name to download embeddings automatically (requires
pip install "liwca[ddr]"):
results = liwca.ddr(texts, dx, "glove-wiki-gigaword-100")
Or bring your own embeddings as a dict-like mapping:
results = liwca.ddr(texts, dx, my_embeddings)
Values are cosine similarities in [-1, 1]. See ddr for full
parameter details.
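The idea behind DDR can be sketched in a few lines: average the embeddings of a category's words, average the embeddings of a document's words, and take the cosine similarity of the two averages. The code below is a toy illustration of that idea, not liwca's implementation. The two-dimensional vectors are invented, the helper names mean_vector and cosine are hypothetical, and representing vectors as plain lists in a dict is an assumption about what "dict-like mapping" permits.

```python
import math


def mean_vector(words, embeddings):
    """Average the vectors of the words that have an embedding."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return None  # no known words; a real implementation must handle this
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Invented 2-D embeddings, for illustration only.
toy_embeddings = {
    "threat": [1.0, 0.0],
    "danger": [0.8, 0.6],
    "peril": [0.6, 0.8],
    "calm": [-1.0, 0.0],
}

# Category representation: mean of the dictionary words' vectors.
category_vec = mean_vector(["threat", "danger"], toy_embeddings)

# Document representations: mean of the in-vocabulary words' vectors.
risky = cosine(mean_vector("peril everywhere".split(), toy_embeddings), category_vec)
serene = cosine(mean_vector("calm morning".split(), toy_embeddings), category_vec)
```

Neither toy document contains a dictionary word, yet the one mentioning "peril" scores positively against the category while the one mentioning "calm" scores negatively, which is the behavior that exact-match counting cannot provide.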
Reading and writing local files#
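read_dx loads a dictionary from disk, write_dx saves one, and merge_dx
combines two dictionary DataFrames:
dx = liwca.read_dx("my_dictionary.dicx")  # auto-detects .dic or .dicx
liwca.write_dx(dx, "my_dictionary.dic")
merged = liwca.merge_dx(dx_a, dx_b)  # dx_a, dx_b: dictionaries loaded earlier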
LIWC-22 wrapper#
If LIWC-22 is installed, you can call its command-line interface (CLI) from Python. The LIWC-22 desktop application (or its license server) must be running when you make the call:
liwca.liwc22("wc", input="data.csv", output="results.csv")
Pass auto_open=True to let liwca start and stop LIWC-22 automatically.
See liwc22 for the full argument reference, and the LIWC CLI documentation
and Python CLI example for more details.