liwca.count
- liwca.count(texts: Iterable[str] | Series, dx: DataFrame, *, tokenizer: Callable[[str], list[str]] | None = None, as_percentage: bool = True, precision: int | None = None, return_words: Literal[False] = False) → DataFrame
- liwca.count(texts: Iterable[str] | Series, dx: DataFrame, *, tokenizer: Callable[[str], list[str]] | None = None, as_percentage: bool = True, precision: int | None = None, return_words: Literal[True]) → tuple[DataFrame, DataFrame]
Count LIWC dictionary categories across documents (pure-Python).
- Parameters:
  - **texts** (*Iterable of str or Series*) – The documents to analyse. Each element is a single document string.
  - **dx** (*DataFrame*) – A LIWC dictionary DataFrame as returned by `liwca.read_dx`. Index contains dictionary terms (may include `*` wildcards), columns are category names, values are binary (0/1).
  - **tokenizer** (*Callable, optional*) – A function `str -> list[str]` used to split each document into lowercase tokens. Defaults to a regex tokenizer that preserves contractions (`don't` → `["don't"]`).
  - **as_percentage** (*bool, optional*) – If `True` (default), return category values as a percentage of total word count per document (matching LIWC's default output). If `False`, return raw category counts.
  - **precision** (*int, optional*) – If set, round category value columns to this many decimal places. Only applies when `as_percentage=True`. The `"WC"` column is never rounded.
  - **return_words** (*bool, optional*) – If `True`, return a tuple `(categories, words)` where `words` is a documents × tokens DataFrame holding per-word counts (or percentages) for every dictionary token that appeared in the corpus. Wildcard entries are expanded to the actual corpus tokens that matched (e.g., `recall*` → `recalled`, `recalling`, …). The same `as_percentage` and `precision` settings apply to both DataFrames. Default `False`.
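The `tokenizer` parameter accepts any `str -> list[str]` callable. Below is a minimal sketch of a contraction-preserving tokenizer one could pass in; the exact regex used by the library's default tokenizer is an assumption, not its actual implementation:

```python
import re


def simple_tokenizer(text: str) -> list[str]:
    """Lowercase and split into word tokens, keeping contractions
    intact (e.g. "Don't" -> "don't"), as the default is documented
    to do. The regex here is illustrative only."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


simple_tokenizer("Don't panic, it's fine.")
# -> ["don't", 'panic', "it's", 'fine']
```

A callable like this would be supplied as `liwca.count(texts, dx, tokenizer=simple_tokenizer)`.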
- Returns:
  - When `return_words=False` (default): a documents × categories DataFrame. Index matches the input order (or the `Series` index if a Series was passed). Columns are the dictionary category names. An additional `"WC"` column contains the total word count for each document.
  - When `return_words=True`: a tuple `(categories, words)` where `categories` is the DataFrame described above and `words` is a documents × tokens DataFrame with one column per matched dictionary token plus a `"WC"` column.
- Return type:
  DataFrame | tuple[DataFrame, DataFrame]
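The relationship between raw counts and the default percentage output is simple arithmetic: each category value is 100 times the raw hit count divided by the document's word count. A quick sketch (the specific numbers mirror the example below and are illustrative):

```python
# One category hit in a document of 8 words, as in the first
# example document, yields a percentage value of 12.5.
raw_count = 1  # dictionary hits for the category in this document
wc = 8         # total word count ("WC" column) for this document

pct = round(100 * raw_count / wc, 1)
# -> 12.5
```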
Examples
>>> import liwca
>>> dx = liwca.fetch_threat()
>>> texts = [
...     "This is a grave threat to our safety.",
...     "All is calm today.",
... ]
>>> liwca.count(texts, dx)
Category  WC  threat
0          8    12.5
1          4     0.0
Get per-word contributions:
>>> cats, words = liwca.count(texts, dx, return_words=True)
>>> words.columns.tolist()
['WC', 'grave', 'threat']
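The wildcard expansion described under `return_words` can be pictured with shell-style pattern matching. This is a sketch of the behaviour, assuming `fnmatch`-style matching; the library's internal matcher may differ:

```python
import fnmatch

# Hypothetical dictionary terms and corpus tokens, illustrating how a
# wildcard entry like "recall*" expands to the corpus tokens it matched.
dictionary_terms = ["threat", "recall*"]
corpus_tokens = ["recalled", "recalling", "threat", "calm"]

expanded = {
    term: sorted(t for t in corpus_tokens if fnmatch.fnmatch(t, term))
    for term in dictionary_terms
}
# expanded["recall*"] -> ["recalled", "recalling"]
# expanded["threat"]  -> ["threat"]
```

In the returned `words` DataFrame, each matched corpus token (not the wildcard pattern) becomes its own column.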