liwca.ddr
- liwca.ddr(texts, dx, embeddings, *, tokenizer=None, precision=None)
Score documents against dictionary categories via DDR.
- Parameters:
texts (Iterable of str or Series) – The documents to score. Each element is a single document string.
dx (DataFrame) – A LIWC dictionary DataFrame as returned by liwca.read_dx. The index contains dictionary terms (which may include * wildcards), the columns are category names, and the values are binary (0/1).
embeddings (str or Mapping) – Word embeddings to use. Pass a string to load a pre-trained model via gensim.downloader.load (requires pip install liwca[ddr]), e.g. "glove-wiki-gigaword-100". Alternatively, pass any mapping that supports embeddings[word] and word in embeddings, such as a plain dict or gensim KeyedVectors.
tokenizer (Callable, optional) – A function str -> list[str] used to split each document into lowercase tokens. Defaults to a regex tokenizer that preserves contractions (identical to liwca.count’s default).
precision (int, optional) – If set, round cosine similarity values to this many decimal places.
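The tokenizer and embeddings contracts described above can be illustrated with a small sketch. Everything here is hypothetical: the names simple_tokenizer and toy_embeddings, the regular expression, and the three-dimensional vectors are made up for illustration and are not liwca's defaults. Real embeddings would normally be NumPy arrays rather than lists.

```python
import re

# Hypothetical pattern (not liwca's default): runs of letters,
# optionally joined by apostrophes so contractions stay whole.
_TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def simple_tokenizer(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return _TOKEN_RE.findall(text.lower())


# Toy vectors invented for this example. A plain dict already supports
# both `embeddings[word]` and `word in embeddings`.
toy_embeddings = {
    "danger": [0.9, 0.1, 0.0],
    "ahead": [0.1, 0.8, 0.2],
    "don't": [0.0, 0.2, 0.9],
}

tokens = simple_tokenizer("Don't ignore DANGER ahead!")

# Only tokens found in the mapping can contribute a vector.
known = [t for t in tokens if t in toy_embeddings]
```

Here "ignore" is tokenized but has no vector, so it is out of vocabulary for this toy mapping.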
- Returns:
A documents x categories DataFrame. Index matches the input order (or the Series index if a Series was passed). Columns are the sorted dictionary category names. Values are cosine similarities in [-1, 1], or NaN when the document or category vector is undefined (see module docstring for OOV details).
- Return type:
DataFrame
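To make the returned values concrete, here is a minimal pure-Python sketch of the general DDR idea: average the word vectors of a document, average the word vectors of a category's terms, and take the cosine similarity of the two means. It is an illustration under stated assumptions, not liwca's implementation: the helpers mean_vector and cosine and the two-dimensional toy vectors are hypothetical, and wildcard terms are ignored.

```python
import math


def mean_vector(words, embeddings):
    """Average the vectors of in-vocabulary words; None if none are found."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity, or NaN when a vector is missing or has zero norm."""
    if u is None or v is None:
        return math.nan
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return math.nan
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


# Toy vectors invented for this example.
toy = {"danger": [1.0, 0.0], "threat": [1.0, 0.0], "calm": [-1.0, 0.0]}
category = mean_vector(["threat"], toy)

same = cosine(mean_vector(["danger"], toy), category)
opposite = cosine(mean_vector(["calm"], toy), category)
undefined = cosine(mean_vector(["zzz"], toy), category)  # no known tokens
```

In this sketch a document with no in-vocabulary tokens has no mean vector, which is one way the NaN case described above can arise.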
Examples
>>> import liwca
>>> dx = liwca.fetch_threat()
>>> results = liwca.ddr(
...     ["danger lurks ahead"],
...     dx,
...     "glove-wiki-gigaword-100",
... )