Utils

Metrics

compshs.utils.diversity(top_words: dict) → float

Diversity over topics.

Parameters: top_words (dict) – Topic index as keys, top-word sets as values.
Returns: Diversity.
Return type: float
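
The page does not spell out how diversity is computed. As an illustration only, the sketch below uses one common definition of topic diversity: the fraction of unique words among all top words across topics. Whether compshs.utils.diversity uses this definition is an assumption, and diversity_sketch is a hypothetical name, not part of the library.

```python
def diversity_sketch(top_words: dict) -> float:
    """Fraction of unique words among all top words (assumed definition)."""
    # Flatten the per-topic word collections into one list.
    all_words = [word for words in top_words.values() for word in words]
    if not all_words:
        return 0.0
    # 1.0 means no word is shared between topics; lower values mean overlap.
    return len(set(all_words)) / len(all_words)


# Two topics sharing one word out of four listed words.
example_top_words = {0: {"gene", "cell"}, 1: {"cell", "protein"}}
print(diversity_sketch(example_top_words))  # 0.75
```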

compshs.utils.coherence(corpus: list, word_sets: dict) → float

Coherence over topics defined by word_sets.

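Parameters

corpus (list) – Corpus on which coherence is computed.
word_sets (dict) – Word sets defining the topics.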

Returns

Overall topic coherence.

Return type

float
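
The coherence measure itself is not specified on this page. The sketch below shows one possible choice, a UMass-style score based on document co-occurrence counts, averaged over word pairs and then over topics. It assumes that corpus is a list of tokenized documents (lists of words) and that word_sets maps a topic index to a collection of words; both the input format and the choice of measure are assumptions, and umass_coherence_sketch is a hypothetical name.

```python
import math
from itertools import combinations


def umass_coherence_sketch(corpus: list, word_sets: dict) -> float:
    """UMass-style coherence averaged over topics (assumed measure)."""
    docs = [set(doc) for doc in corpus]

    def doc_freq(*words):
        # Number of documents containing every given word.
        return sum(1 for doc in docs if all(w in doc for w in words))

    topic_scores = []
    for words in word_sets.values():
        pair_scores = []
        for w_i, w_j in combinations(sorted(words), 2):
            # Condition on the more frequent word of the pair.
            denom = max(doc_freq(w_i), doc_freq(w_j))
            if denom == 0:
                continue  # neither word occurs in the corpus
            pair_scores.append(math.log((doc_freq(w_i, w_j) + 1) / denom))
        if pair_scores:
            topic_scores.append(sum(pair_scores) / len(pair_scores))
    return sum(topic_scores) / len(topic_scores) if topic_scores else 0.0


example_corpus = [["a", "b"], ["a", "b"], ["a", "c"]]
print(umass_coherence_sketch(example_corpus, {0: {"a", "b"}, 1: {"b", "c"}}))
```

Scores closer to zero indicate words that co-occur more often; more negative scores indicate words that rarely appear together.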

compshs.utils.average_pairwise_similarity(values_source, values_target) → float

Average pairwise similarity between two arrays of values.

Given two arrays of values \(I\) and \(J\), the average pairwise similarity, denoted \(psim(I,J)\), is computed as:

\[psim(I,J)=\dfrac{\sum_{i\in I}\sum_{j \in J}sim(i,j)}{|I||J|}\]

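Parameters

values_source – Source array of values, \(I\) in the formula above.
values_target – Target array of values, \(J\) in the formula above.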

Returns

Average pairwise similarity.

Return type

float
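
The formula above can be sketched directly in code. The page does not say which similarity function \(sim\) is used, so the sketch takes it as an explicit argument; that extra parameter and the name psim_sketch are hypothetical and not part of the documented signature.

```python
from itertools import product


def psim_sketch(values_source, values_target, sim) -> float:
    """Average of sim(i, j) over all pairs (i, j) in I x J."""
    if len(values_source) == 0 or len(values_target) == 0:
        raise ValueError("both inputs must be non-empty")
    # Double sum over all source/target pairs.
    total = sum(sim(i, j) for i, j in product(values_source, values_target))
    # Normalize by |I| * |J|.
    return total / (len(values_source) * len(values_target))


def exact_match(i, j) -> float:
    # Toy similarity used only for this illustration.
    return 1.0 if i == j else 0.0


# Two of the six pairs match, so the average is 2 / 6.
print(psim_sketch([1, 2], [2, 3, 2], exact_match))
```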

compshs.utils.top_k(values: ndarray, k: int = 1) → ndarray

Returns indices of the k highest values.

Parameters

values (ndarray) – Array of values.
k (int) – Number of indices to return. Defaults to 1.

Returns

Array of k indices.

Return type

np.ndarray
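
A minimal sketch of this behavior with NumPy is shown below. It returns the indices ordered from the highest value downward; the ordering of the array returned by compshs.utils.top_k is not documented here, so that ordering is an assumption, and top_k_sketch is a hypothetical name.

```python
import numpy as np


def top_k_sketch(values: np.ndarray, k: int = 1) -> np.ndarray:
    """Indices of the k highest values, highest first (assumed ordering)."""
    values = np.asarray(values)
    # argsort is ascending, so take the last k indices and reverse them.
    return np.argsort(values)[-k:][::-1]


scores = np.array([0.1, 0.9, 0.5])
print(top_k_sketch(scores, k=2))  # [1 2]
```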

compshs.utils.extract_top_words(viz_data, n_topics: int, lambdas: array, k: int) → dict

Extract the top words for each topic in viz_data, using the relevance metric to select them.

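Parameters

viz_data – Data from which the top words are extracted.
n_topics (int) – Number of topics.
lambdas (array) – Lambda values used by the relevance metric.
k (int) – Number of top words per topic.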

Returns

Dictionary with topic number as key and top words as values.

Return type

dict
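
To illustrate relevance-based selection, the sketch below assumes the relevance definition used by LDAvis (Sievert and Shirley): \(r(w, t \mid \lambda) = \lambda \log p(w \mid t) + (1 - \lambda) \log \frac{p(w \mid t)}{p(w)}\). Because the structure of viz_data is not described on this page, the sketch works on plain arrays instead; the arguments topic_word, word_prob, and vocab, the use of one lambda per topic, and the name top_words_by_relevance are all hypothetical.

```python
import numpy as np


def top_words_by_relevance(topic_word, word_prob, vocab, lambdas, k) -> dict:
    """Top-k words per topic ranked by relevance (assumed definition).

    topic_word: (n_topics, n_words) array of p(w | t), strictly positive.
    word_prob:  (n_words,) array of marginal word probabilities p(w).
    lambdas:    one lambda in [0, 1] per topic (assumption).
    """
    topic_word = np.asarray(topic_word, dtype=float)
    word_prob = np.asarray(word_prob, dtype=float)
    top_words = {}
    for topic, lam in enumerate(lambdas):
        log_p = np.log(topic_word[topic])
        # lambda weighs raw probability against lift p(w | t) / p(w).
        relevance = lam * log_p + (1 - lam) * (log_p - np.log(word_prob))
        best = np.argsort(relevance)[::-1][:k]
        top_words[topic] = [vocab[i] for i in best]
    return top_words


vocab = ["the", "cat", "dog"]
topic_word = [[0.6, 0.3, 0.1], [0.6, 0.1, 0.3]]
word_prob = [0.6, 0.2, 0.2]
# lambda = 0 ranks by lift only; lambda = 1 ranks by probability only.
print(top_words_by_relevance(topic_word, word_prob, vocab, [0.0, 1.0], k=1))
```

With a low lambda, frequent but uninformative words such as "the" are pushed down in favor of words that are distinctive for the topic.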