Utils
Metrics
- compshs.utils.diversity(top_words: dict) → float
Diversity over topics.
- Parameters
top_words (dict) – Topic index as keys, top-word sets as values.
- Returns
Diversity.
- Return type
float
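The entry above does not state which formula `diversity` uses. As a rough illustration only, the sketch below assumes one common definition of topic diversity: the fraction of unique words among the top words of all topics. The name `diversity_sketch` is hypothetical and is not part of compshs, whose actual computation may differ.

```python
def diversity_sketch(top_words: dict) -> float:
    """Fraction of unique words among the top words of all topics.

    Assumption: this is one common definition of topic diversity;
    compshs.utils.diversity may compute something different.
    """
    # Flatten the per-topic word collections into a single list.
    all_words = [word for words in top_words.values() for word in words]
    if not all_words:
        return 0.0
    return len(set(all_words)) / len(all_words)


# Two topics sharing the word "dog": 3 unique words out of 4 in total.
top_words = {0: {"cat", "dog"}, 1: {"dog", "fish"}}
score = diversity_sketch(top_words)
```

Under this assumed definition, a score of 1.0 means no word is shared between topics, and lower scores mean more overlap.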
- compshs.utils.coherence(corpus: list, word_sets: dict) → float
Coherence over topics defined by word_sets.
- Parameters
corpus (list) – Corpus of documents.
word_sets (dict) – Topic index as keys, word sets as values.
- Returns
Overall topic coherence.
- Return type
float
- compshs.utils.average_pairwise_similarity(values_source, values_target) → float
Average pairwise similarity between two arrays of values.
Given two arrays of values \(I\) and \(J\), the average pairwise similarity, denoted \(psim(I,J)\), is computed as:
\[psim(I,J)=\dfrac{\sum_{i\in I}\sum_{j \in J}sim(i,j)}{|I||J|}\]
- Parameters
values_source – Array of values.
values_target – Array of values.
- Returns
Average pairwise similarity.
- Return type
float
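The formula for \(psim(I,J)\) can be sketched in a few lines. The documentation does not say which similarity function \(sim\) the library applies, so this sketch takes it as an explicit argument. The names `psim_sketch` and `exact_match` are hypothetical and are not part of compshs.

```python
from itertools import product
from typing import Callable, Sequence


def psim_sketch(values_source: Sequence, values_target: Sequence,
                sim: Callable[[object, object], float]) -> float:
    """Average of sim(i, j) over every pair (i, j) in I x J."""
    n_pairs = len(values_source) * len(values_target)
    if n_pairs == 0:
        raise ValueError("both inputs must be non-empty")
    total = sum(sim(i, j) for i, j in product(values_source, values_target))
    return total / n_pairs


# Hypothetical similarity for illustration: 1 if the values are equal, else 0.
def exact_match(i, j) -> float:
    return 1.0 if i == j else 0.0


# Only the pair (2, 2) matches, out of 2 * 3 = 6 pairs.
result = psim_sketch([1, 2], [2, 3, 4], exact_match)
```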
Rank
- compshs.utils.top_k(values: ndarray, k: int = 1) → ndarray
Returns indices of the k highest values.
- Parameters
values (np.ndarray) – Array of values.
k (int) – Number of elements to return (default = 1).
- Returns
Array of k indices.
- Return type
np.ndarray
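A minimal sketch of top-k index selection with NumPy follows. It assumes the indices are returned from the largest value to the smallest. The documentation does not specify the ordering, so the library may behave differently. `top_k_sketch` is a hypothetical name.

```python
import numpy as np


def top_k_sketch(values: np.ndarray, k: int = 1) -> np.ndarray:
    """Indices of the k largest entries, ordered from largest to smallest."""
    values = np.asarray(values)
    # argsort is ascending, so reverse it and keep the first k indices.
    return np.argsort(values)[::-1][:k]


indices = top_k_sketch(np.array([0.1, 0.7, 0.3, 0.9]), k=2)
```

For large arrays where the order of the returned indices does not matter, `np.argpartition` avoids a full sort.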
- compshs.utils.extract_top_words(viz_data, n_topics: int, lambdas: array, k: int) → dict
Extract the top words for each topic in viz_data.
Uses the relevance metric to select the top words.
- Parameters
viz_data – Output from the pyLDAvis library.
n_topics (int) – Number of topics.
lambdas (np.array) – Array of lambda values for the relevance formula.
k (int) – Top-k words are selected.
- Returns
Dictionary with topic number as key and top words as values.
- Return type
dict
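For context, the relevance metric used by LDAvis scores a word \(w\) in topic \(t\) as \(\lambda \log p(w|t) + (1-\lambda)\log\frac{p(w|t)}{p(w)}\). The sketch below applies that formula to plain arrays for a single \(\lambda\). It is an illustration under stated assumptions, not the library's implementation: the input layout, the function name `top_words_by_relevance`, and the use of one `lam` value are all hypothetical. How `extract_top_words` combines several values in `lambdas` is not documented here.

```python
import numpy as np


def top_words_by_relevance(topic_term, term_prob, vocab, lam, k):
    """Select the k most relevant words per topic.

    relevance(w, t | lam) = lam * log p(w|t) + (1 - lam) * log(p(w|t) / p(w))
    Assumes all probabilities are strictly positive.
    """
    log_phi = np.log(np.asarray(topic_term, dtype=float))
    log_p = np.log(np.asarray(term_prob, dtype=float))
    relevance = lam * log_phi + (1 - lam) * (log_phi - log_p)
    top = {}
    for t, scores in enumerate(relevance):
        best = np.argsort(scores)[::-1][:k]  # highest relevance first
        top[t] = [vocab[i] for i in best]
    return top


vocab = ["the", "cat", "dog"]
topic_term = [[0.5, 0.4, 0.1],   # p(w | topic 0)
              [0.5, 0.1, 0.4]]   # p(w | topic 1)
term_prob = [0.5, 0.25, 0.25]    # marginal p(w)

# lam = 0 ranks purely by lift, so the frequent word "the" is demoted.
by_lift = top_words_by_relevance(topic_term, term_prob, vocab, lam=0.0, k=1)
# lam = 1 ranks purely by p(w | t), so "the" wins in both topics.
by_prob = top_words_by_relevance(topic_term, term_prob, vocab, lam=1.0, k=1)
```

Values of \(\lambda\) between 0 and 1 trade off a word's in-topic probability against how specific the word is to that topic.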