compshs.utils package
Subpackages
Submodules
compshs.utils.check module
Created in 2025 @author: Simon Delarue <simon.delarue@telecom-paris.fr>
- compshs.utils.check.check_exist_column_name(connection: Connection, table_name: str, column_name: str) → bool
Check whether a column exists in a table.
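A minimal sketch of such a check, assuming `connection` is a `sqlite3.Connection` (the documented `Connection` type may belong to another driver, e.g. SQLAlchemy). It queries SQLite's `pragma_table_info` table-valued function, which, unlike the `PRAGMA table_info` statement, accepts bound parameters. This is an illustration, not the package's implementation.

```python
import sqlite3


def check_exist_column_name(connection: sqlite3.Connection, table_name: str, column_name: str) -> bool:
    """Return True if table_name has a column named column_name (SQLite-only sketch)."""
    # pragma_table_info(...) is the table-valued form of PRAGMA table_info;
    # unlike the PRAGMA statement, it accepts bound parameters.
    # The comparison on `name` is an exact (case-sensitive) match.
    cursor = connection.execute(
        "SELECT 1 FROM pragma_table_info(?) WHERE name = ?",
        (table_name, column_name),
    )
    return cursor.fetchone() is not None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT)")
    print(check_exist_column_name(conn, "documents", "title"))   # True
    print(check_exist_column_name(conn, "documents", "author"))  # False
```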
- compshs.utils.check.check_exist_table_name(connection: Connection, table_name: str) → bool
Check whether a table exists in a database.
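A comparable sketch for table existence, again assuming an SQLite connection (an assumption; the page does not name the database backend). It looks the name up in the `sqlite_master` catalog and binds it as a parameter rather than formatting it into the SQL string.

```python
import sqlite3


def check_exist_table_name(connection: sqlite3.Connection, table_name: str) -> bool:
    """Return True if an ordinary table called table_name exists (SQLite-only sketch)."""
    # sqlite_master lists every schema object (tables, indexes, views, triggers);
    # keep ordinary tables only and bind the name as a query parameter.
    cursor = connection.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table_name,),
    )
    return cursor.fetchone() is not None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY)")
    print(check_exist_table_name(conn, "documents"))  # True
    print(check_exist_table_name(conn, "authors"))    # False
```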
- compshs.utils.check.check_sql_identifier(identifier: str) → str
Ensure that an SQL identifier (table or column name) is safe to use in queries.
- Parameters
identifier (str) – Identifier name to check (column or table name).
- Returns
Identifier if valid.
- Return type
str
- compshs.utils.check.check_sql_identifiers(identifiers: Tuple[str, ...]) → Tuple[str, ...]
Ensure that a tuple of SQL identifiers (table or column names) is safe to use in queries.
- Parameters
identifiers (Tuple[str, ...]) – Tuple of identifier names to check (column or table names).
- Returns
Tuple of identifiers if valid.
- Return type
Tuple[str, ...]
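This page does not state the validation rule for identifiers or what happens when one is rejected. The sketch below assumes a conservative whitelist (an ASCII letter or underscore followed by letters, digits or underscores) and raises `ValueError` on failure; both choices are assumptions. The tuple variant simply applies the single-identifier check to each element. Such a check matters because SQL drivers bind values, not identifiers, so table and column names end up interpolated into the query text.

```python
import re
from typing import Tuple

# Assumed whitelist: an ASCII letter or underscore, then letters, digits or
# underscores. Quotes, spaces, dots and semicolons are therefore rejected.
_IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def check_sql_identifier(identifier: str) -> str:
    """Return identifier unchanged if it matches the whitelist, else raise ValueError."""
    if not isinstance(identifier, str) or _IDENTIFIER_RE.fullmatch(identifier) is None:
        raise ValueError(f"Unsafe SQL identifier: {identifier!r}")
    return identifier


def check_sql_identifiers(identifiers: Tuple[str, ...]) -> Tuple[str, ...]:
    """Apply check_sql_identifier to every element; stop at the first invalid one."""
    return tuple(check_sql_identifier(name) for name in identifiers)


if __name__ == "__main__":
    table = check_sql_identifier("documents")
    columns = check_sql_identifiers(("id", "title"))
    # Identifiers cannot be bound parameters, so validated names are interpolated.
    query = f"SELECT {', '.join(columns)} FROM {table}"
    print(query)  # SELECT id, title FROM documents
```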
compshs.utils.metrics module
Created in 2025 @author: Simon Delarue <simon.delarue@telecom-paris.fr>
- compshs.utils.metrics.average_pairwise_similarity(values_source, values_target) → float
Average pairwise similarity between two arrays of values.
Given two arrays of values \(I\) and \(J\), the average pairwise similarity, denoted \(\mathrm{psim}(I,J)\), is computed as:
\[\mathrm{psim}(I,J)=\dfrac{\sum_{i\in I}\sum_{j\in J}\mathrm{sim}(i,j)}{|I|\,|J|}\]
- Parameters
values_source – Array of values.
values_target – Array of values.
- Returns
Average pairwise similarity.
- Return type
float
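The formula above is a double sum of \(\mathrm{sim}(i,j)\) over all \(|I|\,|J|\) pairs divided by the number of pairs. This page does not say which similarity \(\mathrm{sim}\) is used, so the sketch below assumes the values are numeric vectors and uses cosine similarity; the helper `cosine` is hypothetical, and only the standard library is used.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity (assumed sim); neither vector may be all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def average_pairwise_similarity(values_source, values_target) -> float:
    """psim(I, J): mean of sim(i, j) over every pair (i, j) in I x J."""
    n_source, n_target = len(values_source), len(values_target)
    if n_source == 0 or n_target == 0:
        raise ValueError("values_source and values_target must be non-empty")
    # Double sum of sim(i, j) over all |I| * |J| pairs ...
    total = sum(cosine(i, j) for i in values_source for j in values_target)
    # ... divided by the number of pairs |I||J|.
    return total / (n_source * n_target)


if __name__ == "__main__":
    # sim = 1.0 for the identical pair and 0.0 for the orthogonal pair.
    print(average_pairwise_similarity([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]]))  # 0.5
```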
- compshs.utils.metrics.coherence(corpus: list, word_sets: dict) → float
Coherence over topics defined by word_sets.
- Parameters
corpus (list) – Corpus of documents.
word_sets (dict) – Topic index as keys, word sets as values.
- Returns
Overall topic coherence.
- Return type
float
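The coherence measure is not named on this page. As one common choice, the sketch below swaps in NPMI (normalised pointwise mutual information) estimated from document co-occurrence: each topic scores the mean NPMI of its word pairs, and the overall score is the mean over topics. The measure, the whitespace tokenisation of documents and the helper `npmi` are all assumptions, not the package's code.

```python
import math
from itertools import combinations
from typing import Dict, Iterable, List, Set


def npmi(a: str, b: str, doc_sets: List[Set[str]]) -> float:
    """Normalised PMI of words a and b, estimated from document co-occurrence."""
    n_docs = len(doc_sets)
    d_a = sum(a in d for d in doc_sets)
    d_b = sum(b in d for d in doc_sets)
    d_ab = sum(a in d and b in d for d in doc_sets)
    if d_ab == 0:
        return -1.0  # never co-occur: conventional lower bound
    if d_ab == n_docs:
        return 1.0  # co-occur everywhere: -log P(a, b) = 0, use the upper bound
    p_a, p_b, p_ab = d_a / n_docs, d_b / n_docs, d_ab / n_docs
    return math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)


def coherence(corpus: List[str], word_sets: Dict[int, Iterable[str]]) -> float:
    """Mean over topics of each topic's mean pairwise NPMI (assumed measure)."""
    # Assumption: documents are plain strings, tokenised on whitespace.
    doc_sets = [set(doc.split()) for doc in corpus]
    topic_scores = []
    for words in word_sets.values():
        pairs = list(combinations(sorted(set(words)), 2))
        if pairs:  # a topic with fewer than two words has no pairs to score
            topic_scores.append(sum(npmi(a, b, doc_sets) for a, b in pairs) / len(pairs))
    if not topic_scores:
        raise ValueError("no topic has at least two words")
    return sum(topic_scores) / len(topic_scores)


if __name__ == "__main__":
    corpus = ["apple banana", "apple banana", "cherry"]
    # Topic 0 words always co-occur (NPMI near 1); topic 1 words never do (NPMI = -1).
    print(coherence(corpus, {0: {"apple", "banana"}, 1: {"apple", "cherry"}}))
```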
compshs.utils.rank module
Created in 2025 @author: Simon Delarue <simon.delarue@telecom-paris.fr>
- compshs.utils.rank.extract_top_words(viz_data, n_topics: int, lambdas: array, k: int) → dict
Extract the top words for each topic in viz_data.
Uses the relevance metric to select the top words.
- Parameters
viz_data – Output from the pyLDAvis library.
n_topics (int) – Number of topics.
lambdas (np.array) – Array of lambda values for the relevance formula.
k (int) – Number of top words to select (top-k).
- Returns
Dictionary with topic number as key and top words as values.
- Return type
dict
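The relevance metric popularised by LDAvis/pyLDAvis scores a word \(w\) in topic \(t\) as \(\lambda \log p(w\mid t) + (1-\lambda)\log\frac{p(w\mid t)}{p(w)}\), trading topic-specific probability against lift. The sketch below computes it for a single \(\lambda\) from plain probability tables instead of `viz_data`, whose internal layout is not documented here. The names `top_words_by_relevance`, `phi`, `p_w` and `vocab` are hypothetical, and how the real function combines the several values in `lambdas` is not specified on this page.

```python
import math
from typing import Dict, List, Sequence


def top_words_by_relevance(
    phi: Sequence[Sequence[float]],  # phi[t][w] = p(w | topic t)
    p_w: Sequence[float],            # marginal probability of word w in the corpus
    vocab: Sequence[str],            # vocab[w] = word string
    lam: float,                      # relevance weight lambda in [0, 1]
    k: int,                          # number of words kept per topic
) -> Dict[int, List[str]]:
    top: Dict[int, List[str]] = {}
    for t, row in enumerate(phi):
        # relevance = lam * log p(w|t) + (1 - lam) * log(p(w|t) / p(w));
        # words with zero probability in the topic are skipped (log undefined).
        scores = [
            (lam * math.log(p) + (1 - lam) * math.log(p / p_w[w]), vocab[w])
            for w, p in enumerate(row)
            if p > 0
        ]
        scores.sort(key=lambda s: s[0], reverse=True)
        top[t] = [word for _, word in scores[:k]]
    return top


if __name__ == "__main__":
    vocab = ["the", "goal", "vote"]
    phi = [[0.5, 0.4, 0.1], [0.5, 0.1, 0.4]]
    p_w = [0.5, 0.25, 0.25]  # column means of phi, i.e. equally weighted topics
    print(top_words_by_relevance(phi, p_w, vocab, 1.0, 2))  # {0: ['the', 'goal'], 1: ['the', 'vote']}
    print(top_words_by_relevance(phi, p_w, vocab, 0.0, 2))  # {0: ['goal', 'the'], 1: ['vote', 'the']}
```

With \(\lambda = 1\) the ranking is by \(p(w\mid t)\) alone, so the frequent word "the" comes first; with \(\lambda = 0\) it is by lift, which promotes topic-specific words.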
Module contents
utils module