BibaAndBoba package

BibaAndBoba

class BibaAndBoba.biba_and_boba.BibaAndBoba(file_1: Union[str, bytes, BufferedReader, BinaryIO, BytesIO, TextIO], file_2: Union[str, bytes, BufferedReader, BinaryIO, BytesIO, TextIO], subtraction_threshold: int = 3, use_cache: bool = True, flush_cache: bool = False)

Bases: object

BibaAndBoba is a class for analyzing two Telegram chat history files. It provides methods to get the difference words, the frequency distribution of the difference words, and other statistics. It uses the NLTK library to tokenize the messages and the BibaAndBoba.Reader class to read the files.

__init__(file_1: Union[str, bytes, BufferedReader, BinaryIO, BytesIO, TextIO], file_2: Union[str, bytes, BufferedReader, BinaryIO, BytesIO, TextIO], subtraction_threshold: int = 3, use_cache: bool = True, flush_cache: bool = False)

Called when an instance of the class is created. It initializes the attributes that are specific to each instance.

Parameters

self – The instance itself
file_1 – The first Telegram chat history file (a str, bytes, or file-like object)
file_2 – The second Telegram chat history file (a str, bytes, or file-like object)
subtraction_threshold (int (optional)) – The threshold for the subtraction function, defaults to 3. Values greater than 3 are not recommended unless you need them.
use_cache (bool (optional)) – Whether to use the cache, defaults to True
flush_cache (bool (optional)) – Whether to flush the cache, defaults to False

Raises

ValueError: If the two files are identical
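The file parameters accept a str, raw bytes, or an open stream, and the constructor rejects identical inputs. A minimal sketch of how such inputs could be normalized and compared, assuming a str is a filesystem path and text streams hold UTF-8 text; `read_as_bytes` and `check_not_identical` are hypothetical helpers, not part of BibaAndBoba:

```python
from typing import BinaryIO, TextIO, Union

# Hypothetical alias approximating the file_1/file_2 parameter types.
FileLike = Union[str, bytes, BinaryIO, TextIO]


def read_as_bytes(source: FileLike) -> bytes:
    """Normalize one input to raw bytes (hypothetical helper)."""
    if isinstance(source, bytes):
        return source
    if isinstance(source, str):
        # Assumption: a str is treated as a path to the export file.
        with open(source, "rb") as fh:
            return fh.read()
    data = source.read()
    # Text streams yield str, binary streams (BufferedReader, BytesIO) yield bytes.
    return data.encode("utf-8") if isinstance(data, str) else data


def check_not_identical(file_1: FileLike, file_2: FileLike) -> tuple[bytes, bytes]:
    """Read both inputs and raise ValueError if their contents match."""
    data_1, data_2 = read_as_bytes(file_1), read_as_bytes(file_2)
    if data_1 == data_2:
        raise ValueError("Files are identical")
    return data_1, data_2
```

Comparing full contents is the simplest possible check; the library may detect identical files differently.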

get_difference_words() → list[str]

Returns a list of words that are in the first text but not in the second.

Parameters: self – The instance itself
Returns: A list of words that are unique to the first person's document
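In its simplest form this is a vocabulary subtraction. The sketch below keeps repeated words so that a frequency distribution can be built from the result, and it ignores subtraction_threshold because its exact semantics are not documented here; `difference_words` is a hypothetical helper:

```python
def difference_words(tokens_1: list[str], tokens_2: list[str]) -> list[str]:
    """Return person 1's tokens that never occur among person 2's tokens.

    Duplicates are kept on purpose so the result can be counted later.
    """
    vocabulary_2 = set(tokens_2)  # set gives O(1) membership checks
    return [word for word in tokens_1 if word not in vocabulary_2]
```

For example, `difference_words(["hi", "bro", "bro", "ok"], ["hi", "ok"])` keeps both occurrences of "bro".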

get_name() → str

Returns the name of the object.

Parameters: self – The instance itself
Returns: The name of the object

get_tokenized_words_person_1() → list[str]

Returns a list of all words in the messages sent by person 1.

Parameters: self – The instance itself
Returns: A list of all the words in person 1's messages

get_tokenized_words_person_2() → list[str]

Returns a list of all words in the messages sent by person 2.

Parameters: self – The instance itself
Returns: A list of all the words in person 2's messages
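Per-person tokenization can be sketched as follows. This assumes the Telegram Desktop JSON export layout (a top-level "messages" list whose items carry "type", "from", and a "text" field that is either a string or a list mixing strings and entity objects) and swaps NLTK for a simple regular-expression tokenizer so the example has no dependencies; `message_text` and `tokenized_words` are hypothetical names:

```python
import re

# Stand-in for NLTK tokenization: runs of word characters (Unicode-aware).
WORD_RE = re.compile(r"\w+")


def message_text(message: dict) -> str:
    """Flatten a message's "text" field to a plain string."""
    text = message.get("text", "")
    if isinstance(text, list):
        # Formatted messages mix plain strings and entity dicts with a "text" key.
        return "".join(part if isinstance(part, str) else part.get("text", "") for part in text)
    return text


def tokenized_words(export: dict, sender: str) -> list[str]:
    """Lowercased tokens from every regular message sent by `sender`."""
    words: list[str] = []
    for message in export.get("messages", []):
        # Skip service entries (joins, calls, pins) and other senders.
        if message.get("type") == "message" and message.get("from") == sender:
            words.extend(WORD_RE.findall(message_text(message).lower()))
    return words
```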

parasite_words(limit: int = 10) → DataFrame

Counts the frequency of each word and returns a pd.DataFrame with the most frequent ones.

Parameters: limit (int (optional)) – The number of words to return, defaults to 10
Returns: A DataFrame with the most common words and their counts
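The counting step maps naturally onto collections.Counter. A dependency-free sketch that returns (word, count) pairs rather than a DataFrame; `most_frequent` is a hypothetical name:

```python
from collections import Counter


def most_frequent(words: list[str], limit: int = 10) -> list[tuple[str, int]]:
    """Return the `limit` most frequent words with their counts, highest first."""
    return Counter(words).most_common(limit)
```

To mirror the DataFrame return type, the pairs could be wrapped with `pd.DataFrame(pairs, columns=["word", "count"])`; those column names are illustrative and may differ from what parasite_words produces.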

Comparator

class BibaAndBoba.comparator.Comparator(person1: BibaAndBoba, person2: BibaAndBoba, limit: int = 10)

Bases: object

The Comparator class compares two people. It provides methods to get the correlation percentage of the two people and the words they have in common.

__init__(person1: BibaAndBoba, person2: BibaAndBoba, limit: int = 10)

get_correlation() → float

Returns the correlation between the two people.

Parameters: self – The instance itself
Returns: The correlation between the two columns
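The docs don't state how the correlation is computed. One common choice, shown here purely as an assumption, is the Pearson correlation of the two people's word counts aligned over the union of their vocabularies; `word_count_correlation` is hypothetical and returns a value in [-1, 1] rather than a percentage:

```python
import math
from collections import Counter


def word_count_correlation(words_1: list[str], words_2: list[str]) -> float:
    """Pearson correlation of two word-count vectors over a shared vocabulary."""
    counts_1, counts_2 = Counter(words_1), Counter(words_2)
    vocabulary = sorted(counts_1.keys() | counts_2.keys())
    if not vocabulary:
        return 0.0
    # Counter returns 0 for missing keys, so both vectors align on `vocabulary`.
    xs = [counts_1[w] for w in vocabulary]
    ys = [counts_2[w] for w in vocabulary]
    n = len(vocabulary)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Undefined for constant vectors; returning 0.0 is an arbitrary convention.
        return 0.0
    return cov / math.sqrt(var_x * var_y)
```

Multiplying the result by 100 would give a percentage-style figure, though the library may scale or define it differently.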

get_same_words() → set

Returns a set of words that are the same for both people.

Parameters: self – The instance itself
Returns: A set of words common to both people
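A sketch of one way to obtain such a set, assuming (this is not confirmed by the docs) that the Comparator's `limit` selects each person's most frequent words before they are intersected; `same_words` is a hypothetical helper:

```python
from collections import Counter


def same_words(words_1: list[str], words_2: list[str], limit: int = 10) -> set[str]:
    """Words present in both people's top-`limit` most frequent words."""
    top_1 = {word for word, _ in Counter(words_1).most_common(limit)}
    top_2 = {word for word, _ in Counter(words_2).most_common(limit)}
    return top_1 & top_2  # set intersection
```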