BibaAndBoba package
BibaAndBoba
- class BibaAndBoba.biba_and_boba.BibaAndBoba(file_1: Union[str, bytes, BufferedReader, BinaryIO, BytesIO, TextIO], file_2: Union[str, bytes, BufferedReader, BinaryIO, BytesIO, TextIO], subtraction_threshold: int = 3, use_cache: bool = True, flush_cache: bool = False)
Bases: object
BibaAndBoba is a class for analyzing two Telegram chat history files. It provides methods to get the difference words, the frequency distribution of the difference words, and other parameters. It uses the NLTK library to tokenize the messages. The BibaAndBoba.Reader class is used to read the files.
- __init__(file_1: Union[str, bytes, BufferedReader, BinaryIO, BytesIO, TextIO], file_2: Union[str, bytes, BufferedReader, BinaryIO, BytesIO, TextIO], subtraction_threshold: int = 3, use_cache: bool = True, flush_cache: bool = False)
The __init__ function is called when an instance of the class is created. It initializes all the variables that are unique to each instance.
- Parameters
self – Reference the object itself
file_1 – Specify the first file
file_2 – Specify the second file
subtraction_threshold (int (optional)) – The threshold for the subtraction function, defaults to 3. It’s not recommended to use a value bigger than 3 unless you need to.
use_cache (bool (optional)) – Whether to use the cache or not, defaults to True
flush_cache (bool (optional)) – Whether to flush the cache or not, defaults to False
- Raises
ValueError – If the files are identical
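A minimal usage sketch of the constructor, assuming the package is installed and that both paths point to Telegram chat history files. The helper name analyze_chats and its return shape are hypothetical; only the constructor arguments, the ValueError, and the methods called come from this reference.

```python
def analyze_chats(path_1: str, path_2: str, limit: int = 10):
    """Hypothetical helper: build a BibaAndBoba instance and query it."""
    # Imported lazily so the snippet can be loaded before the package is installed.
    from BibaAndBoba.biba_and_boba import BibaAndBoba

    try:
        analysis = BibaAndBoba(
            path_1,
            path_2,
            subtraction_threshold=3,  # documented default
            use_cache=True,           # documented default
            flush_cache=False,        # documented default
        )
    except ValueError:
        # Documented behaviour: raised when the two files are identical.
        return None

    return (
        analysis.get_name(),
        analysis.get_difference_words(),
        analysis.parasite_words(limit=limit),
    )
```

Returning None on ValueError is a choice made for this sketch only; a caller may prefer to let the exception propagate.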
- get_difference_words() → list[str]
Returns a list of words that are in the first text but not in the second.
- Parameters
self – Access the attributes and methods of the class
- Returns
A list of words that are unique to the first person's document
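The idea of "words in the first text but not in the second" can be sketched in plain Python. This is an illustration only, not the library's implementation: the function name difference_words is hypothetical, the sketch ignores subtraction_threshold (whose exact semantics are not stated here), and keeping the original order and duplicates is an assumption.

```python
def difference_words(words_1, words_2):
    """Return the words from words_1 that never occur in words_2."""
    # A set gives constant-time membership checks for the second text.
    seen_in_second = set(words_2)
    # Assumption: order and duplicates of the first text are preserved.
    return [word for word in words_1 if word not in seen_in_second]
```

For example, `difference_words(["hi", "cat", "hi", "dog"], ["hi", "bird"])` keeps only "cat" and "dog".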
- get_name() → str
Returns the name of the object.
- Parameters
self – Refer to the object itself
- Returns
The name of the object
- get_tokenized_words_person_1() → list[str]
Returns a list of all words in the messages sent by person 1.
- Parameters
self – Refer to the object of the class
- Returns
A list of all the words in the person 1 messages
- get_tokenized_words_person_2() → list[str]
Returns a list of all words in the messages sent by person 2.
- Parameters
self – Access the class attributes and methods
- Returns
A list of all the words in the person 2 messages
- parasite_words(limit: int = 10) → DataFrame
Takes a list of words, counts the frequency of each word, and returns a pd.DataFrame with the most frequent ones.
- Parameters
limit (int (optional)) – The number of words to return, defaults to 10
- Returns
A dataframe with the most common words and their counts.
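The counting step described above can be sketched with the standard library. This is an illustration under stated assumptions, not the library's code: the name most_frequent_words is hypothetical, and the sketch returns a list of (word, count) pairs instead of a pd.DataFrame so that it has no third-party dependencies.

```python
from collections import Counter


def most_frequent_words(words, limit=10):
    """Count each word and return the `limit` most frequent ones."""
    # Counter tallies occurrences; most_common sorts by descending count.
    return Counter(words).most_common(limit)
```

The resulting pairs could be passed to a DataFrame constructor if a tabular result is wanted.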
Comparator
- class BibaAndBoba.comparator.Comparator(person1: BibaAndBoba, person2: BibaAndBoba, limit: int = 10)
Bases: object
Comparator class is used to compare two people. It provides methods to get the correlation percentage of the two people and the words that are the same for both of them.
- __init__(person1: BibaAndBoba, person2: BibaAndBoba, limit: int = 10)
- get_correlation() → float
The get_correlation function returns the correlation between two people.
- Parameters
self – Access the class attributes
- Returns
The correlation between the two people, as a float
- get_same_words() → set
The get_same_words function returns a set of words that are the same for both people.
- Parameters
self – Access the attributes and methods of the class in which it is used
- Returns
A set of words that are common to both people
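Finding the words shared by two people amounts to a set intersection. The sketch below assumes each person is represented by a plain collection of words; the name same_words is hypothetical, and which words the Comparator actually compares (and how limit affects them) is not specified in this reference.

```python
def same_words(words_1, words_2):
    """Return the set of words that occur in both collections."""
    # Converting to sets drops duplicates; & keeps only the shared words.
    return set(words_1) & set(words_2)
```

Because the result is a set, it carries no ordering and no counts.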