BibaAndBoba package

BibaAndBoba

class BibaAndBoba.biba_and_boba.BibaAndBoba(file_1: Union[str, bytes, BufferedReader, BinaryIO, BytesIO, TextIO], file_2: Union[str, bytes, BufferedReader, BinaryIO, BytesIO, TextIO], subtraction_threshold: int = 3, use_cache: bool = True, flush_cache: bool = False)

Bases: object

BibaAndBoba is a class for analyzing two Telegram chat history files. It provides methods to get the difference words, the frequency distribution of the difference words, and other parameters. It uses the NLTK library to tokenize the messages and the BibaAndBoba.Reader class to read the files.

__init__(file_1: Union[str, bytes, BufferedReader, BinaryIO, BytesIO, TextIO], file_2: Union[str, bytes, BufferedReader, BinaryIO, BytesIO, TextIO], subtraction_threshold: int = 3, use_cache: bool = True, flush_cache: bool = False)

The __init__ function is called when an instance of the class is created. It initializes the attributes that belong to that instance.

Parameters
  • self – The instance itself

  • file_1 – The first chat history file

  • file_2 – The second chat history file

  • subtraction_threshold (int (optional)) – The threshold for the subtraction function, defaults to 3. Values greater than 3 are not recommended unless you have a specific need for them.

  • use_cache (bool (optional)) – Whether to use the cache, defaults to True

  • flush_cache (bool (optional)) – Whether to flush the cache, defaults to False

Raises

ValueError – If the two files are identical
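A minimal usage sketch of the constructor, based only on the signature documented above. The helper names build_analyzer and describe are hypothetical, and the import is deferred into the function body so the snippet can be defined even where the package is not installed.

```python
from typing import Any


def build_analyzer(path_1: str, path_2: str) -> Any:
    """Create a BibaAndBoba instance from two chat history files."""
    # Import path taken from the documented class location.
    from BibaAndBoba.biba_and_boba import BibaAndBoba

    # The keyword arguments spell out the documented defaults.
    return BibaAndBoba(
        path_1,
        path_2,
        subtraction_threshold=3,
        use_cache=True,
        flush_cache=False,
    )


def describe(analyzer: Any, limit: int = 10) -> None:
    """Print the analyzer's name and its most frequent words."""
    print(analyzer.get_name())
    print(analyzer.parasite_words(limit=limit))
```

A call such as describe(build_analyzer("chat_1.json", "chat_2.json")) would exercise it. Both file names are placeholders, and the JSON extension is an assumption, since the accepted export format is not stated here. Because the constructor raises ValueError for identical files, callers may want to wrap the call in a try/except block.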

get_difference_words() → list[str]

Returns a list of words that are in the first text but not in the second.

Parameters

self – The instance itself

Returns

A list of words that are unique to the first person's document
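The idea of "words in the first text but not in the second" can be illustrated with the standard library alone. This is a sketch of the concept, not the library's implementation: the function name difference_words is hypothetical, duplicates are kept on the assumption that frequencies are counted later, and subtraction_threshold is not modeled because its exact semantics are not described here.

```python
def difference_words(words_1: list[str], words_2: list[str]) -> list[str]:
    """Return the words of words_1 that never occur in words_2, in order."""
    # A set gives constant-time membership checks for the second text.
    seen_in_second = set(words_2)
    return [word for word in words_1 if word not in seen_in_second]


first = ["well", "hello", "actually", "well", "like"]
second = ["hello", "like", "sure"]
print(difference_words(first, second))  # ['well', 'actually', 'well']
```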

get_name() → str

Returns the name of the object.

Parameters

self – Refer to the object itself

Returns

The name of the object

get_tokenized_words_person_1() → list[str]

Returns a list of all words in the messages sent by person 1.

Parameters

self – The instance itself

Returns

A list of all the words in person 1's messages

get_tokenized_words_person_2() → list[str]

Returns a list of all words in the messages sent by person 2.

Parameters

self – The instance itself

Returns

A list of all the words in person 2's messages

parasite_words(limit: int = 10) → DataFrame

Counts the frequency of each difference word and returns a pd.DataFrame with the most frequent ones.

Parameters

limit (int (optional)) – The number of words to return, defaults to 10

Returns

A DataFrame with the most common words and their counts.
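The counting step can be sketched with collections.Counter from the standard library. This is an illustration only: the function name most_frequent is hypothetical, and it returns a list of (word, count) pairs rather than a DataFrame, since the column names of the real result are not documented here.

```python
from collections import Counter


def most_frequent(words: list[str], limit: int = 10) -> list[tuple[str, int]]:
    """Return up to `limit` (word, count) pairs, most frequent first."""
    return Counter(words).most_common(limit)


words = ["well", "like", "well", "actually", "like", "well"]
print(most_frequent(words, limit=2))  # [('well', 3), ('like', 2)]
```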

Comparator

class BibaAndBoba.comparator.Comparator(person1: BibaAndBoba, person2: BibaAndBoba, limit: int = 10)

Bases: object

The Comparator class compares two people. It provides methods to get the correlation percentage of the two people and the words that are the same for both of them.

__init__(person1: BibaAndBoba, person2: BibaAndBoba, limit: int = 10)

get_correlation() → float

The get_correlation function returns the correlation between two people.

Parameters

self – The instance itself

Returns

The correlation between the two people

get_same_words() → set

The get_same_words function returns the set of words that are the same for both people.

Parameters

self – The instance itself

Returns

A set of words shared by both people
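A minimal usage sketch of Comparator, based only on the signatures documented above. The helper name compare_people is hypothetical, and the import is deferred so the snippet can be defined without the package installed. The two arguments are expected to be BibaAndBoba instances that were already constructed.

```python
from typing import Any


def compare_people(person_1: Any, person_2: Any, limit: int = 10) -> tuple[float, set]:
    """Return (correlation, shared words) for two BibaAndBoba instances."""
    # Import path taken from the documented class location.
    from BibaAndBoba.comparator import Comparator

    comparator = Comparator(person_1, person_2, limit=limit)
    return comparator.get_correlation(), comparator.get_same_words()
```

Returning both values as a tuple keeps the sketch short. In real code the two methods can just as well be called separately on one Comparator instance.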