moby-dick-project

Q: Create a list of the 100 most frequently occurring words with the count of occurrences for each word found in the attached text for Herman Melville's novel, Moby Dick. Ensure this top-100 list does not include any words in the provided stop words list.

The programming language used is Python.

The data is in the data folder: mobydick.txt contains the novel's text, and stop-words.txt contains the stop words.

The code is written in main.py.

Steps to execute:

Create a virtualenv so that the libraries used by this code don't affect the libraries of the core Python installation.
Install the libraries using pip install -r requirements.txt
Run the code using python main.py
To run the unit tests, execute python test.py

Note: The output folder will be created once you run the code. It will contain output images such as the word cloud and the frequency distribution of words.

Progression of code:

Step 1: Analysed the stop-words.txt file by checking its contents first, then wrote the code to preprocess it into a usable array format.
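Step 1 can be sketched roughly as follows. The layout of stop-words.txt is not described in this README, so the sketch assumes the words are separated by commas and/or whitespace. The names parse_stop_words and load_stop_words are hypothetical and are not necessarily the ones used in main.py.

```python
import re
from pathlib import Path


def parse_stop_words(text):
    """Split raw stop-word text into a set of lowercase words.

    Assumption: words are separated by commas and/or whitespace.
    """
    tokens = re.split(r"[,\s]+", text.lower())
    # Drop the empty strings produced by leading or trailing separators.
    return {token for token in tokens if token}


def load_stop_words(path="data/stop-words.txt"):
    """Read the stop-words file and return its words as a set."""
    return parse_stop_words(Path(path).read_text(encoding="utf-8"))
```

A set is used here because membership checks are fast; sorted(load_stop_words()) gives a plain list if an array is preferred.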

Step 2: Analysed the mobydick.txt file by checking its contents.

Step 3: Created the code for calculating the unique words and their frequencies.
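A minimal sketch of the counting in Step 3, using collections.Counter from the standard library. The function name count_words is hypothetical, and splitting on whitespace is an assumption about how the words were first extracted.

```python
from collections import Counter


def count_words(text):
    """Count lowercase, whitespace-separated tokens in the text."""
    return Counter(text.lower().split())


counts = count_words("The whale and the sea")
print(len(counts))    # number of unique words
print(counts["the"])  # number of occurrences of "the"
```

Counting raw whitespace-separated tokens treats "whale" and "whale," as different words, which is why a look at the unique words leads to a cleaning step.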

Step 4: Analysed the unique words, then created the preprocessing code to clean the sentences by removing symbols.
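The symbol removal in Step 4 could look something like the sketch below. It is one possible approach, not necessarily the one in main.py: the function name preprocess is hypothetical, and keeping straight apostrophes inside words (so that "whale's" stays one word) is an assumption.

```python
import re

# Runs of letters, optionally joined by straight apostrophes (e.g. "whale's").
WORD_PATTERN = re.compile(r"[a-z]+(?:'[a-z]+)*")


def preprocess(sentence):
    """Lowercase a sentence and return its words with symbols stripped."""
    return WORD_PATTERN.findall(sentence.lower())


print(preprocess("Call me Ishmael."))
```

Digits, punctuation and dashes are dropped entirely; curly apostrophes are not treated as part of a word in this sketch.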

Step 5: Analysed the results again.

Step 6: Created the sorting code and printed the result in the terminal. Put it in a word cloud for better representation.
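The sorting and printing in Step 6 can be sketched as below, again with collections.Counter. The function name top_words and the sample data are hypothetical; the real code may sort in a different way.

```python
from collections import Counter


def top_words(words, stop_words, n=100):
    """Return the n most frequent words not in stop_words as (word, count) pairs."""
    counts = Counter(word for word in words if word not in stop_words)
    # most_common sorts by count, highest first.
    return counts.most_common(n)


words = ["whale", "the", "sea", "whale", "the", "ship", "whale"]
for word, count in top_words(words, stop_words={"the"}, n=2):
    print(f"{word}: {count}")
```

The (word, count) pairs convert directly to a dictionary with dict(...). If the word cloud is drawn with the wordcloud package (an assumption, since requirements.txt is not shown here), such a dictionary is the kind of input its WordCloud.generate_from_frequencies method accepts.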

aar2416/moby-dick-project
