Commit c571c24

Some updates.

xuwenyihust committed
1 parent 0adc297 commit c571c24

1 file changed: +19 -1 lines changed

README.md

Lines changed: 19 additions & 1 deletion
@@ -38,8 +38,26 @@ python3.4 web/dashboard.py
 
 ## Process
 
+### Twitter API
+* Use Twitter API **tweepy** to stream tweets
+* Filter out the tweets which contain the specific keywords/hashtag that we want to track.
+* Use **TCP/IP socket** to send the fetched tweets to the spark job
+
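
The filter-and-forward steps above could be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual code: it assumes the tweepy stream callback hands over each tweet's text as a string, and `TRACKED_TERMS`, `matches_tracked_terms`, and `forward_tweets` are hypothetical names. Tweets are written one per line because a text-based socket receiver on the Spark side would typically read newline-delimited input.

```python
import socket

# Hypothetical terms to track; the README does not list the real ones.
TRACKED_TERMS = ("#spark", "streaming")


def matches_tracked_terms(text, terms=TRACKED_TERMS):
    """Return True if the tweet text contains any tracked keyword or hashtag."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in terms)


def forward_tweets(tweets, conn, terms=TRACKED_TERMS):
    """Send each matching tweet over a connected socket, one tweet per line.

    `tweets` is any iterable of tweet texts; `conn` is a connected socket.
    Returns the number of tweets that were sent.
    """
    sent = 0
    for text in tweets:
        if matches_tracked_terms(text, terms):
            # Collapse internal newlines so each tweet stays on a single line.
            line = " ".join(text.split()) + "\n"
            conn.sendall(line.encode("utf-8"))
            sent += 1
    return sent
```

In a real run, `conn` would presumably be the socket accepted from the Spark job's connection, and `tweets` would be fed by the streaming callback rather than a list.
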
+### Real-time Analysis
+* Use **Spark Streaming** to perform the real-time analysis on the tweets
+* Count the number of related tweets for each time interval
+* Tweet context **preprocess**
+  * Remove all punctuations
+  * Set capital letters to lower case
+  * Remove stop words for better performance
+* Find out the most related keywords
+* Find out the most related hashtags
+
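
The preprocessing and counting steps listed above can be illustrated in plain Python. This is a sketch of the per-tweet logic only, not the Spark Streaming job itself; the stop-word list is a small made-up sample, and `preprocess`, `extract_hashtags`, and `top_terms` are hypothetical names. Hashtags are extracted before punctuation is stripped, since removing punctuation also removes the `#`.

```python
import re
import string
from collections import Counter

# Small illustrative stop-word list; an assumption, not the project's list.
STOP_WORDS = {"a", "an", "and", "the", "is", "to", "of", "in", "rt"}

HASHTAG_RE = re.compile(r"#\w+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def extract_hashtags(text):
    """Return the lower-cased hashtags found in the raw tweet text."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


def preprocess(text):
    """Lower-case the text, strip punctuation, and drop stop words."""
    no_punct = text.lower().translate(PUNCT_TABLE)
    return [word for word in no_punct.split() if word not in STOP_WORDS]


def top_terms(tweets, n=3):
    """Return the n most common keywords and hashtags across the tweets."""
    keywords = Counter()
    hashtags = Counter()
    for text in tweets:
        hashtags.update(extract_hashtags(text))
        keywords.update(preprocess(text))
    return keywords.most_common(n), hashtags.most_common(n)
```

In a streaming job, the same two helper functions could plausibly be applied to each batch and the counts aggregated per time interval; the `Counter` here simply stands in for that aggregation.
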
+### Database
+* Use MongoDB to store the analysis results
+
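
One way the storage step might look is sketched below. The document layout and the names `build_result_document` and `store_result` are assumptions made for illustration; the README does not describe the real schema. The `collection` argument is expected to behave like a pymongo `Collection`, whose `insert_one` method takes a single document.

```python
from datetime import datetime, timezone


def build_result_document(tweet_count, top_keywords, top_hashtags, window_start=None):
    """Assemble one interval's analysis result as a plain dict (hypothetical schema)."""
    if window_start is None:
        # Default to the current time in UTC when no interval start is given.
        window_start = datetime.now(timezone.utc)
    return {
        "window_start": window_start,
        "tweet_count": tweet_count,
        "keywords": [{"term": t, "count": c} for t, c in top_keywords],
        "hashtags": [{"term": t, "count": c} for t, c in top_hashtags],
    }


def store_result(collection, document):
    """Insert the document into a collection-like object exposing insert_one."""
    return collection.insert_one(document)
```

With pymongo installed, `collection` could be obtained from a `MongoClient` instance; the database and collection names would be whatever the project uses, which the README does not state.
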
 ### Visualization
-Time line of related tweet counts, most related hashtags, most related keywords, the ratio of postive/negative tweets.
+**Time line** of related tweet counts, **most related hashtags**, **most related keywords**, the ratio of **postive/negative** tweets.
 <p align="justify">
 <img src="https://github.com/xuwenyihust/Twitter-Hashtag-Tracking/blob/master/img/timeline.JPG" width="200"/>
 <img src="https://github.com/xuwenyihust/Twitter-Hashtag-Tracking/blob/master/img/hashtags.JPG" width="200"/>
