Instructor: David Zarruk Valencia
The structure of the course and class notes were provided by Alejandro Correa Bahnsen.
- email: [email protected]
- github: davidzarruk
The use of statistical models in computer algorithms allows computers to make decisions and predictions, and to perform tasks that traditionally require human cognitive abilities. Machine learning is the interdisciplinary field at the intersection of statistics and computer science that develops such algorithms and integrates them into computer systems. It underpins many modern technologies, such as speech recognition, internet search, bioinformatics, computer vision, Amazon’s recommender system, Google’s driverless car, and the most recent imaging systems for cancer diagnosis.
This course on Time Series Analysis, Machine Learning and Natural Language Processing explains how to build systems that learn and adapt, using real-world applications. Topics include time series analysis, machine learning, data analysis in Python, natural language processing models and recurrent models. The course is project-oriented, with emphasis on writing software implementations of learning algorithms applied to real-world problems such as churn modeling, natural language processing and sentiment detection.
- Python version 3.7;
- NumPy, the core numerical extensions for linear algebra and multidimensional arrays;
- SciPy, additional libraries for scientific programming;
- Matplotlib, excellent plotting and graphing libraries;
- IPython, with the additional libraries required for the notebook interface;
- Pandas, a Python counterpart to R's data frames;
- Seaborn, used mainly for plot styling;
- scikit-learn, a machine learning library;
- prophet, a time series forecasting library by Facebook.
A good, easy-to-install option that supports Mac, Windows, and Linux, and that includes all of these packages (and many more), is Anaconda.
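Once the environment is set up, a quick check like the one below can confirm that the packages listed above are available. This is only a minimal sketch: the import names in `REQUIRED_MODULES` and the helper `find_missing` are assumptions made for illustration, not part of the course material. In particular, older releases of prophet were imported as `fbprophet`, so adjust that entry if needed.

```python
import importlib.util
import sys

# Assumed import names for the packages listed above (illustrative only).
# Older prophet releases use the import name "fbprophet" instead.
REQUIRED_MODULES = {
    "NumPy": "numpy",
    "SciPy": "scipy",
    "Matplotlib": "matplotlib",
    "IPython": "IPython",
    "Pandas": "pandas",
    "Seaborn": "seaborn",
    "scikit-learn": "sklearn",
    "prophet": "prophet",
}


def find_missing(modules):
    """Return the display names whose import name cannot be located."""
    return [
        name
        for name, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    # Report the interpreter version, then any packages that were not found.
    print("Python", sys.version.split()[0])
    missing = find_missing(REQUIRED_MODULES)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

The check only looks the modules up without importing them, so it runs quickly and does not fail if one package is broken or absent.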
Git! Unfortunately it is out of the scope of this class, but please take a look at these tutorials.
- 75% Projects (3 projects, 25% each)
- 25% Exercises
There are three projects during the course. Each project must be developed in groups of 5 students.
For the last project, in which we will build an NLP algorithm to classify text, it would be great to have some information on the students' backgrounds. Please fill out this survey.
Graduate assistant: Vanessa Sierra
Monitor: Felipe Rueda
Instructor: David Zarruk
- Vanessa: Mondays 6-7pm
- David: Wednesdays 6-7pm
Date | Session | Notebooks/Presentations | Exercises |
---|---|---|---|
July 6th | ARIMA Processes | | |
July 8th | Working with TSA | | |
Date | Session | Notebooks/Presentations | Exercises |
---|---|---|---|
July 13th | Decision Trees & Ensembles | | |
July 15th | Random Forest and XGBoost | | |
July 22nd | Machine Learning as a Service | | |
Date | Session | Notebooks/Presentations | Exercises |
---|---|---|---|
July 23rd | Natural Language Processing | | |
July 27th | Sentiment Analysis | | |
July 29th | NLP using Neural Networks | | |