Advanced Methods in Data Analysis

Instructor: David Zarruk Valencia

The course structure and class notes were provided by Alejandro Correa Bahnsen.

email: [email protected]
github: davidzarruk

The use of statistical models in computer algorithms allows computers to make decisions and predictions and to perform tasks that traditionally require human cognitive abilities. Machine learning is the interdisciplinary field at the intersection of statistics and computer science that develops such algorithms and integrates them into computer systems. It underpins many modern technologies: speech recognition, internet search, bioinformatics, computer vision, Amazon’s recommender system, Google’s driverless car and the most recent imaging systems for cancer diagnosis are all based on machine learning.

This course on Time Series Analysis, Machine Learning and Natural Language Processing will explain how to build systems that learn and adapt, using real-world applications. Topics include time series analysis, machine learning, data analysis in Python, natural language processing models and recurrent models. The course is project-oriented, with emphasis on writing software implementations of learning algorithms applied to real-world problems such as churn modeling, natural language processing and sentiment detection.

Requirements

Python 3.7
NumPy, the core numerical extension for linear algebra and multidimensional arrays
SciPy, additional libraries for scientific programming
Matplotlib, an excellent plotting and graphing library
IPython, with the additional libraries required for the notebook interface
pandas, Python's counterpart to R data frames
Seaborn, used mainly for plot styling
scikit-learn, the machine learning library!
prophet, a time series forecasting library by Facebook

A good, easy-to-install option that supports Mac, Windows and Linux, and that has all of these packages (and many more), is the Anaconda distribution.
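To confirm that an environment has the packages listed above, one can ask Python whether it can locate each of them. The sketch below is not part of the course materials: the helper `missing_modules` is hypothetical, and the import names are assumptions about each package (scikit-learn is imported as `sklearn`, and Prophet releases from 1.0 onward are imported as `prophet`, while older releases used `fbprophet`).

```python
import importlib.util
import sys

# Assumed import names for the packages listed under Requirements.
# scikit-learn imports as "sklearn"; Prophet >= 1.0 imports as "prophet"
# (older releases used "fbprophet").
REQUIRED_MODULES = [
    "numpy",
    "scipy",
    "matplotlib",
    "IPython",
    "pandas",
    "seaborn",
    "sklearn",
    "prophet",
]


def missing_modules(names):
    """Return the top-level module names Python cannot locate.

    find_spec only locates a module without importing it, so a package
    that is installed but fails on import is not reported here.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version:", ".".join(map(str, sys.version_info[:3])))
    if sys.version_info[:2] != (3, 7):
        print("Note: the course materials list Python 3.7")
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

Saved under any file name (for example a hypothetical `check_env.py`) and run with `python check_env.py`, it prints the interpreter version and lists any package that still needs to be installed.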

Git!! Unfortunately it is out of the scope of this class, but please take a look at these tutorials.

Evaluation

75% Projects (3 projects, 25% each)
25% Exercises
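The weighting is simple arithmetic: each of the three projects contributes a quarter of the final grade and the exercises contribute the remaining quarter. A minimal Python sketch, assuming all scores share one scale (for example 0 to 5) and that the exercises are summarized as a single score; the function name `final_grade` and the sample scores are hypothetical.

```python
# Weights from the Evaluation section: three projects at 25% each and
# exercises at 25%, which together add up to 100%.
PROJECT_WEIGHT = 0.25
EXERCISES_WEIGHT = 0.25
N_PROJECTS = 3


def final_grade(project_scores, exercises_score):
    """Weighted final grade; all scores are assumed to share one scale.

    exercises_score is assumed to be a single aggregate for all exercises.
    """
    if len(project_scores) != N_PROJECTS:
        raise ValueError(f"expected {N_PROJECTS} project scores")
    return PROJECT_WEIGHT * sum(project_scores) + EXERCISES_WEIGHT * exercises_score


# Hypothetical scores on a 0-5 scale, for illustration only.
print(final_grade([4.0, 4.5, 3.5], 5.0))  # 4.25
```

With these hypothetical scores, the projects contribute 0.25 × 12.0 = 3.0 and the exercises 0.25 × 5.0 = 1.25.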

Course groups

There are three projects during the course. Each project must be developed in groups of 5 students.

Preliminary survey

For the last project, in which we will build an NLP algorithm to classify text, it would help to have some information on the students' background. Please fill out this survey.

People

Graduate assistant: Vanessa Sierra

Monitor: Felipe Rueda

Instructor: David Zarruk

Office hours:

Vanessa: Mondays 6-7pm
David: Wednesdays 6-7pm

Schedule

Time Series Analysis

Date	Session	Notebooks/Presentations	Exercises
July 6th	ARIMA Processes	1 - Intro to TSA, 2 - ARIMA processes	E1 - TSA Applications, E2 - Python TSA Analysis, E3 - ARIMA
July 8th	Working with TSA	3 - Prophet	E4 - Panel Data, E5 - Prophet, P1 - TSA

Machine Learning Systems

Date	Session	Notebooks/Presentations	Exercises
July 13th	Decision Trees & Ensembles	4 - Decision Trees, 5 - Bagging	E6 - DT, E7 - DT & Ensembles
July 15th	Random Forest and XGBoost	6 - Random Forests, 7 - XGBoost	E8 - Ensembles, E9 - Random Forest & Gradient Boosting
July 22nd	Machine Learning as a Service	8 - Introduction to REST APIs, 9 - Model Deployment, 10 - APIs in AWS	E10 - API Review, P2 - Trees and APIs

Natural Language Processing

Date	Session	Notebooks/Presentations	Exercises
July 23rd	Natural Language Processing	11 - Introduction to NLP, 12 - Introduction to NLP II	
July 27th	Sentiment Analysis	13 - Sentiment Analysis	
July 29th	NLP using Neural Networks		
