davidzarruk/AdvancedMethodsDataAnalysisClass

Advanced Methods in Data Analysis

Instructor: David Zarruk Valencia

The structure of the course and class notes were provided by Alejandro Correa Bahnsen.

The use of statistical models in computer algorithms allows computers to make decisions and predictions, and to perform tasks that traditionally require human cognitive abilities. Machine learning is the interdisciplinary field at the intersection of statistics and computer science that develops such algorithms and interweaves them with computer systems. It underpins many modern technologies: speech recognition, internet search, bioinformatics, computer vision, Amazon’s recommender system, Google’s driverless car, and the most recent imaging systems for cancer diagnosis are all based on machine learning.

This course on Time Series Analysis, Machine Learning and Natural Language Processing explains how to build systems that learn and adapt, using real-world applications. Topics include time series analysis, machine learning, data analysis in Python, natural language processing models, and recurrent models. The course is project-oriented, with emphasis on writing software implementations of learning algorithms applied to real-world problems such as churn modeling, natural language processing, and sentiment detection.

Requirements

  • Python version 3.7;
  • NumPy, the core numerical extensions for linear algebra and multidimensional arrays;
  • SciPy, additional libraries for scientific programming;
  • Matplotlib, excellent plotting and graphing libraries;
  • IPython, with the additional libraries required for the notebook interface;
  • Pandas, a Python counterpart to R's data frames;
  • Seaborn, used mainly for plot styling;
  • scikit-learn, a machine learning library;
  • prophet, a time series forecasting library by Facebook.

A good, easy-to-install option that supports Mac, Windows, and Linux, and that has all of these packages (and many more), is Anaconda.
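
To confirm that an environment has the packages listed above, a short script along these lines may help. It is only a sketch: the `check_environment` helper and the `REQUIRED_MODULES` list are hypothetical names, not part of the course materials, and the import names are assumptions (for example, scikit-learn is imported as `sklearn`, and older releases of prophet were imported as `fbprophet`).

```python
import sys
from importlib.util import find_spec

# Import names assumed for the packages listed in the requirements.
# These are assumptions; adjust them if your installation differs
# (e.g. older prophet releases use the import name "fbprophet").
REQUIRED_MODULES = [
    "numpy",
    "scipy",
    "matplotlib",
    "IPython",
    "pandas",
    "seaborn",
    "sklearn",
    "prophet",
]


def check_environment(modules=REQUIRED_MODULES):
    """Map each top-level module name to True if it can be found, else False."""
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, found in check_environment().items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

Any module reported as missing can then be installed through Anaconda or pip before the first session.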

Git! Unfortunately, it is out of the scope of this class, but please take a look at these tutorials.

Evaluation

  • 75% Projects (3 projects, 25% each)
  • 25% Exercises

Course groups

There are three projects during the course. Each project must be developed in groups of 5 students.

Previous survey

For the last project, in which we will build an NLP algorithm to classify text, it would be helpful to have some information on the students' backgrounds. Please fill out this survey.

People

Graduate assistant: Vanessa Sierra

Monitor: Felipe Rueda

Instructor: David Zarruk

Office hours:

  • Vanessa: Mondays 6-7pm
  • David: Wednesdays 6-7pm

Schedule

Time Series Analysis

| Date | Session | Notebooks/Presentations | Exercises |
| --- | --- | --- | --- |
| July 6th | ARIMA Processes | | |
| July 8th | Working with TSA | | |

Machine Learning Systems

| Date | Session | Notebooks/Presentations | Exercises |
| --- | --- | --- | --- |
| July 13th | Decision Trees & Ensembles | | |
| July 15th | Random Forest and XGBoost | | |
| July 22nd | Machine Learning as a Service | | |

Natural Language Processing

| Date | Session | Notebooks/Presentations | Exercises |
| --- | --- | --- | --- |
| July 23rd | Natural Language Processing | | |
| July 27th | Sentiment Analysis | | |
| July 29th | NLP using Neural Networks | | |
