oroszgy/hungarian-text-mining-workshop

Text mining workshop

Preparation for the workshop

Please come prepared with:

  • basic knowledge of Python
  • experience in using Jupyter notebooks

During the course we will use a little bit of Pandas (a 10-minute intro is enough) and scikit-learn to build simple machine learning models.

Install dependencies and run the notebooks

The easy way: using Docker

Get the docker image: docker pull oroszgy/hungarian-text-mining-workshop

Start Jupyter Notebook: make start

The hard way: installing the packages manually

  1. Make sure you have Python 3.5+ installed (preferably a conda distribution)
  2. Clone this repository: git clone http://github.com/oroszgy/hungarian-text-mining-workshop && cd hungarian-text-mining-workshop
  3. Install the necessary packages: pip install -r requirements.txt
  4. Download the English and the Hungarian NLP models for spaCy:
    • python -m spacy download en
    • pip install https://github.com/oroszgy/spacy-hungarian-models/releases/download/hu_tagger_web_md-0.1.0/hu_tagger_web_md-0.1.0.tar.gz
  5. Install HuNlpy
    • pip install https://github.com/oroszgy/hunlp/releases/download/0.2/hunlp-0.2.0.tar.gz
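After the manual installation, it can help to check that everything is importable before opening the notebooks. The sketch below uses only the standard library (`importlib.util.find_spec`). The module names are assumptions: `pandas`, `sklearn`, `spacy` and `textacy` follow the packages mentioned in this README, `hu_tagger_web_md` is assumed to be the import name of the Hungarian spaCy model, and `hunlp` is assumed to be the import name of HuNlpy.

```python
import importlib.util

# Assumed import names for the workshop dependencies; "hu_tagger_web_md"
# (Hungarian spaCy model) and "hunlp" (HuNlpy) are assumptions.
REQUIRED = ["pandas", "sklearn", "spacy", "textacy", "hu_tagger_web_md", "hunlp"]


def missing_packages(names):
    """Return the top-level module names that cannot be found (nothing is imported)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All workshop packages are importable.")
```

If something is reported as missing, repeat the corresponding installation step above.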

Start Jupyter Notebook: jupyter notebook

Table of Contents

  1. Practical NLP in Python: spaCy and textacy, Describing documents with words
  2. Document categorization, Sentiment analysis
  3. Extracting named entities and concepts
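One common way of describing documents with words is TF-IDF weighting: a word scores high in a document if it occurs there often but appears in few other documents. The workshop notebooks use spaCy and textacy for this, so the following is only a plain-Python sketch of the idea, not the workshop's code. It assumes documents are already tokenized (here simply by whitespace, whereas spaCy would tokenize and lemmatize properly), and it uses raw counts for tf and log(N / df) for idf.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Score each word of each tokenized document by tf * idf.

    tf is the raw count of the word in the document; idf is log(N / df),
    where N is the number of documents and df the number containing the word.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each word once per document
    scores = []
    for tokens in docs:
        counts = Counter(tokens)
        scores.append({
            word: count * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


def top_words(scores, k=3):
    """Return the k highest-scoring words, breaking ties alphabetically."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


docs = [
    "a macska a kanapén alszik".split(),
    "a kutya a kertben alszik".split(),
    "a macska egeret fog".split(),
]
scores = tf_idf(docs)
print(top_words(scores[0]))  # ['kanapén', 'alszik', 'macska']
```

The article "a" occurs in every document, so its idf (and score) is 0, while "kanapén" occurs only in the first document and describes it best.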

Software used


(c) Gyorgy Orosz, 2017
