By: Brian Ray [email protected]
This project's goal is to use Docker containers to set up a network of services and workbenches commonly used by data scientists working on machine learning problems. It is currently marked as experimental, and contributions are welcome. The Docker Compose file outlines several of the containers. They should be configured to work with each other over the docpyml network you create on your Docker VM.
List of Containers:
- docpyml-namenode: Hadoop NameNode; keeps the directory tree of all files in the file system.
- docpyml-datanode1: Hadoop DataNode (HDFS data storage)
- docpyml-datanode2: Hadoop DataNode (HDFS data storage)
- docpyml-spark-master: Apache Spark master
- spark-worker (you may launch many): Spark workers; each also contains a Python version matching docpyml-conda
- docpyml-sparknotebook: preconfigured Spark Notebook
- docpyml-hdfsfb: HDFS file browser from Cloudera Hue
- docpyml-conda: Anaconda Python 3.5 with Jupyter Notebook, machine-learning packages, and PySpark preconfigured
- docpyml-rocker: RStudio
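
For the containers to reach each other over the docpyml network, each service in the Compose file needs to join it and the network must be declared as external (since it is created separately with docker network create). The fragment below is only an illustrative sketch, not the project's actual file; the image name shown is a placeholder:

```yaml
version: "2"
services:
  docpyml-namenode:            # one service shown; the others follow the same pattern
    image: example/hadoop-namenode   # placeholder image name, not the real one
    networks:
      - docpyml
networks:
  docpyml:
    external: true             # created beforehand with `docker network create docpyml`
```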
 
Prerequisites: Docker Toolbox.
Optionally, adjust your VM settings (stop the VM first, then restart it):
    docker-machine stop
    VBoxManage modifyvm default --cpus 4
    VBoxManage modifyvm default --memory 8192
    docker-machine start
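
If you want to confirm the VM picked up the new settings, VirtualBox can report them directly. This assumes your docker-machine VM is named "default", as in the commands above:

```shell
# Print the VM's CPU and memory allocation as VirtualBox sees it
VBoxManage showvminfo default | grep -E "Memory size|Number of CPUs"
```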
    
To start the environment:
    docker network create docpyml
    docker-compose up -d
If Docker reports that it is not running, try this first:
    eval "$(docker-machine env default)"
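
Once compose is up, you can check that the containers started and joined the network. These are standard Docker commands; the container names you should see are the ones listed above:

```shell
# List the containers managed by this Compose project and their state
docker-compose ps
# Show which containers are attached to the docpyml network
docker network inspect docpyml --format '{{range .Containers}}{{.Name}} {{end}}'
```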
To scale up spark-workers:
    docker-compose scale spark-worker=3
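
To verify the scale-out took effect, list just the worker service; you should see three spark-worker containers running. The registered workers should also appear in the Spark master's web UI (port 8080 is Spark's standalone-mode default, assuming this setup does not override it):

```shell
# Show only the spark-worker containers and their state
docker-compose ps spark-worker
```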