A TensorFlow template that aims to ease the workflow of TensorFlow projects on GCP.
When students take their first deep-learning course, many of them (in my experience) struggle with how to structure a deep-learning project and how to get it up and running on a GCP instance to train their models. Another issue is how to manage the code on the GCP instance. This repo aims to ease that workflow by providing a template that uses Ansible to automatically spin up a GCP instance and deploy a Docker container running TensorFlow in Jupyter Notebook.
Prerequisites
- Google Cloud Account
  - You can find the information for setting up a Google Cloud Account here.
- gcloud SDK
  - Used for managing authentication between Ansible and your cloud project. For more information, visit gcloud.
  - For installation instructions, please visit gcloud/install.
- Ansible
  - Used for automating the setup of the GCP instance. For more information, visit Ansible.
  - For installation instructions, please visit ansible/install.
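Before starting, it can help to confirm that the command-line tools are actually on your PATH. The following is a minimal sketch; the function name check_prereqs is hypothetical and not part of this repo, and it only checks that the commands can be found, not that gcloud or Ansible are configured correctly.

```shell
# Hypothetical helper (not part of this repo): report which of the given
# commands cannot be found, and fail if any are missing.
check_prereqs() {
  local missing=0 tool
  for tool in "$@"; do
    # command -v succeeds if the name resolves to a binary, builtin or function
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  # Non-zero exit status if at least one command was not found
  return "$missing"
}
```

For example, after sourcing the snippet, check_prereqs gcloud ansible-playbook prints one "missing:" line per tool that is not installed and returns a non-zero status in that case.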
!!NOTE THAT THESE STEPS WILL CREATE A GCP INSTANCE. BE SURE TO CHECK THE PRICING AT GOOGLE CLOUD BEFORE STARTING, AND REMEMBER TO STOP YOUR GCP INSTANCE TO PREVENT UNNECESSARY COSTS!!
To create a project, please follow Google's official instructions.
You need a service account to grant Ansible permissions on your GCP project. To create one, go to Navigation Menu -> IAM & Admin -> Service Accounts, click the CREATE SERVICE ACCOUNT button, and follow the steps. NOTE: Make sure you select the roles Compute OS Admin Login, Editor, and Service Account User. Click CONTINUE and then DONE.
Next, create a key from the Actions column and download it as a JSON file.
To authorize gcloud with the service account key, run:
gcloud auth activate-service-account --key-file=/path/to/my/key_file.json
Next, add your SSH public key to the service account by running (this assumes your key is at ~/.ssh/id_rsa.pub):
gcloud compute os-login ssh-keys add --key-file=$HOME/.ssh/id_rsa.pub
You will get an output similar to this:
loginProfile:
  name: '12346789'
  posixAccounts:
  - accountId: my-project-123
    gid: '12346789132456'
    homeDirectory: /home/sa_123456789
    name: users/<service-account-email>/projects/my-project-1234
    operatingSystemType: LINUX
    primary: true
    uid: 'some uid'
    username: sa_123456789
  ...
Write down your username (here it is "sa_123456789").
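If you would rather capture the username in a shell variable than copy it by hand, you could filter the command's output. This is a sketch that assumes the profile is printed to stdout in the YAML layout shown above; the function name extract_username is hypothetical and not part of this repo. It prints the first username: field it finds.

```shell
# Hypothetical helper (not part of this repo): read the YAML login profile
# printed by `gcloud compute os-login ssh-keys add` on stdin and print the
# first POSIX username it contains.
extract_username() {
  # With awk's default whitespace splitting, the line "    username: sa_123"
  # gives $1 == "username:" and $2 == "sa_123"; stop after the first match.
  awk '$1 == "username:" { print $2; exit }'
}

# Example usage (requires gcloud and an existing key pair):
#   OS_LOGIN_USER=$(gcloud compute os-login ssh-keys add \
#     --key-file="$HOME/.ssh/id_rsa.pub" | extract_username)
```

You can then pass --user "$OS_LOGIN_USER" to the playbooks instead of typing the username.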
Edit the variables in vars/ansible_vars to match your project settings.
- Go to the ansible-scripts folder and run the following command (this starts an instance configured according to the variables in vars/ansible_vars and installs the necessary packages):
  ansible-playbook spin-up-instance.yaml --user <your_username>
- Next, run the following playbook, which copies src and the Dockerfile to the instance, builds the Docker image, and starts the Docker container that runs Jupyter:
  ansible-playbook start-instance.yaml --user <your_username>
When the playbook is done, go to the IP address it prints and use the token printed to the terminal (it appears in a URL of the form 127.0.0.1:8888/?token=<your-token>). This opens Jupyter, where you can go to src to test an example model.
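If you run these playbooks repeatedly, a small shell wrapper can save some typing. The sketch below is hypothetical and not part of this repo: the function names tf_gcp and run and the DRY_RUN switch are illustrative, and the playbook filenames are taken from the commands in this README (including the stop and cleanup playbooks described next). Source it from the ansible-scripts folder.

```shell
# Hypothetical wrapper (not part of this repo) around this README's playbooks.
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

tf_gcp() {
  local action="${1:-}" user="${2:-}"
  case "$action" in
    up|start)
      # The spin-up and start playbooks are run with the OS Login username
      if [ -z "$user" ]; then
        echo "usage: tf_gcp $action <your_username>" >&2
        return 1
      fi
      if [ "$action" = "up" ]; then
        run ansible-playbook spin-up-instance.yaml --user "$user"
      else
        run ansible-playbook start-instance.yaml --user "$user"
      fi
      ;;
    stop)    run ansible-playbook stop-instance.yaml ;;
    cleanup) run ansible-playbook cleanup.yaml ;;   # deletes the instance permanently
    *)
      echo "usage: tf_gcp {up|start|stop|cleanup} [username]" >&2
      return 1
      ;;
  esac
}
```

For example, tf_gcp up <your_username> followed by tf_gcp start <your_username> mirrors the two steps above, and DRY_RUN=1 tf_gcp stop only prints the command it would run.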
To stop the GCP instance, run the stop-instance.yaml playbook:
ansible-playbook stop-instance.yaml
To terminate/delete the GCP instance, run the cleanup playbook. Note that this deletes the instance permanently.
ansible-playbook cleanup.yaml

Ansible
- Please see the Ansible docs for more information about the Ansible scripts.
- Also check out the YouTube video by Cloud Advocates, which helped a lot during the creation of this repo.
GCP
- Google's tutorial on TensorFlow on GCP is a good source of information if you want to extend this repo.
TensorFlow
- Check out DataCamp's tutorial on TensorFlow to get started with the API.
- Also check TensorFlow's official documentation and the beginner's guide.