Get a GPU machine working on G.C.P

Rémi Connesson
12 min read · Mar 4, 2019

In this article, you are going to learn, step by step, how to create a GPU machine on Google Cloud Platform (G.C.P.). If you are eligible, this also comes with $300 worth of free credits.

Warning: to access a GPU you need to increase your quota first, and this can take anywhere from a couple of minutes to a day or more. If you know you are going to need a GPU on a certain day, do the following steps A.S.A.P.

This article is also available in French, here.

Create a demo-account on G.C.P.

Start by creating a free account on Google Cloud Platform. You get $300 of credit. You can create an account by clicking on “Free Trial”.

Create a new project

Click on the three dots at the top-left of the screen.

Then select the option to create a new project.

“New Project”

Here choose a name for your project and validate its creation.
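
(Optional) If you are comfortable with a terminal, the same thing can be done from the Cloud Shell or a local gcloud install. A rough sketch, where “my-gpu-project-42” is a hypothetical project ID (it must be globally unique):

# Hypothetical project ID; pick your own, it must be globally unique.
gcloud projects create my-gpu-project-42 --name="gpu-demo"
# Make it the active project for later gcloud commands.
gcloud config set project my-gpu-project-42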

Now open the navigation menu at the top left and click on “Compute Engine”.

If this is your first time, you will have to wait a few minutes while “Compute Engine” is being initialized.

We know it’s ok to proceed when the waiting message disappears and the “Create” button becomes clickable.

Once you can click it you are good to go, but be careful: do not click it right now. First, you have to increase your GPU quota.

Increase GPU quota

Before proceeding with the creation of the machine, we need to increase the GPU quota to which we are entitled.
To do this, type “quota” in the search bar at the top and click on “All quotas”.

“All Quotas”

To request a quota increase, you must upgrade your account to activate billing. Don’t worry, you won’t be charged as long as you have free credits available.

“Activate Billing”

Then you have to find the quota reserved for GPUs; select the filters as in the image below.

Look for the quota “Compute Engine API” / “GPUs (all regions)”; it is probably at 0.

To find it, deselect everything by clicking on “None” in the “Service” drop-down menu and then select only “Compute Engine API”. Then, in the “Metrics” drop-down menu, click on “None”, search for “GPU” in the small search bar, and select “GPUs (all regions)”.

(If your GPU quota is already 1 or more, skip the following steps and go directly to “Create your first Virtual Machine (VM)” later in the article.)

Now check the box next to “Compute Engine API” / “GPUs (all regions)” and, at the top of the page, click on “Edit Quotas”.

On the left of the screen a new panel will appear. Enter your name, your email, and your phone number, then click on “Next”.

“Next”

Ask to set your new limit to 1, and explain why you want to raise this limit. Click OK and send your request.

Now you will have to wait for the request to be approved; it will probably take a few hours. You will be notified by email when your request has been accepted.

When you have received the confirmation, you can verify that your GPU quota has been increased to 1.
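
If you have the gcloud CLI set up, you can also check the project-level quota from a terminal. A small sketch, assuming the quota is reported under the metric GPUS_ALL_REGIONS (which is how it appeared at the time of writing):

# Print the project quotas and keep only the GPU (all regions) entry.
gcloud compute project-info describe | grep -B 1 -A 1 GPUS_ALL_REGIONS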

Create your first Virtual Machine (VM)

Open the navigation menu by clicking at the top left and go to “VM Instances”.

If you have never created a machine before, click the “Create” button.

If you have already created one before, click on the small ‘+’ at the top of the page.

Specify the characteristics of your VM

First, select a region and zone that are close to your location and have GPUs available.

Quick & dirty list of zones with GPUs (as of March 4, 2019); access the detailed list here. You can also query an up-to-date list yourself with the gcloud CLI, as sketched just after this list.

  • asia-east1-a
  • asia-east1-b
  • europe-west1-b
  • europe-west1-d
  • europe-west4-b
  • europe-west4-c
  • us-central1-c
  • us-central1-f
  • us-east1-c
  • us-east1-d
  • us-west1-b
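
Rather than relying on this snapshot, you can list the zones where a given GPU type is currently available with the gcloud CLI. A sketch for the Tesla P100 (swap the filter for another accelerator type if needed):

# List the zones offering NVIDIA Tesla P100 accelerators.
gcloud compute accelerator-types list --filter="name:nvidia-tesla-p100"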

Adjust the number of CPUs and the amount of RAM by clicking on “Customize”.

“Customize”

Now add GPUs; you may want 1 Tesla P100. There is a bug in the interface: to select what we want, we must first choose 2 x NVIDIA K80 and then switch to 1 Tesla P100.

You might want Ubuntu 18 and a 200GB hard drive, so select “Modify”.

Select Ubuntu 18.04 LTS (and not 18.10) and change the disk size to 200GB. Then validate.

Now check “Allow HTTP traffic” and “Allow HTTPS traffic”.

Once done, click on “Management, security, disks, networking, sole tenancy”.

Search for “Preemptibility” and select “On”, which roughly halves the hourly cost.
Warning: preemptible instances are cheaper instances that can be shut down at any time. If you can’t afford that kind of interruption (e.g. you are participating in a hackathon), you might want to leave this feature off.

“Preemptibility : On”

Go back to the top of the page to verify that the cost is calculated at about $0.50 per hour. If it’s much more (about $1), you have probably forgotten to activate preemptibility. You can also check that you still have free credit available.

Then scroll down the page and click “Create”.

Et voilà!
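
For reference, a roughly equivalent machine can also be created with the gcloud CLI. This is only a sketch: “my-gpu-vm” is a hypothetical name, the zone and machine type are examples to adapt, and GPU instances require the maintenance policy to be set to TERMINATE. Drop --preemptible if you chose not to activate preemption.

# Create a preemptible Ubuntu 18.04 VM with 1 Tesla P100 and a 200GB boot disk.
# The instance name, zone and machine type below are placeholders.
gcloud compute instances create my-gpu-vm \
    --zone=us-east1-c \
    --machine-type=n1-standard-8 \
    --accelerator=type=nvidia-tesla-p100,count=1 \
    --maintenance-policy=TERMINATE \
    --preemptible \
    --image-family=ubuntu-1804-lts \
    --image-project=ubuntu-os-cloud \
    --boot-disk-size=200GB \
    --tags=http-server,https-server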

Remember to stop your VM when you do not use it

We will make a couple of adjustments, so we will stop the machine in the meantime. Click on the three small dots at the far end of the machine’s row and choose “Stop”. Confirm the prompt that appears to actually stop the machine.

“Stop”

If the previous step went well, after about twenty seconds your machine will display a gray circle.

It can never be repeated enough: do not forget to turn off your machine after each use, so you don’t waste your credits unnecessarily!
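
If you prefer the terminal, the same thing can be done with gcloud (a sketch, reusing the hypothetical instance name and zone from earlier):

# Stop (but do not delete) the instance so it stops burning credits.
gcloud compute instances stop my-gpu-vm --zone=us-east1-c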

Configure the firewall to access Jupyter Notebook from the internet

Go back to the top of the page and open the navigation menu.

In this menu choose “VPC Network” -> “Firewall Rules”.

“VPC Network” -> “Firewall Rules”

Then click on “Create a firewall rule”.

“Create a firewall rule.”

Then fill in the settings as in the pictures below.

Name: as you want
Description: as you want
Direction: Ingress
Actions on Match: Allow

Scroll down and continue copying the settings as in the picture below. Once everything matches, click on “Create”. (A gcloud equivalent is sketched right after.)

Targets: All instances in the Network
Source Filter : IP ranges
Source IP ranges : 0.0.0.0/0
Protocol and Ports: Specified protocol and ports
TCP : 6969 (You can actually write a range there, 6900–7000 for example; that will be useful if you also want to use TensorBoard. Just pick something you can remember exactly.)

“Create”
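
For reference, the same rule can also be created with gcloud. A sketch, with “allow-jupyter” as a hypothetical rule name and the same port as above:

# Allow incoming TCP traffic on port 6969 from anywhere, for all instances in the network.
gcloud compute firewall-rules create allow-jupyter \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:6969 \
    --source-ranges=0.0.0.0/0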

Start the machine

Congratulations, everything is ready. To get back to your machine, go back to the navigation menu and select “Compute Engine” -> “VM Instances”.

This time click on “Start”

“Start”

Wait about a minute and the light next to your machine should turn green.
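
The gcloud equivalent, with the same placeholders as before:

# Start the stopped instance again.
gcloud compute instances start my-gpu-vm --zone=us-east1-c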

Access your machine

To enter the machine, click on “SSH”; a terminal window will open.
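
You can also open an SSH session from your own terminal with gcloud, which manages the SSH keys for you (same placeholders as before):

# SSH into the instance from a local terminal.
gcloud compute ssh my-gpu-vm --zone=us-east1-c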

Bonus: Facilitate copying and pasting

In this window, click on the gear icon and check “Copy Settings” -> “Copy / Paste with Ctrl-Shift-C / V”.

“Copy Settings” -> “Copy / Paste with Ctrl-Shift-C / V”

Now you can easily copy / paste with Ctrl + Shift + C / V in the terminal.

Install libraries to do Deep Learning

For this part and all those that follow, copy and paste the command lines one by one.

Install GPU Driver

sudo apt-get update -y
sudo apt-get upgrade -y
lspci | grep -i nvidia
uname -m && cat /etc/*release
sudo apt-get install linux-headers-$(uname -r) -y
sudo apt autoremove -y
sudo apt-get install build-essential -y
sudo apt-get install cmake git unzip zip -y
sudo apt-get install -y software-properties-common
sudo add-apt-repository ppa:graphics-drivers/ppa -y
sudo apt install nvidia-driver-415 -y

Install Anaconda

cd ~
sudo wget https://repo.continuum.io/archive/Anaconda3-2018.12-Linux-x86_64.sh
sudo chmod +x Anaconda3-2018.12-Linux-x86_64.sh
sudo chown -R $(whoami) ~

For this part, after entering the command line below, answer the questions as shown.

sudo ./Anaconda3-2018.12-Linux-x86_64.sh

Now we restart

Once the driver and Anaconda are installed, we can restart.

sudo reboot

As expected, the SSH connection will drop.

Just close the window.

Wait a minute and re-open an SSH connection.

Check that everything works so far

Enter the command line below.

nvidia-smi

And this screen should be displayed.

Set up Jupyter Notebook in Server mode

cd ~
jupyter notebook --generate-config
cd .jupyter
mv jupyter_notebook_config.py jupy.bak.py
touch jupy.py
echo 'c = get_config()' >> jupy.py
echo c.NotebookApp.ip = \'*\' >> jupy.py
echo 'c.NotebookApp.open_browser = False' >> jupy.py
echo 'c.NotebookApp.port = 6969' >> jupy.py
cat jupy.bak.py >> jupy.py
mv jupy.py jupyter_notebook_config.py
rm jupy.bak.py
cd ~
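
To make sure the edit went through, you can peek at the top of the generated config file. A quick check (the exact ordering of the lines below is an assumption based on the commands above):

# The first lines of the config should now look roughly like this:
head -n 4 ~/.jupyter/jupyter_notebook_config.py
# c = get_config()
# c.NotebookApp.ip = '*'
# c.NotebookApp.open_browser = False
# c.NotebookApp.port = 6969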

Install the libraries: FastAI & Pytorch

conda create -n fastai pip -y
conda activate fastai
conda install -c pytorch -c fastai fastai -y
pip install -U pip
conda install matplotlib scikit-learn numpy pandas scipy -y
conda install ipykernel -y
python -m ipykernel install --user --name=fastai
mkdir work
cd work
git clone https://github.com/fastai/course-v3.git
conda deactivate

Install the Tensorflow & Keras libraries

cd
conda create -n tf-keras tensorflow-gpu keras pip
conda activate tf-keras
conda install matplotlib scikit-learn numpy pandas scipy -y
pip install ipykernel
python -m ipykernel install --user --name=tf-keras
conda deactivate
sudo chown -R $(whoami) ~/anaconda3

Launch the Jupyter server

NB: If you plan to leave your machine training in Jupyter unattended, run tmux before entering the command line below. Then, once you have connected your browser to the Jupyter server, detach the session with Ctrl+B then D.
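
If you later detach with Ctrl+B then D, you can reconnect to the running session from a new SSH window:

# Re-attach to the detached tmux session where Jupyter is running.
tmux attach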

tmux # Optional, see note above.
jupyter notebook --no-browser --port=6969 --ip='0.0.0.0'

Copy the part of the output that looks like:

:6969/?token=blablablablabla....

Then go back to the Google Compute Engine tab in your web browser and refresh the page.

The external IP of your machine should now be displayed. Click on it.

A new page opens with an address starting with “https://”.

You have to change “https” to “http”.

Then append the part with the port and the token that you copied earlier, as shown below (something like http://<external-ip>:6969/?token=..., where <external-ip> is the IP you just clicked).

Bravo! You have set up your Jupyter server.

To create a new Jupyter notebook, click on “New” and select the environment you want to use:

  • “fastai” for Pytorch + Fast.AI
  • “tf-keras” for Tensorflow + Keras

Of course both are configured to use GPUs.
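
If you want to double-check that both environments actually see the GPU before opening a notebook, you can run a quick test from the terminal (a sketch, assuming the envs were created exactly as above; both commands should report that a GPU is available):

# Check that Pytorch sees the GPU from the fastai env.
conda activate fastai
python -c "import torch; print(torch.cuda.is_available())"
conda deactivate

# Check that Tensorflow sees the GPU from the tf-keras env.
conda activate tf-keras
python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"
conda deactivate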

EXAMPLE: Run Fast.AI notebooks

If you followed the previous steps, you should be able to access Fast.AI’s notebooks; click one to open it.

Once you have opened one, you have to connect it to the conda env that contains Pytorch & Fast.AI. To do this, go to “Kernel” -> “Change Kernel” -> “fastai”.

You should see “fastai” displayed in the upper left.

Now you can run the cells and interact with the code. For example, the cell below trains a neural net. With this set-up, it’s super fast.

BONUS: Monitor the use of CPUs and GPUs

For those who like to get their hands dirty, here is how to display resource usage. This part is, of course, entirely optional.

CPUs and RAM

In the main Jupyter page, click on “New” and create a “Terminal”.

In the new terminal, type:

sudo apt-get install htop -y
htop

You can see the result of htop in the screenshot below.

  • The bars at the top (1, 2, …) correspond to the CPUs; here we can see that during training they are maxed out.
  • The “Mem” box corresponds to the RAM; here we can see that we have 50GB of RAM, as expected.
  • (For the curious) Below the memory there is “Swp”, which corresponds to the swap file. Concretely, if you manage to fill it, your code probably has a memory leak, or you are doing very heavy operations like building TensorFlow from source with an aggressive set of options.

The GPUs

Do the same manipulation from the main menu to open another terminal.

Type in the new terminal:

watch -n 0.2 nvidia-smi

The result is updated in real time and looks like the screenshot below.

  • The “xxx MiB / yyy MiB” portion represents the video memory (GPU RAM) used vs. available. (N.B. some versions of TensorFlow have the nasty habit of reserving ALL the video memory, so no worries if you ever see a tiny model eating all the available GPU RAM when using TF.)
  • The “xx%” portion represents how much of the GPU’s computing power is being used.

Happy Coding ;)


Rémi Connesson

I use Roam + Notion on my way to building expertise in marketing and ML ops. You can tag along for the journey here: https://tinyletter.com/remiconnesson :)