Using containerized TensorFlow with PyCharm

Mikołaj Koziarkiewicz
SoftwareMill Tech Blog
5 min read · Aug 14, 2020


Photo by chuttersnap on Unsplash

EDIT 2020-10-13: added missing instructions for setting the GPU-accelerated Docker runtime in Enabling GPU acceleration.

Introduction

Previously, we set up a customized container for running TensorFlow-based Machine Learning computations. Whereas we previously relied on Jupyter notebooks for writing our code, this time we’re going to use an IDE. We will be using PyCharm Professional as the demonstrative example; however, the corresponding functionality also appears to be available through the Python plugin of IntelliJ IDEA Ultimate Edition.

NOTE: Unfortunately, most features required for this tutorial, including setting up a Docker-based Python interpreter and the Scientific view functionality, are restricted to the paid versions of the software. Know of any popular IDEs with equivalent features? Please let us know in the comments!

Setting up the project

For starters, let’s run the New Project action. For the project category, we’ll be going with “Scientific”. The tricky part is defining the interpreter. On project creation, we select the system interpreter, not a remote one. This is because, for some reason, PyCharm will not let us create a project with a Docker-based interpreter outright (perhaps because Docker interpreter support is a relatively new feature).

We will subsequently be greeted with a new project window. Now we can define the final interpreter.

Once there, select Docker, and the desired image from the ones present in your local Docker image store. Building on the “Rolling your own custom container” section from the previous entry, we’re going to be using a custom image (without the optional Jupyter support).

However, you can also choose the standard tensorflow/tensorflow:latest-gpu one.

The IDE should now load the installed package info, and detect the Python version.

One thing is left for the interpreter setup: to launch scripts properly, PyCharm needs a default project path mapping:

Setting the local path to the project root and the remote path to an arbitrary location (for example, /opt/project, which PyCharm tends to suggest by default) is sufficient.

Restart the console (at the bottom) so it picks up the updated interpreter. The project is now almost ready to run.

Enabling GPU acceleration

For this step, because PyCharm does not expose Docker runtime parameters for interpreters in Scientific Mode (more on that later), we will need to change the system-wide default Docker runtime.

IMPORTANT: this setting affects all your future Docker container runs.

First, find the path to nvidia-container-runtime (assuming you installed it for the previous blog entry), e.g. with which nvidia-container-runtime. For Debian-based distros, this should be /usr/bin/nvidia-container-runtime.

Next, edit /etc/docker/daemon.json to have the following form (example for Debian-based distros):
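(The original snippet is not reproduced here; the form below follows NVIDIA’s documented configuration for nvidia-container-runtime, registering the runtime under the path found above and making it the default.)

    {
        "runtimes": {
            "nvidia": {
                "path": "/usr/bin/nvidia-container-runtime",
                "runtimeArgs": []
            }
        },
        "default-runtime": "nvidia"
    }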

After saving the config file, restart the Docker daemon (e.g., with sudo systemctl restart docker on systemd-based distros).

Testing out the config

We are again going to be using the MNIST tutorial as a base.

First up, open main.py, which should be generated during project setup. Let’s fire up the GPU setup code:
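The exact cell from the original post is not reproduced here; a typical variant, following TensorFlow’s GPU guide, verifies GPU visibility and enables memory growth:

    #%%
    import tensorflow as tf

    # List the GPUs visible to TensorFlow; an empty list means the
    # container is not using the GPU-accelerated runtime.
    gpus = tf.config.experimental.list_physical_devices('GPU')
    print(gpus)

    # Enable memory growth so TensorFlow does not grab all GPU memory up front.
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)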

Note the #%% marker: it starts a new code cell, similar to a cell in Jupyter.

You’ll notice that, apart from the log output, the variables defined in the cell appear to the right of the Python console, available for inspection.

Let’s follow the tutorial in the subsequent cells:
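The cells below are a condensed sketch based on TensorFlow’s beginner MNIST quickstart (the original post’s cells may differ in detail); note that the result of fit() is assigned to history, which we will need later:

    #%%
    # Load and normalize the MNIST dataset.
    mnist = tf.keras.datasets.mnist
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    #%%
    # Define and compile a simple dense classifier.
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10)
    ])
    model.compile(optimizer='adam',
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])

    #%%
    # Train, keeping the History object around for plotting later.
    history = model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))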

As before, you should see the per-epoch training output.

Data handling

Finally, let us take advantage of PyCharm’s Scientific mode and do some graph plotting. Specifically, we want to chart the training metrics recorded in the History object returned by model.fit().

Unfortunately, if we tried this without any preparation, we would immediately hit a snag: JetBrains’ software has several outstanding issues that prevent the plotting functionality from working without workarounds. While there is, ostensibly, support for Docker runtime parameters since 2020.2, the option is effectively unavailable in Scientific mode, and usable only for “batch runs”.

As is evident from the bug reports, the core issue is a lack of connectivity between the IDE and matplotlib running inside the container. Normally, we would solve this problem by passing appropriate initialization arguments to the Docker container, but there does not appear to be a way to do that in the remote interpreter settings.

What we can do, however, is create a Docker Compose file with a sufficient workaround:
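The original file is not reproduced here; a minimal sketch of the idea, assuming the custom image from the previous entry is tagged custom-tensorflow:latest (a hypothetical tag), is to use host networking so the container can reach back to the IDE:

    version: "3.7"
    services:
      tensorflow:
        # Hypothetical tag for the custom image built in the previous entry.
        image: custom-tensorflow:latest
        # Host networking lets matplotlib inside the container connect back
        # to the IDE, working around the connectivity issue described above.
        network_mode: host
        # GPU access comes from the default-runtime set in daemon.json earlier,
        # so no per-service runtime option is needed here.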

Next, we set up a Docker Compose remote interpreter, pointing it at the Compose file and the service defined above.

Now we wait for the indexing to finish, restart the Python console, and add one more code cell:
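A cell along these lines plots every metric series recorded in history.history:

    #%%
    import matplotlib.pyplot as plt

    # Plot each per-epoch metric collected during model.fit().
    for metric, values in history.history.items():
        plt.plot(values, label=metric)
    plt.xlabel('epoch')
    plt.legend()
    plt.show()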

and we should end up with a corresponding graph in the Plots view pane.

Conclusions

We have built upon the previous blog entry, trading the reliance on Jupyter for a traditional IDE. We have thus gained features such as arguably improved code completion and documentation access, as well as, potentially, a greater sense of familiarity with the tools being used.

As is apparent from our little demo, remote interpreter support is still in the process of being fully integrated into the tooling. Nevertheless, it already works relatively well. As a fallback, it’s always possible to have two projects — one for data analysis and preprocessing (using a local interpreter), and one for training (using a remote interpreter).

In any case, do take a look at PyCharm’s feature set and the associated tutorial for more information.
