Tuesday, December 4, 2018

An intro to my local Python runtime environment for experiments

1. Motivation

In this blog post I want to share the structure of the experiment environment I mainly use to train my deep learning models, but which I increasingly adopt for other tasks as well.

2. PipEnv
It is never a good idea to run your experiments on the bare Python interpreter: over time, the packages you install will lead to version conflicts. Solutions such as virtualenv create isolated instances of the Python interpreter, each with its own set of installed packages, so that different Python runtime environments can coexist and be reproduced. A step further is Pipenv, which wraps pip and virtualenv to automatically manage the virtualenv instances and to record all pip dependencies in a Pipfile that is shipped along with the code, making it easy to reproduce the environment on a different machine.
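A minimal sketch of the typical Pipenv workflow (the package names and train.py are just placeholders for whatever your project actually needs):

# install packages into a project-local virtualenv and record them in the Pipfile
pipenv install numpy pandas
# development-only dependencies go into a separate section of the Pipfile
pipenv install --dev pytest
# spawn a shell inside the virtualenv, or run a single command in it
pipenv shell
pipenv run python train.py
# on another machine, recreate the environment from the committed Pipfile.lock
pipenv sync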

I use Pipenv whenever running in a Docker container would be inconvenient or not easily possible, for instance when using the OpenAI Gym, which requires a number of system packages (mainly for graphics and physics simulation).
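As a rough sketch of what that looks like in practice (the system packages below are only an example and depend on which Gym environments you actually use; my_experiment.py is a placeholder script):

# system-level dependencies, e.g. swig for Box2D environments and xvfb for headless rendering
sudo apt-get install -y swig python3-dev zlib1g-dev xvfb
# the Python side is then handled by Pipenv as usual
pipenv install gym
pipenv run python my_experiment.py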


3. Docker

I use Docker every time I can. For TensorFlow I generally use the tensorflow-gpu version, which exploits CUDA and cuDNN to run code on my NVIDIA GTX 1080 graphics card. I extended the image with this Dockerfile (a sketch of it follows the list below) to add further dependencies that I need daily:
  • Dask, to do scalable data preprocessing and feature extraction when pandas is not enough to deal with large datasets and the thought of starting with Spark makes me itch all over.
  • Keras, for obvious reasons, although any comparable framework would do
  • livelossplot, to get live plots of the metrics of models being trained
  • tables, to get HDF5 support
  • tqdm, to see the progress of my experiments
  • JupyterLab, an amazing upgrade of the well-known Jupyter Notebook; have a look at it should you not know it yet.
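
A minimal sketch of what such a Dockerfile might look like (the base image tag is an assumption, pick whatever tensorflow-gpu release you are targeting):

FROM tensorflow/tensorflow:1.12.0-gpu-py3

# add the extra dependencies listed above on top of the stock tensorflow-gpu image
RUN pip install --no-cache-dir \
    dask[complete] \
    keras \
    livelossplot \
    tables \
    tqdm \
    jupyterlab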

I then build the image with:
docker build -t pilillo/deep_learning_workspace:0.1 -f Dockerfile .

I finally mount a local volume to make sure my code stays on my hard drive (and can be easily committed to git), and I constrain memory resources to make sure nothing monopolizes my system (I have a Ryzen 2700X with 16 threads and 64GB RAM):

docker run --runtime=nvidia -it -p 8888:8888 \
-v /home/pilillo/Documenti/deep_learning_workspace/workspace:/notebooks/workspace \
--memory="60g" --memory-swap="60g" \
pilillo/deep_learning_workspace:0.1 
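
Once the container is up, the Jupyter server is reachable at http://localhost:8888 thanks to the port mapping above (the access token is printed in the container log), and everything saved under /notebooks/workspace lands in the mounted folder on the host.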

Many might say that it would be better to use docker-compose, especially to interact with Dask. However, I believe that on my single-node setup the development process would not actually benefit from it, and that separating the services would be a matter of philosophy rather than an actual performance improvement. This is therefore all I need for the time being to quickly run my experiments locally.

Andrea
