A Docker container for Biopython

In this post we will create a Docker container for Biopython. Our final objective is to have a container to test Biopython (a different kind of beast compared with what we are doing here), but this one might actually be interesting to a lot more people.

A caveat: Docker is undergoing intense development, so some of the suggestions below might break with time. If you find such a case, please let me know and I will amend this post. I will assume that you have installed Docker and that your user has group permissions to interact with Docker (if not, just prefix most of the commands below with sudo).

For the impatient
Install Docker. Remember: depending on your installation, you might need to prefix the commands below with sudo.
docker build -t biopython https://raw.githubusercontent.com/tiagoantao/my-containers/master/biopython/Biopython3
#Grab a coffee, wait a bit
docker run -t -i biopython /bin/bash

Creating a Docker file

Basic stuff

We will use Ubuntu, more specifically Ubuntu Saucy. Why Saucy? For no specific reason, but we want to make sure that the environment is stable, so we pick a recent-but-not-bleeding-edge distro. So, our file starts with:

FROM ubuntu:saucy

This simply uses Saucy (downloading the image if necessary).

We now add all the standard Ubuntu packages needed for Biopython:

#We need this for phylip
RUN echo 'deb http://archive.ubuntu.com/ubuntu precise multiverse' >> /etc/apt/sources.list
RUN apt-get update
RUN apt-get install -y git python-numpy wget gcc python-dev
RUN apt-get install -y python-matplotlib python-reportlab python-rdflib
RUN apt-get install -y clustalw fasttree t-coffee
RUN apt-get install -y bwa ncbi-blast+ emboss clustalo phylip mafft muscle
RUN apt-get install -y embassy-phylip samtools phyml wise raxml
# For BioSQL
RUN apt-get install -y mysql-server python-mysqldb postgresql python-psycopg2

Notice the extra repository (multiverse, needed for phylip) and all the support packages (git, gcc, ...).
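
Each RUN instruction produces its own image layer, so the tool installation could also be collapsed into a single instruction. What follows is only a sketch under that assumption, using the same package names as the excerpt; the final cleanup of the apt package lists is an addition here, not something the original file does:

```dockerfile
# Sketch only: repository change, update and tool installs in one layer.
# Assumption: removing /var/lib/apt/lists afterwards is acceptable, i.e. no
# later instruction needs to install packages without running apt-get update.
RUN echo 'deb http://archive.ubuntu.com/ubuntu precise multiverse' >> /etc/apt/sources.list \
 && apt-get update \
 && apt-get install -y \
      git python-numpy wget gcc python-dev \
      python-matplotlib python-reportlab python-rdflib \
      clustalw fasttree t-coffee \
      bwa ncbi-blast+ emboss clustalo phylip mafft muscle \
      embassy-phylip samtools phyml wise raxml \
 && rm -rf /var/lib/apt/lists/*
```

The trade-off is that a single failing package invalidates the whole step, whereas the one-RUN-per-group layout keeps earlier groups cached while the list of dependencies is still changing.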

Non-standard packages

There are several pieces of software that require manual installation. It is an ongoing task, but it is mostly simple grunt work, for example:

#reportlab fonts
RUN wget http://www.reportlab.com/ftp/fonts/pfbfer.zip
WORKDIR /usr/lib/python2.7/dist-packages/reportlab
RUN mkdir fonts
WORKDIR /usr/lib/python2.7/dist-packages/reportlab/fonts
RUN unzip /pfbfer.zip
#The archive was downloaded to /, so remove it from there
RUN rm /pfbfer.zip
#genepop
RUN mkdir /genepop
WORKDIR /genepop
RUN wget http://kimura.univ-montp2.fr/~rousset/sources.tar.gz
RUN tar zxf sources.tar.gz
RUN g++ -DNO_MODULES -o Genepop GenepopS.cpp -O3
RUN cp Genepop /usr/bin
#Leave the build directory before deleting it
WORKDIR /
RUN rm -rf /genepop

Not much more than a sequence of bash commands in all the cases that I have done (download stuff, compile, copy, cleanup, ...).
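
Since the pattern really is a sequence of shell commands, the whole download, compile, copy, cleanup cycle can also be written as one RUN instruction. This is a sketch, assuming the same Genepop source URL and compiler flags shown in the excerpt and that g++ is available in the image:

```dockerfile
# Sketch only: build Genepop in a single layer, so the sources and the
# temporary build directory never end up stored in an intermediate layer.
RUN mkdir /genepop \
 && cd /genepop \
 && wget http://kimura.univ-montp2.fr/~rousset/sources.tar.gz \
 && tar zxf sources.tar.gz \
 && g++ -DNO_MODULES -o Genepop GenepopS.cpp -O3 \
 && cp Genepop /usr/bin \
 && cd / \
 && rm -rf /genepop
```

With separate RUN lines, the final rm only hides the files in a new layer; chaining the commands with && is what actually keeps them out of the image.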

Configuring and starting services (DBs)

Here we need to configure the databases needed for BioSQL (PostgreSQL and MySQL - sqlite is ready). The configuration looks like this:

RUN echo "host    all             all             ::1/128                 trust" > /etc/postgresql/9.1/main/pg_hba.conf
RUN echo "service postgresql start" > .bashrc
RUN echo "service mysql start" >> .bashrc

We then need to configure access permissions for the PostgreSQL server. Notice that the address is an IPv6 one. Something in the system (I did not research what) is trying IPv6 first (localhost has both a v4 and a v6 address). Modern: yes, welcome: yes, expected: no. So, if something based on localhost seems to be failing, check whether it is using IPv6.

The database servers are started in .bashrc. This solution is, in my view, sub-optimal (for instance, you can run a container without starting bash, and there goes database server initialization). If you know of a better way, please say so.
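
One possible alternative to the .bashrc approach is an entrypoint wrapper that starts the services and then hands control to whatever command the container was asked to run. This is a hedged sketch, not something taken from the original Dockerfile: the script name /usr/local/bin/start-dbs.sh is hypothetical, and it has not been tested against this image.

```dockerfile
# Sketch only. The wrapper script name and location are assumptions.
# The script starts both database servers, then replaces itself with the
# command passed to the container (exec "$@").
RUN printf '#!/bin/sh\nservice postgresql start\nservice mysql start\nexec "$@"\n' > /usr/local/bin/start-dbs.sh \
 && chmod +x /usr/local/bin/start-dbs.sh

# Every "docker run" command line is passed to the wrapper as arguments
ENTRYPOINT ["/usr/local/bin/start-dbs.sh"]

# Default command when none is given on the "docker run" command line
CMD ["/bin/bash"]
```

With this in place, docker run -t -i biopython /bin/bash would still drop into bash, but a non-interactive command would get the databases started as well.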

Preparing Biopython

It is actually quite easy:

RUN git clone https://github.com/biopython/biopython.git
WORKDIR /biopython
RUN python setup.py install
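
A cheap sanity check can be added right after the install step so that the build fails early if the package cannot be imported. This line is an optional addition, sketched here as an assumption rather than part of the original file:

```dockerfile
# Sketch only: import the installed package and print its version.
# The "cd /" makes Python pick up the installed copy rather than the
# source tree in /biopython; the WORKDIR itself is left unchanged.
RUN cd / && python -c "import Bio; print(Bio.__version__)"
```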

Running and getting the Docker file

To run this on your machine (with Docker installed, preferably with the sudo issue resolved), do:

docker build -t biopython https://raw.github.com/tiagoantao/my-containers/master/Biopython
docker run -i -t biopython /bin/bash

You will see a few errors related to database startup, but these are not important in this context.

You can now do, for example:

root@dc9d8c3c48f8:/biopython# cd Tests/
root@dc9d8c3c48f8:/biopython/Tests# python run_tests.py --offline

Grab the Dockerfile here, if you want to look at it.

Next steps

The next step will be the creation of a buildbot Docker container for Biopython. I also need to finalize the list of dependencies (almost done).