Since Docker 17.05, there is support for multi-stage builds. This is an example and tutorial for using them to build a simple Haskell webapp. The setup is simple: a single Dockerfile, yet the resulting docker image is small (137MB for the example here).
Essentially, I read through Best practices for writing Dockerfiles and made a Dockerfile.
A word of warning: If you think Nix is the tool to use, I'm fine with that. But there's nothing for you in this tutorial. This is an opinionated setup: Ubuntu and cabal-install's nix-style build. Also, all non-Haskell dependencies are assumed to be available for install through apt-get. If that's not true in your case, maybe you should check Nix.
The files are on GitHub: phadej/docker-haskell-example. I refer to files by name rather than pasting them here.
Assuming you have docker tools installed, there are seven steps to build a docker image:
Write your application. Any web-app would do. The assumptions are that the app
I use a minimal servant app: docker-haskell-example.cabal and Main.hs. If you want to learn about servant, its tutorial is a good starting point.
Write cabal.project containing at least
index-state: 2019-06-17T09:52:09Z
with-compiler: ghc-8.4.4
packages: .
index-state makes builds reproducible enough
with-compiler selects the compiler, so it's not the default ghc
Add .dockerignore. The more stuff you can ignore, the better: fewer things to copy to the docker build context, and fewer things that can invalidate the docker cache. Note that hidden files, such as editors' temporary files, are not hidden from docker. I hide the .git directory. If you want to burn the git hash into the executable, look at the known issues section.
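To make the .dockerignore step concrete, here is a minimal sketch that writes one from the shell. The entries are assumptions for illustration (the file in the example repository may differ): the .git directory, cabal's dist-newstyle build output, and the temporary files of a couple of common editors.

```shell
# Write a small .dockerignore into the current directory.
# All entries are illustrative assumptions; adjust them to your project.
cat > .dockerignore <<'EOF'
.git
dist-newstyle
**/*~
**/.*.swp
EOF
```

The `**/` prefix makes a pattern match in any directory of the build context, not only at its root.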
Add Dockerfile and docker.cabal.config. docker.cabal.config is used in the Dockerfile. In most cases you don't need to edit the Dockerfile. You only need to if you require some additional system dependencies; the next step will tell you whether you do.
Build an image with
docker build --build-arg EXECUTABLE=docker-haskell-example --tag docker-haskell-example:latest .
If it fails due to a missing library, see the next section. You'll need to edit the Dockerfile and iterate until you get a successful build.
After a successful build, you can run the container locally
docker run -ti --publish 8000:8000 docker-haskell-example:latest
This step is important: it tests that all runtime dependencies are there.
And try it from another terminal
curl -D - localhost:8000
It should respond with something like:
HTTP/1.1 200 OK
Transfer-Encoding: chunked
Date: Thu, 04 Jul 2019 16:15:37 GMT
Server: Warp/3.2.27
Content-Type: application/json;charset=utf-8
["hello","world"]
The Dockerfile is written with a monorepo setup in mind, in other words a setup where you can build different docker images from a single repository. That explains the --build-arg EXECUTABLE= in the docker build command. The Dockerfile also has some comments explaining why particular steps are done.
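The monorepo idea can be sketched as a small shell helper that prints one docker build command per executable. The executable names app-one and app-two are hypothetical; the command itself is the one from the build step, with only the EXECUTABLE argument and the tag varying.

```shell
# Print the docker build command for one executable of the repository.
# Assumption: the image is tagged after the executable, as in the build step.
build_command() {
  printf 'docker build --build-arg EXECUTABLE=%s --tag %s:latest .\n' "$1" "$1"
}

# Dry run over hypothetical executable names; pipe the output to sh
# to actually run the builds.
for exe in app-one app-two; do
  build_command "$exe"
done
```

Printing the commands first makes it easy to check what would be built before spending time on a cold build.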
There are two stages in the Dockerfile, builder and deployment.
In the builder stage we install all build dependencies, separating them into different RUNs, so we avoid cache invalidation as much as possible. A general rule: things that change often have to be installed later.
We install a few dependencies from Ubuntu's package repositories. That list is something you'll need to edit once in a while. The assumption is that all non-Haskell stuff comes from there (or some PPA). There's also a corresponding list in the deployment stage, where we install only the non-dev versions.
# More dependencies, all the -dev libraries
# - some basic collection of often needed libs
# - also some dev tools
RUN apt-get -yq --no-install-suggests --no-install-recommends install \
build-essential \
ca-certificates \
curl \
git \
libgmp-dev \
liblapack-dev \
liblzma-dev \
libpq-dev \
libyaml-dev \
netbase \
openssh-client \
pkg-config \
zlib1g-dev
Eventually we reach the point where we add the *.cabal file. This is something you might need to edit as well, if you have multiple cabal files in different directories.
# Add a .cabal file to build environment
# - it's enough to build dependencies
COPY *.cabal cabal.project /build/
We only add these, so we can build dependencies
# Build package dependencies first
# - beware of https://github.com/haskell/cabal/issues/6106
RUN cabal v2-build -v1 --dependencies-only all
and their cache won't be invalidated by changes in the actual implementation of the webapp. This is a common idiom in Dockerfiles. Issue 6106 might be triggered if you vendor some dependencies. In that case change the build command to
RUN cabal v2-build -v1 --dependencies-only some-dependencies
listing as many dependencies (e.g. servant, warp) as possible.
After the dependencies are built, the rest of the source files are added and the executables are built, stripped, and moved to a known location outside the guts of dist-newstyle.
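The locate, strip, and move step could look roughly like the shell sketch below. It is not copied from the repository's Dockerfile: the function names and the /build/artifacts target directory are hypothetical, and it assumes the executable file inside dist-newstyle carries the same name as the EXECUTABLE build argument.

```shell
# Find the built executable somewhere under dist-newstyle.
# Assumption: it is a regular, user-executable file named like the executable.
find_executable() {
  find dist-newstyle -type f -name "$1" -perm -u+x | head -n 1
}

# Strip debug symbols and move the binary to a fixed, easy-to-COPY location.
# /build/artifacts is a hypothetical default; pass another directory if needed.
collect_executable() {
  exe_path=$(find_executable "$1")
  [ -n "$exe_path" ] || { echo "executable $1 not found" >&2; return 1; }
  dest=${2:-/build/artifacts}
  mkdir -p "$dest"
  strip "$exe_path"
  mv "$exe_path" "$dest/"
}
```

In a builder stage this would run after the real build, for example as collect_executable "$EXECUTABLE", so that the deployment stage can copy from one known path.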
The deployment image is slick. We pay attention and don't install development dependencies anymore. In other words, we install only runtime dependencies, e.g. libgmp10 and not libgmp-dev. I also tend to install curl and some other CLI tools to help with debugging. In deployment environments where you can shell into the running containers, it helps if there's something you can do there, for example to debug network problems.
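As a sketch of the dev versus runtime split, the shell snippet below lists runtime counterparts for some of the builder's -dev packages. Only libgmp10 comes from the text above; the other library names are assumed to be the matching Ubuntu runtime packages and may differ between Ubuntu releases, so check them against your base image before relying on them.

```shell
# Runtime (non-dev) packages for the deployment stage.
# Only libgmp10 is named in the text; the other libraries are assumed
# counterparts of the builder's -dev packages and depend on the Ubuntu release.
RUNTIME_PACKAGES="ca-certificates curl libgmp10 liblzma5 libpq5 libyaml-0-2 netbase zlib1g"

# What the deployment stage's RUN step would execute (needs root).
install_runtime_packages() {
  # Word splitting of the package list is intended here.
  apt-get -yq --no-install-suggests --no-install-recommends install $RUNTIME_PACKAGES
}
```

Running the container locally, as in the steps above, is the check that this list is complete: a missing shared library shows up as soon as the executable starts.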
The resulting image is not the smallest possible, but it's not huge either:
REPOSITORY              TAG     SIZE
docker-haskell-example  latest  137MB
A cold build is slow. Rebuilds are reasonably fast if you don't touch the .cabal or cabal.project files.
If you have data-files, the situation is tricky. Consider using the file-embed-lzma or file-embed packages, i.e. avoid data-files.
Cabal issue #6106 may require you to edit the --dependencies-only build step, as explained above.
Git hash in the built executable. My approach is to ignore the whole .git directory, as it might grow quite large. Maybe un-ignoring (with !) .git/HEAD and .git/refs (which are relatively small) will make gitrev and the like work. Please tell me if you try!
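An untested sketch of that idea: ignore .git as a whole, then re-include the two small pieces with ! exceptions, which .dockerignore supports. Whether gitrev is satisfied with just these files is exactly the open question; note also that git may keep refs in .git/packed-refs, which this sketch does not re-include.

```shell
# Untested sketch: keep .git out of the build context except for the
# small files that identify the current commit.
# The exceptions must come after the line that ignores .git.
cat >> .dockerignore <<'EOF'
.git
!.git/HEAD
!.git/refs
EOF
```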
Caching of Haskell dependencies is very rudimentary. It could be improved a lot if /cabal/store could be copied out after the build and copied back in before the next one. I don't really know how to do that in Docker. Any tips are welcome. For example, with docker run one could use volumes, but not with docker build.
Look at the example repository. I hope this is useful for you.