Using multi-stage containers for C++ development
Updated January 10, 2020: Corrected link to article source that was broken by refactoring in the repo.
Containers are a great tool for configuring reproducible build environments. It’s fairly easy to find Dockerfiles that provide various C++ environments. Unfortunately, it is hard to find guidance on how to use newer techniques like multi-stage builds. This post will show you how you can leverage the capabilities of multi-stage containers for your C++ development. This is relevant to anyone doing C++ development, regardless of what tools you are using.
Multi-stage builds are Dockerfiles that use multiple FROM statements where each begins a new stage of the build. You can also name your build stages and copy output from early stages into the later stages. Prior to the availability of this capability it was common to see build container definitions where output was copied to the host and later copied into deployment containers. This spread the definition of related containers across multiple Dockerfiles that were often driven together via scripts. Multi-stage builds are a lot more convenient than that approach and less fragile. To see a full example of before and after multi-stage builds were available I recommend looking at the official Docker multi-stage build documentation.
Let’s look at a multi-stage build Dockerfile for a C++ app. This is an app that exposes a service to receive an image, processes the image using OpenCV to circle any found faces, and exposes another endpoint to retrieve the processed image. Here is a complete multi-stage Dockerfile that produces a build container for compiling the application, followed by a runtime container that takes that output and only has the dependencies necessary for running the application as opposed to building it. Here is the source for this article.
FROM alpine:latest as build

LABEL description="Build container - findfaces"

RUN apk update && apk add --no-cache \
    autoconf build-base binutils cmake curl file gcc g++ git libgcc libtool linux-headers make musl-dev ninja tar unzip wget

RUN cd /tmp \
    && wget https://github.com/Microsoft/CMake/releases/download/untagged-fb9b4dd1072bc49c0ba9/cmake-3.11.18033000-MSVC_2-Linux-x86_64.sh \
    && chmod +x cmake-3.11.18033000-MSVC_2-Linux-x86_64.sh \
    && ./cmake-3.11.18033000-MSVC_2-Linux-x86_64.sh --prefix=/usr/local --skip-license \
    && rm cmake-3.11.18033000-MSVC_2-Linux-x86_64.sh

RUN cd /tmp \
    && git clone https://github.com/Microsoft/vcpkg.git -n \
    && cd vcpkg \
    && git checkout 1d5e22919fcfeba3fe513248e73395c42ac18ae4 \
    && ./bootstrap-vcpkg.sh -useSystemBinaries

COPY x64-linux-musl.cmake /tmp/vcpkg/triplets/

RUN VCPKG_FORCE_SYSTEM_BINARIES=1 ./tmp/vcpkg/vcpkg install boost-asio boost-filesystem fmt http-parser opencv restinio

COPY ./src /src
WORKDIR /src
RUN mkdir out \
    && cd out \
    && cmake .. -DCMAKE_TOOLCHAIN_FILE=/tmp/vcpkg/scripts/buildsystems/vcpkg.cmake -DVCPKG_TARGET_TRIPLET=x64-linux-musl \
    && make

FROM alpine:latest as runtime

LABEL description="Run container - findfaces"

RUN apk update && apk add --no-cache \
    libstdc++

RUN mkdir /usr/local/faces

COPY --from=build /src/haarcascade_frontalface_alt2.xml /usr/local/faces/haarcascade_frontalface_alt2.xml

COPY --from=build /src/out/findfaces /usr/local/faces/findfaces

WORKDIR /usr/local/faces

CMD ./findfaces

EXPOSE 8080
The first section of the Dockerfile describes the build environment for our application. We’ve used the AS keyword on the FROM line to name this stage of the build so we can refer to it in subsequent stages. The first RUN statement invokes the package manager and pulls down packages like build-base for the compilers, along with the other build tools and dev packages we need. We then pull in vcpkg to get our library dependencies. The COPY ./src line copies the host machine’s src folder into the container. The final RUN statement builds the application using CMake. There is no entry point for this stage because all we need it to do is run once and build our output.
The second FROM statement begins the next section of our multi-stage build. Here our dependencies are reduced as we don’t need the compilers or dev packages. Since we statically linked our libraries we don’t need those at runtime either, but we do need the GCC runtime libraries (libstdc++) since we built with GCC and Alpine uses musl as its C library. Note the COPY lines that use --from=build. Here we are copying content from the build stage without needing to export it to the host and copy it back into our runtime container. The CMD statement starts our service in the container, and the EXPOSE statement documents the port the container is using for the service.
You could just go ahead and build this container. If you do, you can tag the image, but only the final image is tagged; the images from the earlier stages of the multi-stage build are not. If you’d like to use these earlier images as the base for other containers, you can tag them by specifying the build stage you’d like to stop at as the target. For example, to run just the build stage I can run this command to stop there and tag the image:
docker build --target build -t findfaces/build .
If I want to run the entire build I can run the following command and tag the image from the final stage. If you ran the earlier stage already the image cache for it will be used.
docker build -t findfaces/run .
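The value passed to --target is the stage name declared after AS on a FROM line. As a small convenience, the following sketch lists the stage names in a Dockerfile so you can see which targets are available. The helper name list_stages is hypothetical, and the parsing assumes simple "FROM image AS name" lines with no extra flags such as --platform.

```shell
# list_stages: hypothetical helper, not part of Docker.
# Prints the name of every named build stage in a Dockerfile.
# Assumes lines of the form "FROM <image> AS <name>" (AS in any case)
# with no additional flags between FROM and the image.
list_stages() {
    awk 'toupper($1) == "FROM" && toupper($3) == "AS" { print $4 }' "${1:-Dockerfile}"
}
```

Running list_stages Dockerfile against the Dockerfile in this post should print build and runtime, either of which can be passed to docker build --target.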
Now to use my runtime container:
docker run -d --rm -p 8080:8080 --name findfaces findfaces/run
This command runs a container based on the findfaces/run image, detaches from it when it starts (-d), removes the container when it stops (--rm), publishes container port 8080 on the same port of the host (-p), and names the container findfaces.
Now that the container is up I can access the service using curl:
curl -X PUT -T mypicture.jpg localhost:8080/files?submit=picture.jpg
curl -X GET localhost:8080/files/facespicture.jpg > facespicture.jpg
Any faces that OpenCV could identify in the image are circled in the output image, which has the name used in the submission prefixed with “faces”.
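The naming rule and the two curl calls above can be wrapped in a small script. This is a sketch only: the function names processed_name and submit_and_fetch are hypothetical, and the host, port, and /files endpoints are taken from the commands shown earlier.

```shell
# Hypothetical helpers; endpoint paths mirror the curl commands in this post.

# The service names its output by prepending "faces" to the submitted name.
processed_name() {
    printf 'faces%s\n' "$1"
}

# Upload a local image under the given name, then download the processed result.
# Usage: submit_and_fetch <local-file> <submitted-name> [host:port]
submit_and_fetch() {
    local_file=$1
    name=$2
    host=${3:-localhost:8080}
    result=$(processed_name "$name")

    # -f makes curl return a non-zero status on HTTP errors.
    curl -f -X PUT -T "$local_file" "http://$host/files?submit=$name" || return 1
    curl -f -X GET -o "$result" "http://$host/files/$result"
}
```

With the container running, submit_and_fetch mypicture.jpg picture.jpg would be expected to save the processed image as facespicture.jpg in the current directory.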
When we are done with the application, we can stop it and the container is deleted:
docker stop findfaces
Alpine vs Debian
Above we used Alpine Linux as our base. This is a very small Linux distro that has some differences from more common ones like Debian. If you examine the Dockerfile you’ll notice we copied a file, x64-linux-musl.cmake, from the host into the vcpkg triplets directory. That’s because Alpine uses musl as its C library instead of glibc, so we created a new triplet to use with musl. This is experimental, which is why we have not brought it into vcpkg directly yet. One limitation we found with this triplet is that boost-locale does not compile with it today. This caused us to switch some of our libraries, notably to the restinio HTTP library, which was one of the few HTTP libraries we could find that would compile with this triplet. We have provided an alternate Dockerfile that targets Debian instead and does not use any experimental features.
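This post does not show the contents of x64-linux-musl.cmake. A vcpkg triplet is a small CMake file that sets a handful of variables, so a plausible sketch, assuming the file mirrors vcpkg’s stock x64-linux triplet with static library linkage, could be generated like this. The values are an assumption, not the file from the article’s repository.

```shell
# Write a sketch of the triplet file next to the Dockerfile, where the
# "COPY x64-linux-musl.cmake" instruction expects to find it.
# The variable values are assumed (modeled on vcpkg's x64-linux triplet).
cat > x64-linux-musl.cmake <<'EOF'
set(VCPKG_TARGET_ARCHITECTURE x64)
set(VCPKG_CRT_LINKAGE dynamic)
set(VCPKG_LIBRARY_LINKAGE static)
set(VCPKG_CMAKE_SYSTEM_NAME Linux)
EOF
```

The triplet name passed to CMake through -DVCPKG_TARGET_TRIPLET has to match the file name without the .cmake extension, which is why the Dockerfile uses x64-linux-musl.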
So why would you want to try Alpine instead of Debian today? If we look at our images, this is what we see for the image sizes when using Debian.
docker image ls
REPOSITORY        TAG      IMAGE ID       CREATED       SIZE
findfaces/run     latest   0e20b1ff7f82   2 hours ago   161MB
findfaces/build   latest   7d1675936cdd   2 hours ago   6.5GB
You can see our build container is much larger than the runtime container, obviously desirable for optimizing our resource usage. Our application is 40MB, so the base image we’re running on is the remaining 121MB. Let’s compare that to Alpine.
REPOSITORY        TAG      IMAGE ID       CREATED       SIZE
findfaces/run     latest   0ef4c0b68551   2 hours ago   50.1MB
findfaces/build   latest   fa7fe2783c58   2 hours ago   5.57GB
Not much savings for the build container, but the run container is only 10MB larger than our application.
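The overhead figures quoted above are simply the runtime image size minus the 40MB application. Here is a quick way to reproduce that arithmetic from the sizes reported by docker image ls (overhead_mb is a hypothetical helper name):

```shell
# overhead_mb: hypothetical helper. Prints image size minus application size,
# both given in MB, to one decimal place.
overhead_mb() {
    awk -v image="$1" -v app="$2" 'BEGIN { printf "%.1f\n", image - app }'
}

overhead_mb 161 40     # Debian runtime image: prints 121.0
overhead_mb 50.1 40    # Alpine runtime image: prints 10.1
```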
This post showed how to use C++ code in containers. We used a multi-stage build in which we compiled our assets in one container and consumed them in a runtime container, all from a single Dockerfile. This results in a runtime container optimized for size. We also used vcpkg to get the latest versions of the libraries we are using rather than what is available in the package manager. By statically linking those libraries we reduced the number of binaries we need to manage in our runtime container. We used the Alpine distribution to make our final runtime container as small as possible, and compared it with a Debian-based container. We also exposed our application as a service, using an HTTP library so it can be reached from outside the container.
We are planning to continue looking at containers in future posts. We will have one up soon showing how to use Visual Studio and VS Code with the containers from this post. We will follow that by showing how to deploy these containers to Azure. We will also revisit this application using Windows containers.
Give us feedback
We’d love to hear from you about what you’d like to see covered in the future about containers. We’d love it even more to see the C++ community producing its own content about using C++ with containers. There is very little material out there today and we believe the potential for C++ in the cloud with containers is huge.
As always, we welcome your feedback. We can be reached via the comments below or via email (firstname.lastname@example.org). If you encounter other problems or have a suggestion for Visual Studio please let us know through Help > Send Feedback > Report A Problem / Provide a Suggestion in the product, or via Developer Community. You can also find us on Twitter (@VisualC).