Unlocking the Power of Docker
Getting started with Docker and containers for data and machine learning engineers
👋 Hi, this is Sarah with the weekly issue of the Dutch Engineer Newsletter. In this newsletter, I cover data and machine learning concepts, code best practices, and career advice that will help you accelerate your career.
In 2022’s StackOverflow survey, 69% of the 46,432 professional developers who responded said that they had used Docker extensively in the past year, up from 55% in 2021. [1]
StackOverflow respondents are mostly software engineers, but new data tools also rely heavily on Docker and Kubernetes. Consider Prefect, a rival to Airflow for data pipeline workflows, as well as Flyte and BentoML, which both leverage Kubernetes and Docker to support machine learning infrastructure. If you are a data or machine learning engineer, I highly recommend you learn what Docker is all about.
Docker is a tool that creates and runs lightweight, portable, and self-contained environments for applications. Dockerfiles, which are essentially scripts, contain the instructions for building Docker images, and containers are the running instances of those images. Dockerfiles are a powerful tool for creating custom images that contain all the necessary software and configurations to run an application.
This article will guide you through installing Docker, understanding the Dockerfile, and integrating your container with VSCode for a more streamlined container development experience.
Thanks to Delta for sponsoring this newsletter! I am a huge fan of Delta Lake and use it every day both in Data Engineering and Machine Learning.
Install Docker
First, we have to install Docker. The easiest way to do that is to follow these steps:
Go to the official Docker website (https://www.docker.com/) and click on the "Get Started" button.
Choose your platform. Docker supports various platforms, including Windows, macOS, and Linux. Choose the platform that is appropriate for your system.
Download Docker. Once you have selected your platform, you will be taken to a download page. Follow the instructions on the page to download the Docker installer for your platform.
After the installation is complete, open a terminal or command prompt and run the command:
docker version
to verify that Docker has been installed correctly. You should see an output that shows the version of Docker installed on your system.
Client:
Cloud integration: v1.0.24
Version: 20.10.14
API version: 1.41
Go version: go1.16.15
Git commit: a224086
Built: Thu Mar 24 01:49:20 2022
OS/Arch: darwin/amd64
Context: default
Experimental: true
Server: Docker Desktop 4.8.2 (79419)
Engine:
Version: 20.10.14
API version: 1.41 (minimum version 1.12)
Go version: go1.16.15
Git commit: 87a90dc
Built: Thu Mar 24 01:46:14 2022
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.5.11
GitCommit: 3df54a852345ae127d1fa3092b95168e4a88e2f8
runc:
Version: 1.0.3
GitCommit: v1.0.3-0-gf46b6ba
docker-init:
Version: 0.19.0
GitCommit: de40ad0
Congratulations, you have now installed Docker on your system! You can now use Docker to create, deploy, and run containers for your applications.
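If you would rather script this check, for example at the start of a pipeline, here is a minimal Python sketch. The function name docker_client_version is my own and not part of any Docker tooling. It assumes the docker CLI is on your PATH and uses the --format option of docker version to print only the client version.

```python
import shutil
import subprocess
from typing import Optional


def docker_client_version() -> Optional[str]:
    """Return the Docker client version, or None if it cannot be determined."""
    # Hypothetical helper for illustration; not part of Docker itself.
    if shutil.which("docker") is None:
        return None  # the docker CLI is not on PATH
    try:
        result = subprocess.run(
            ["docker", "version", "--format", "{{.Client.Version}}"],
            capture_output=True,
            text=True,
            timeout=30,
            check=False,  # a stopped daemon should not raise here
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    version = result.stdout.strip()
    return version or None


print(docker_client_version())
```

A return value of None means the CLI was missing or gave no version, which is a good moment to revisit the installation steps above.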
Anatomy of a Dockerfile
Docker uses a Dockerfile to build an image. A Dockerfile is a plain text file that contains a series of instructions for building a Docker image. By default, the file is named Dockerfile and is located in the root directory of your project. Note that you can add an extension to tell different Dockerfiles apart, e.g. Dockerfile.stg for staging environments and Dockerfile.prod for production environments; in that case, point docker build at the file with the -f flag. Here's an example Dockerfile that I use for example.dutchengineer.org, my original website where I show two of my projects:
# Use the official lightweight Python image.
# <https://hub.docker.com/_/python>
FROM python:3.7-slim
# Allow statements and log messages to immediately appear in the Knative logs
ENV PYTHONUNBUFFERED True
# Copy local code to the container image.
ENV APP_HOME /app
WORKDIR $APP_HOME
COPY . ./
RUN apt-get -y update \
    && apt-get install -y unzip
RUN unzip -o data.zip
# Install production dependencies.
RUN pip install --trusted-host pypi.python.org -r requirements.txt
# Run the web service on container startup. Here we use the gunicorn
# webserver, with one worker process and 8 threads.
# For environments with multiple CPU cores, increase the number of workers
# to be equal to the cores available.
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 --timeout 0 app:server
Let's go through each of these instructions:
FROM: This instruction specifies the base image to use for your image. In this example, we're using the official lightweight Python image.
ENV: This instruction sets environment variables in the image. In this example, we're setting PYTHONUNBUFFERED and APP_HOME.
WORKDIR: This instruction sets the working directory for the rest of the instructions in the Dockerfile. In this example, we're setting the working directory to /app.
COPY: This instruction copies files from the host machine into the Docker image. In this example, we're copying everything in the current directory into the working directory of the image.
RUN: This instruction runs a command while the image is being built. In this example, we install unzip, extract data.zip, and install the Python dependencies listed in requirements.txt.
CMD: This instruction specifies the command that should be run when a container is started from the image. In this example, we're running a Gunicorn web server.
If you need more depth or want to use other instructions, I highly recommend the Dockerfile reference in Docker's official documentation.
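To make the CMD line in the example Dockerfile more concrete: gunicorn app:server tells Gunicorn to import a module named app and serve the WSGI callable named server inside it. My actual app.py is not shown in this article, so the following is only a hypothetical stand-in using a bare WSGI function from the standard interface.

```python
# app.py (hypothetical stand-in; the real application is not shown in the article)


def server(environ, start_response):
    """Minimal WSGI callable that `gunicorn app:server` would look up and serve."""
    body = b"Hello from inside the container\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    # WSGI expects an iterable of byte strings as the response body.
    return [body]
```

Any WSGI-compatible object works in place of this function, which is why web frameworks that expose such an object can be served the same way.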
Building an Image
To build an image from a Dockerfile, you need to use the docker build command. The docker build command takes the path to the directory that contains the Dockerfile as an argument.
docker build .
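This will build an image from the Dockerfile located in the current directory (.). Because no name is given, the image is untagged and you can only refer to it by its image ID.
To give the image a name and an explicit tag (which is best practice!), we will want to use the -t flag. If you pass a name without a tag, Docker applies the tag latest.
docker build -t my-image:v2 .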
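An image reference passed to -t has the shape name:tag, and Docker falls back to the tag latest when the tag is left out. The sketch below illustrates that rule in Python. It is a simplified illustration, not Docker's actual parser: the function name split_image_reference is hypothetical, and digests (name@sha256:...) are deliberately not handled.

```python
from typing import Tuple


def split_image_reference(reference: str) -> Tuple[str, str]:
    """Split 'name:tag' into (name, tag), defaulting the tag to 'latest'."""
    # Only a colon after the last slash separates the tag; an earlier colon
    # belongs to a registry port such as localhost:5000.
    last_segment = reference.rsplit("/", 1)[-1]
    if ":" in last_segment:
        name, tag = reference.rsplit(":", 1)
        return name, tag
    return reference, "latest"


print(split_image_reference("my-image:v2"))
print(split_image_reference("my-image"))
```

Relying on latest makes it hard to tell which build is running, which is why an explicit version tag is the safer habit.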
Using the Image
Once you've built an image, you can use it to create a container. To create a container from an image, you need to use the docker run command.
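docker run -e PORT=3000 -p 3000:3000 my-image:v2
This will start a container from the my-image:v2 image and publish port 3000 in the container on port 3000 of the host machine (the -p format is host:container). The -e PORT=3000 flag sets the PORT environment variable that the Gunicorn command in the example Dockerfile binds to.
To see the running container, use the following command: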
docker ps
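The order of the two numbers in -p is easy to mix up: the host port comes first and the container port second. Here is a small Python sketch of that convention. The names PortMapping and parse_publish_spec are hypothetical, and the sketch only covers the plain host:container form, not the variants with an IP address or a protocol suffix.

```python
from typing import NamedTuple


class PortMapping(NamedTuple):
    host_port: int
    container_port: int


def parse_publish_spec(spec: str) -> PortMapping:
    """Parse the simple 'host:container' form used with `docker run -p`."""
    host, container = spec.split(":")
    return PortMapping(host_port=int(host), container_port=int(container))


mapping = parse_publish_spec("8080:3000")
print(f"host {mapping.host_port} -> container {mapping.container_port}")
```

With -p 8080:3000, for instance, you would open localhost:8080 on your machine while the application inside the container still listens on 3000.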
VSCode integrations with Docker
As a new Docker user years ago, I often struggled to develop with Dockerfiles as I had to switch between multiple screens. However, since then, VSCode has added the ability to run and debug applications inside a Docker container directly from the VSCode interface. This has made me a much more efficient engineer, and it's all thanks to the "Docker" and "Dev Containers" extensions, which can be installed from the VSCode marketplace.
Visual Studio Code offers a video tutorial on how to open your repository in the remote container environment, and the Dev Containers extension is covered in the official Visual Studio Code documentation.
Final Thoughts
In this article, I introduced Docker and Dockerfiles, explained how to get started using them, and demonstrated how you can improve your overall environment by setting up a remote environment for your container within VSCode.
What should you do next? Start building an application, such as scraping data off a website, within a Docker container!
[1] https://survey.stackoverflow.co/2022/#most-popular-technologies-tools-tech-prof