From Chaos to Consistency: Docker for Data Scientists | By Igor Howell

Introduction and Applications of Docker for Data Scientists

But it works on my machine?

It’s a classic meme in the tech community, especially for data scientists who want to ship their amazing machine-learning models, only to find that the production machine has a different operating system. Far from ideal.

However…

There is a solution, thanks to these wonderful things called containers and the tools used to control them, such as Docker.

In this post, we will learn what containers are and how you can create and run them using Docker. The use of containers and Docker for data products has become an industry standard and common practice. As a data scientist, knowing these tools is an invaluable addition to your arsenal.

Docker is a service that helps build, run, and execute code and applications in containers.

Now you may be wondering: what is a container?

On the surface, a container is similar to a virtual machine (VM). It is a small isolated environment where everything is self-contained and can be run on any machine. The primary selling point of containers and VMs is their portability, which allows your applications or models to run seamlessly on any on-premises server, local machine, or cloud platform such as AWS.

The main difference between containers and VMs is how they use their host computer's resources. Containers are much more lightweight because they do not actively partition the hardware resources of the host machine. I won't go into the full technical details here; however, I've linked a great article explaining their differences if you want to understand a bit more.

Docker is simply a tool that we use to easily create, manage and run these containers. It is one of the main reasons why containers have become so popular, as it enables developers to easily deploy applications and models that run anywhere.

To run a container using Docker we need three main elements:

Dockerfile: A text file containing the instructions for building a Docker image.
Docker image: A blueprint or template for building Docker containers.
Docker container: An isolated environment that provides everything needed to run an application or machine learning model. Things like dependencies and OS versions are included.

There are also some other key points to note:

Docker Daemon: A background process (daemon) that deals with incoming requests to Docker.
Docker Client: A shell interface that enables the user to talk to Docker through its daemon.
Docker Hub: Similar to GitHub, a place where developers can share their Docker images.

Homebrew

The first thing you should install is Homebrew (link here). It's dubbed 'the missing package manager for macOS' and is very useful for anyone coding on a Mac.

To install Homebrew, simply run the command provided on their website:

/bin/bash -c "$(curl -fsSL

Verify that Homebrew is installed by running brew help.

Docker

Now with Homebrew installed, you can install Docker by running brew install docker. Verify that Docker is installed by running which docker. The output should not contain any errors and should look like this:

/opt/homebrew/bin/docker

Colima

The last part is to install Colima. Simply run brew install colima and verify that it is installed with which colima. Again, the output should look like this:

/opt/homebrew/bin/colima

Now you may be wondering: what is Colima?

Colima is a software package that enables container runtimes on macOS. In more general terms, Colima creates the environment that containers need to work on our system. To achieve this, it runs a Linux virtual machine with a daemon that Docker can communicate with using the client-server model.

Alternatively, you can install Docker Desktop instead of Colima. However, I prefer Colima for a few reasons: it's free, more lightweight, and I like working in the terminal!

For more arguments in favour of Colima, see this blog post here.

Workflow

Below is an example of how data scientists and machine learning engineers can deploy their models using Docker:

The first step is obviously to build their amazing model. Then, you need to wrap up everything required to run the model, such as the Python version and package dependencies, in a requirements file. The last step is to use that requirements file inside the Dockerfile.

If this seems completely arbitrary to you at this point, don't worry: we'll go through the process step by step!

Basic Model

Let's start by building a basic model. The code snippet provided demonstrates a simple implementation of a Random Forest classification model on the famous Iris dataset:

Dataset from Kaggle with CC0 license.

GitHub Gist by the author.

This file is called basic_rf_model.py for reference.
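The embedded gist does not appear here, so the following is only a minimal sketch of what a script like basic_rf_model.py might contain. It is an assumption, not the original code: it loads the copy of Iris bundled with scikit-learn through load_iris rather than the Kaggle CSV, and the function name train_and_evaluate, the 25% test split and the random seed are illustrative choices.

```python
# Hypothetical stand-in for basic_rf_model.py (the original gist is not shown).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_and_evaluate(test_size: float = 0.25, random_state: int = 42) -> float:
    """Train a random forest on Iris and return accuracy on the held-out split."""
    # Assumption: use the Iris data shipped with scikit-learn
    # instead of reading the Kaggle CSV.
    X, y = load_iris(return_X_y=True)

    # Hold out part of the data so the score reflects unseen samples.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )

    model = RandomForestClassifier(n_estimators=100, random_state=random_state)
    model.fit(X_train, y_train)

    return accuracy_score(y_test, model.predict(X_test))


if __name__ == "__main__":
    print(f"Accuracy: {train_and_evaluate()}")
```

The exact accuracy depends on the split and the seed, so do not expect it to match the figure printed later in this post.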

Create Requirements File

Now that our model is ready, we need to create a requirements.txt file to hold all the dependencies that underpin the running of our model. In this simple example, we fortunately rely only on the scikit-learn package. Therefore, our requirements.txt will just look like this:

scikit-learn==1.2.2

You can check the version installed on your computer with the pip show scikit-learn command.
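As an alternative, the installed version can be looked up from Python itself using the standard library's importlib.metadata. The sketch below turns that into a pinned requirements line; the helper name pinned_requirement is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def pinned_requirement(package: str) -> Optional[str]:
    """Return 'package==X.Y.Z' for the installed version, or None if it is absent."""
    try:
        return f"{package}=={version(package)}"
    except PackageNotFoundError:
        # The package is not installed in the current environment.
        return None


if __name__ == "__main__":
    line = pinned_requirement("scikit-learn")
    print(line if line else "scikit-learn is not installed in this environment")
```

Whatever line it prints can be pasted straight into requirements.txt, which keeps the container on the same version you developed against.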

Create Dockerfile

Now we can finally build our Dockerfile!

So, in the same directory as requirements.txt and basic_rf_model.py, create a file named Dockerfile. Inside the Dockerfile we will have the following:

GitHub Gist by the author.

Let’s go line by line to see what it means:

FROM python:3.9: this is the base image for our image
MAINTAINER egor@some.email.com: indicates who maintains this image
WORKDIR /src: sets the working directory of the image to be /src
COPY . .: copies the files in the current directory into the image's working directory
RUN pip install -r requirements.txt: installs the requirements from the requirements.txt file into the Docker environment
CMD ["python", "basic_rf_model.py"]: tells the container to execute the command python basic_rf_model.py and run the model
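Putting the instructions described above together, the Dockerfile can be sketched as follows. To keep the examples in one language, the file is assembled by a small Python script; the names DOCKERFILE_LINES, render_dockerfile and write_dockerfile are hypothetical, and the content is reconstructed from the line-by-line description rather than copied from the gist. Note that the exec form of CMD is a JSON array, so it needs square brackets and double quotes.

```python
from pathlib import Path

# Instructions reconstructed from the line-by-line description (an assumption,
# not a copy of the original gist).
DOCKERFILE_LINES = [
    "FROM python:3.9",
    "MAINTAINER egor@some.email.com",
    "WORKDIR /src",
    "COPY . .",
    "RUN pip install -r requirements.txt",
    'CMD ["python", "basic_rf_model.py"]',
]


def render_dockerfile() -> str:
    """Return the Dockerfile content as a single newline-terminated string."""
    return "\n".join(DOCKERFILE_LINES) + "\n"


def write_dockerfile(directory: str = ".") -> Path:
    """Write the Dockerfile into `directory` and return its path."""
    path = Path(directory) / "Dockerfile"
    path.write_text(render_dockerfile())
    return path


if __name__ == "__main__":
    # Print rather than write, so running the sketch has no side effects.
    print(render_dockerfile(), end="")
```

Calling write_dockerfile() from the project directory places the file next to requirements.txt and basic_rf_model.py. Recent Docker documentation marks MAINTAINER as deprecated in favour of a LABEL instruction, so you may want to swap it when writing a new Dockerfile.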

Start Colima and Docker

The next step is setting up the Docker environment. First, we need to boot up Colima:

colima start

After Colima has started, check that Docker is working by running the command:

docker ps

It should return something like this:

CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES

This is good and means that both Colima and Docker are working as expected!

Note: The docker ps command lists all currently running containers.
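The same check can be scripted, which is handy before kicking off a build from a notebook or a pipeline. This is a minimal sketch, assuming the docker CLI is on your PATH; the function name docker_is_responding is hypothetical, and it only inspects the exit code of docker ps.

```python
import shutil
import subprocess
from typing import Optional


def docker_is_responding(timeout: float = 10.0) -> Optional[bool]:
    """True if `docker ps` succeeds, False if it fails, None if the CLI is missing."""
    if shutil.which("docker") is None:
        return None  # the docker CLI is not installed or not on PATH

    try:
        result = subprocess.run(
            ["docker", "ps"], capture_output=True, text=True, timeout=timeout
        )
    except (subprocess.TimeoutExpired, OSError):
        return False

    # A non-zero exit code usually means the daemon could not be reached.
    return result.returncode == 0


if __name__ == "__main__":
    status = docker_is_responding()
    if status is None:
        print("docker CLI not found")
    elif status:
        print("Docker is responding")
    else:
        print("Docker did not respond; is Colima running?")
```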

Create Image

Now it's time to build our first Docker image from the Dockerfile that we created above:

docker build . -t docker_medium_example

The -t flag indicates the name of the image and the . tells Docker to build from the current directory.

If we now run docker images, we should see something like this:

Congratulations, the image is created!

Run Container

Once the image is created, we can run it as a container using the IMAGE ID listed above:

docker run bb59f770eb07

Output:

Accuracy: 0.9736842105263158

Because all it does is run the basic_rf_model.py script!
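The image ID will most likely be different on your machine, so copying bb59f770eb07 may not work for you. Since the image was tagged with -t, it can also be run by name. The sketch below simply assembles both commands as argument lists; the helper names build_command and run_command are hypothetical, and nothing is executed until you pass a list to something like subprocess.run.

```python
import shlex
from typing import List

IMAGE_TAG = "docker_medium_example"  # the tag used in the build step of this post


def build_command(tag: str = IMAGE_TAG, context: str = ".") -> List[str]:
    """Argument list for building an image from `context` and tagging it."""
    return ["docker", "build", context, "-t", tag]


def run_command(image: str = IMAGE_TAG) -> List[str]:
    """Argument list for running a container from an image name or ID."""
    return ["docker", "run", image]


if __name__ == "__main__":
    print(shlex.join(build_command()))  # docker build . -t docker_medium_example
    print(shlex.join(run_command()))    # docker run docker_medium_example
```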

Additional Information

This tutorial is just scratching the surface of what Docker can do and be used for. There are many more features and commands to learn in order to understand Docker. There is a very detailed tutorial on the Docker website which you can find here.

A nice feature is that you can run the container in interactive mode and go into its shell. For example, if we run:

docker run -it bb59f770eb07 /bin/bash

You will enter the Docker container and it should look something like this:

We also used the ls command to show all the files in the Docker working directory.

Docker and containers are great tools for making sure data scientists' models can run anywhere and anytime without any issues. They do this by creating small isolated compute environments, called containers, that hold everything the model needs to run effectively. Containers are easy to use and lightweight, which has made them common industry practice. In this article, we looked at a basic example of how to package your model into a container using Docker. The process was simple and seamless, so it is something data scientists can learn and pick up quickly.

The full code used in this article can be found on my GitHub here:

(All emojis designed by OpenMoji, the open-source emoji and icon project. License: CC BY-SA 4.0)

From Chaos to Consistency: Docker for Data Scientists | By Igor Howell | May, 2023
