Creating SageMaker Studio Custom Kernel
AutoGluon-Py3.8 Example
I recently wanted to create my own kernel image for Amazon SageMaker Studio. I followed the available posts, but most tend to focus on doing it with the AWS CLI. There is nothing wrong with that, but I wanted to achieve the same result using the AWS Console, which for quick testing I find just as easy.
In this post I will build on what is available in the other posts and simplify two things:
- Creating your own image and pushing it to ECR from a SageMaker Studio notebook using the sagemaker-studio-image-build CLI.
- Attaching the image from step 1 to SageMaker Studio as a custom image and using it in a SageMaker Studio notebook.
This post assumes you have an AWS account with full access to SageMaker and Elastic Container Registry (ECR), plus the ability to create IAM roles and permissions. It also assumes you are familiar with SageMaker Studio and have used it to some extent. You will run most of the content of this blog in a SageMaker Studio notebook, and use the AWS Console for IAM and for attaching the image to the domain.
Note: if you don’t have an existing SageMaker Studio domain, create one first and then follow along here.
If you need an introduction to SageMaker and SageMaker Studio, look at other AWS resources and tutorials.
Custom Docker Image
Do a quick search on Docker Hub for autogluon and you will find the latest images for GPU and CPU. We will select the GPU image and create a quick Dockerfile inside Studio by launching a text file, renaming it Dockerfile, and adding the following content (filename: Dockerfile):
FROM autogluon/autogluon:0.6.1-cuda11.3-jupyter-ubuntu20.04-py3.8
ARG NB_USER="sagemaker-user"
ARG NB_UID="1000"
ARG NB_GID="100"
RUN \
useradd -m -s /bin/bash -N -u $NB_UID $NB_USER && \
chmod g+w /etc/passwd && \
echo "${NB_USER} ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers && \
# Prevent apt-get cache from being persisted to this layer.
rm -rf /var/lib/apt/lists/*
RUN pip install ipykernel && \
python -m ipykernel install --sys-prefix
USER $NB_UID
I want to break down how I arrived at the contents of this file. I started with the very first line and iteratively added the rest by running, failing, debugging and improving, something that is highly encouraged at AWS. I go through this process so my customers and colleagues don’t have to.
When I ran with the minimal Dockerfile shown below, I encountered errors about the GID and UID not being right. I updated those on the SageMaker console to no avail.
FROM autogluon/autogluon:0.6.1-cuda11.3-jupyter-ubuntu20.04-py3.8
RUN pip install ipykernel && \
python -m ipykernel install --sys-prefix
I picked up the additional lines in the file by looking at the samples in this AWS repo: https://github.com/aws-samples/sagemaker-studio-custom-image-samples
Based on the error messages about the user, GID and UID, I picked the lines from the samples that I knew could fix them.
To build the Docker image directly from a SageMaker Studio notebook, you will need a utility called sagemaker-studio-image-build. You can pip install it directly in a SageMaker Studio notebook on a minimal instance size with the Base Python 2.0 kernel.
!pip install sagemaker-studio-image-build
Along with that install, you will need a few permissions to let SageMaker call CodeBuild, push images to ECR, and store build artifacts in S3. Follow this blog until the “USING THE CLI” section, attach the necessary permissions it describes, and come back.
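For reference, the sagemaker-studio-image-build documentation requires that your SageMaker execution role trust CodeBuild in addition to SageMaker. A sketch of that trust relationship is below; the full set of CodeBuild, ECR, S3 and CloudWatch Logs permissions is covered in the linked blog.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": ["sagemaker.amazonaws.com", "codebuild.amazonaws.com"]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```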
Building Docker Image
You can build this image anywhere you want as long as you can push it to Elastic Container Registry (ECR); your laptop or an EC2 instance will work if Docker is installed. To minimize footprint, this tutorial builds it on a SageMaker Studio notebook. If at any point the requirements below feel like too much, just build your image elsewhere, push it to ECR, and skip to the next section.
Building and pushing docker image from SageMaker Studio Notebook
Now we will build and push this Docker image to ECR. In the same Studio notebook where you installed the sagemaker-studio-image-build library above, run the following command from the directory containing the Dockerfile.
sm-docker build .
That command will build the image using CodeBuild, create an Elastic Container Registry (ECR) repository, upload the image to it, and print the image URI in the output, which you can copy and use in the next step.
Example Output: xxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/sagemaker-studio-d-abeiuguig:raj
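The image URI follows a predictable pattern: account.dkr.ecr.region.amazonaws.com/repository:tag. A minimal sketch in plain Python that pulls those pieces apart, using a made-up example URI (substitute the one sm-docker printed), can be handy when you need the repository name or tag later:

```python
def parse_ecr_uri(uri: str) -> dict:
    """Split an ECR image URI into account, region, repository and tag."""
    registry, _, rest = uri.partition("/")
    repository, _, tag = rest.partition(":")
    host_parts = registry.split(".")  # [account, "dkr", "ecr", region, ...]
    return {
        "account": host_parts[0],
        "region": host_parts[3],
        "repository": repository,
        "tag": tag,
    }

# Made-up example URI in the same shape as the sm-docker output.
parts = parse_ecr_uri(
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/sagemaker-studio-d-example:mytag"
)
print(parts)
```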
This output may appear even if your Docker build or push fails, so make sure there are no error messages or non-zero exit codes in the output and that every step succeeded all the way through.
Error checking and logging
Check the CloudWatch logs and your notebook output for any errors. One mistake I made and struggled with for a bit was unintentionally naming my Dockerfile DockerFile; the file name is case-sensitive.
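Since the file name is case-sensitive, here is a small sketch you could run in the notebook's working directory to catch a mis-cased Dockerfile before building. It uses only the standard library and assumes nothing beyond the current directory:

```python
from pathlib import Path

# Find any file named "dockerfile" in some casing.
# sm-docker (like docker build) expects exactly "Dockerfile".
matches = [p.name for p in Path(".").iterdir()
           if p.is_file() and p.name.lower() == "dockerfile"]

if "Dockerfile" in matches:
    print("Found Dockerfile - good to build.")
elif matches:
    print(f"Found {matches} - rename to exactly 'Dockerfile'.")
else:
    print("No Dockerfile found in this directory.")
```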
Attach the image to Amazon SageMaker Studio as a custom image
Navigate to the AWS Console, type SageMaker in the search bar, and open the SageMaker console. Open your domain, go to the Environment section, and choose Attach image, supplying the image URI from the previous step.
Fill out the next section as shown below, changing the display names as you see fit. Also make sure the image name and kernel name match (learned from errors).
Expand the Advanced Configuration section and ensure it looks like below, these are default values.
Click Submit
You should be returned to the “attach image” screen; you can check whether the image was successfully attached by going to the Environment section.
You will see one of three things: the image still attaching, an error in a red ribbon, or, if successful, the image-attached screen like below.
You can’t celebrate yet: this only means your image is attached to the SageMaker domain, and doesn’t necessarily mean it will launch successfully.
Close the existing SageMaker Studio browser tab, clear any cache, and launch SageMaker Studio again from the user screen as shown below, or from the Studio tab on the console.
You will be taken to Studio.
From Studio, open a new notebook, select an instance size, and look for your image in the custom image section. Hover over your custom image (mine is Python3) and click the version you want; I only have v1 here, but you can have multiple.
Once selected, it will look like this. I gave it a display name of autogluon, so I see that in the kernel.
If the image is built right, you should get a successful launch. You can test this by running an import statement or some code.
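As a quick smoke test inside the new kernel, a sketch like the following checks the Python version and whether autogluon is importable without crashing the notebook (the base image tag pins Python 3.8, so that is what the version check should report when run on this kernel):

```python
import importlib.util
import sys

# The AutoGluon base image in the Dockerfile ships Python 3.8,
# so on this kernel this should print (3, 8).
print("Python:", sys.version_info[:2])

# Probe for autogluon without raising ImportError if it is missing.
autogluon_available = importlib.util.find_spec("autogluon") is not None
print("autogluon importable:", autogluon_available)
```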
Conclusion
Amazon SageMaker Studio provides a number of prebuilt kernels and images for MXNet, PyTorch, TensorFlow, Base Python (3.7, 3.8 and 3.10), Data Science, and more. If none of these work for you, you have a specific set of libraries built on a specific OS, or you need something that is not available in SageMaker Studio, this tutorial will come in handy. It takes a little know-how around provisioning permissions and checking logs, plus access to an account with sufficient user permissions. Provided you meet all the requirements, this is an easy-to-intermediate, one-time effort on SageMaker Studio. Check the AWS samples link above if you need more examples or ideas on how to build your Docker image.