Running Containers with Interactive Terminal Sessions in Azure
Run containerized workloads on Azure Batch — from Docker images in ACR to job execution via a custom Flask web app, all backed by Bicep IaC.
This blog is backed by sample code on GitHub: https://github.com/azurewithdanidu/docker-with-azure-batch/tree/main
Howdy Folks,
When we think about running containers in Azure, the usual suspects come to mind: AKS, Container Apps, or App Services — all great choices for long-running services like APIs, background workers, or microservices. But not all container workloads fall into that category.
What if your container is:
- Meant to run once and exit?
- Designed for batch processing, not serving traffic?
- Built to process files, run tests, or perform compute-heavy parallel tasks?
These short-lived, often stateless containers don’t need to be “always-on” or part of a service mesh. They just need compute — on-demand, scalable, and efficient.
Services like AKS or Container Apps aren’t a natural fit for these use cases. Have you ever wondered what you can do in this situation in Azure?
When most people think about containers, they picture a backend service or API running inside a container — always on, listening on port 80, scaling with load. And yes, those use cases are absolutely valid — in fact, they dominate modern container platforms like Kubernetes and Azure Container Apps.
But what’s often overlooked is that containers can be used like powerful, portable CLI tools.
Consider a couple of examples:

PyTorch Docker Container

PyTorch, a leading machine learning framework, has an official Docker image that lets you run models using PyTorch's dynamic computation graph.

- Docker Hub: `pytorch/pytorch`
- Command: `docker run -it pytorch/pytorch bash`
- Description: This image ships with all the necessary PyTorch libraries, so you can use it to train and deploy models in your own environment.
Jupyter Notebooks with AI Libraries

Jupyter notebooks are an essential tool for data scientists and AI practitioners. You can find pre-built Docker containers that bundle Jupyter with machine learning libraries like TensorFlow, PyTorch, Scikit-learn, and others.

- Docker Hub: `jupyter/tensorflow-notebook`
- Command: `docker run -p 8888:8888 jupyter/tensorflow-notebook`
- Description: This container includes Jupyter Notebook with TensorFlow pre-installed. It's great for experimenting with AI models interactively in a notebook environment.
So, having said that — what is one of the best solutions in Azure?
Azure Batch is a cloud-based service that allows you to run large-scale parallel and high-performance computing tasks. It automatically handles job scheduling, resource provisioning, and scaling, making it ideal for scenarios like data processing, image analysis, and AI model training. Azure Batch supports containerized workloads, offering a flexible and cost-effective solution for batch processing in the cloud.
Here is more about Azure Batch — https://learn.microsoft.com/en-us/azure/batch/batch-technical-overview
Solution
So, what is the solution overview?
I have a Docker image, which I will be publishing to my Azure Container Registry. Azure Batch will simply pull the image from ACR and run it as a Docker container when it gets a request from my front-end web app.
What happens inside Azure Batch is as follows:
- You need to create an Azure Batch Account
- You need to create a compute pool within the Azure Batch account. Here you get multiple options, including running the pool inside your own network, running workloads as Docker containers, and more.
- You need to create a job inside the compute pool
- You need to create a task inside the job with commands or tasks that you need to execute against your Docker container
One thing I realized along the way: you cannot create jobs or tasks from IaC. Once the pool is configured, you have to create them through the Batch REST API or one of the Azure Batch SDKs.
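To make that concrete, here is a minimal sketch of the request body for the Batch REST API's "Add Job" operation (a `POST` to `{account-url}/jobs`). The job id, pool id, and the `onAllTasksComplete` value are illustrative assumptions, not values from the repo:

```python
def build_job_body(job_id: str, pool_id: str) -> dict:
    """Build the JSON request body for a Batch 'Add Job' REST call (sketch)."""
    return {
        "id": job_id,
        "poolInfo": {"poolId": pool_id},   # the pool this job's tasks run on
        "onAllTasksComplete": "noaction",  # keep the job open so more tasks can be added
    }

# Hypothetical job and pool names for illustration
body = build_job_body("ml-job-01", "container-pool")
```

The same shape applies whether you call the REST API directly or let an SDK build it for you.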
Resource Group Overview
Here is what I ended up with in my resource group:
Azure Batch Pool Configuration
Here is the configuration inside the Azure Batch pool:
I have a user-assigned managed identity that is assigned to the pool that has permission to pull container images from the ACR.
Creating a Pool — Step by Step
Here are the steps to create a pool manually:
Go to Pools, click Add, and select the options as shown below. It's important to select the highlighted option: we need a VM image that already has Docker configured, and this is the only image available with the required configuration at the moment.
Make sure to select the user-assigned managed identity when creating the pool. This is how the pool nodes authenticate to ACR to pull your container images.
The next step is selecting the container image and the container registry. When specifying the container name, make sure to use the full name, including the ACR domain, as shown below.
Also, when selecting the container registry, DO NOT USE the username/password option. Simply select the managed identity and type the ACR login server name.
Next, select the VM SKU and the dedicated VM count. You can keep the other settings at their defaults, or adjust them to suit your workload.
The `microsoft-dsvm` publisher with the `ubuntu-hpc` offer (SKU `2204`) comes pre-installed with a Docker-compatible Moby container runtime. This is the recommended image for container workloads on Azure Batch. See: Supported VM images
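For reference, the portal choices above correspond to a pool's `virtualMachineConfiguration` payload. Here's a sketch of that section as a plain dict; field casing follows the Batch REST style, and the node agent SKU id is an assumption for this Ubuntu 22.04 image:

```python
def build_pool_vm_config(acr_server: str, identity_resource_id: str,
                         images: list) -> dict:
    """Sketch the virtualMachineConfiguration section of a Batch pool request."""
    return {
        # The Docker-enabled image discussed above
        "imageReference": {
            "publisher": "microsoft-dsvm",
            "offer": "ubuntu-hpc",
            "sku": "2204",
            "version": "latest",
        },
        "nodeAgentSKUId": "batch.node.ubuntu 22.04",  # assumed agent SKU for this image
        "containerConfiguration": {
            "type": "dockerCompatible",
            "containerImageNames": images,  # prefetched to every node
            "containerRegistries": [
                {
                    "registryServer": acr_server,
                    # Managed identity instead of username/password
                    "identityReference": {"resourceId": identity_resource_id},
                }
            ],
        },
    }

# Hypothetical registry and identity values for illustration
cfg = build_pool_vm_config(
    "your-registry.azurecr.io",
    "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/"
    "Microsoft.ManagedIdentity/userAssignedIdentities/<identity-name>",
    ["your-registry.azurecr.io/ml-image:latest"],
)
```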
IaC with Bicep
While the portal walkthrough above is useful for understanding, in production you want all of this deployed via Infrastructure as Code. I’ve created Bicep templates that deploy the Batch account and pool with container configuration.
Here’s the key part of the Bicep deployment — the container configuration in the pool:
```bicep
containerConfiguration: {
  type: 'DockerCompatible'
  containerImageNames: containerImageNames
  containerRegistries: [
    {
      registryServer: containerRegistryServer
      identityReference: {
        resourceId: managedIdentityResourceId
      }
    }
  ]
}
```
This does three important things:

- Sets the container runtime to `DockerCompatible`, which enables the Docker-compatible runtime on pool nodes.
- Prefetches container images: images are pulled to each node when the pool is created, not when tasks start. This eliminates startup delays.
- Authenticates to ACR via managed identity, using `identityReference` instead of username/password. No secrets to rotate.
To deploy:
```shell
# Clone the repository
git clone https://github.com/azurewithdanidu/docker-with-azure-batch.git
cd docker-with-azure-batch

# Deploy the infrastructure
az deployment group create \
  --resource-group your-resource-group \
  --template-file bicep/ml-batch/main.bicep \
  --parameters \
    location=eastus \
    batchAccountName=your-batch-account \
    managedIdentityResourceId="/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<identity-name>" \
    containerRegistryServer=your-registry.azurecr.io
```
Make sure the user-assigned managed identity has the Batch Contributor role on the Batch account and the AcrPull role on your Azure Container Registry before deploying.
The Front-End Web App
The next step is to create the Job and Task to execute. We can do it in the GUI, but instead, I wrote a small Python web app to interact with Azure Batch using its access key so I can execute commands from a web layer.
The idea is that I create the job from the web portal and also submit tasks. As you can see, the Task form requests four basic inputs:
- Job ID — the job to run this task under
- Task Name — a unique identifier for the task
- Container Image name — the image to execute the command against. Don't be confused: we selected the container details on the pool only to cache the image so tasks start faster
- Command to execute — the actual command to run inside the container
Note: In the screenshots I’m not executing proper commands for the container I’m using, hence getting execution failures :P
Here’s how the task creation works in code:
```python
import datetime

from azure.batch import models as batchmodels


def create_task(batch_client, job_id, task_id, command, image_name):
    """Create a container task."""
    task_container_settings = batchmodels.TaskContainerSettings(
        image_name=image_name,
        working_directory="taskWorkingDirectory"
    )
    user_identity = batchmodels.UserIdentity(
        auto_user=batchmodels.AutoUserSpecification(
            scope="pool",
            elevation_level="nonadmin"
        )
    )
    constraints = batchmodels.TaskConstraints(
        retention_time=datetime.timedelta(days=7),
        max_wall_clock_time=datetime.timedelta(hours=72),
        max_task_retry_count=0
    )
    task = batchmodels.TaskAddParameter(
        id=task_id,
        command_line=command,
        container_settings=task_container_settings,
        user_identity=user_identity,
        constraints=constraints
    )
    return batch_client.task.add(job_id=job_id, task=task)
```
When Batch creates a container task, it uses `docker create` under the hood. The `command_line` you specify becomes the `CMD` for the container, while the image's `ENTRYPOINT` remains unchanged. See: Container task command line
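To make that behavior concrete, here's a tiny illustration (not Batch or Docker API code) of how a preserved `ENTRYPOINT` combines with a task's `command_line` supplied as `CMD`:

```python
def effective_process(entrypoint: list, command_line: str) -> list:
    """Combine a preserved ENTRYPOINT with a task command_line acting as CMD.

    Naive whitespace split for illustration only; quoted arguments would
    need proper shell parsing.
    """
    return entrypoint + command_line.split()

# e.g. an image with ENTRYPOINT ["python"] and a task command "train.py --epochs 3"
# (both values are hypothetical) runs: python train.py --epochs 3
proc = effective_process(["python"], "train.py --epochs 3")
```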
Seeing It All Come Together
Azure Batch makes it easy to run large-scale parallel and high-performance computing jobs in the cloud. In this example, we:

- Created a pool of compute nodes
- Submitted a new task (`task3`) using a containerized image
- Observed its state update in the Azure portal, going from active to completed

Perfect for automating batch jobs like ML inference or image processing!
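If you want to watch that active-to-completed transition from code rather than the portal, a simple polling loop does the job. This is a sketch: `get_task_state` stands in for something like `batch_client.task.get(job_id, task_id).state` from the Azure Batch SDK, and the simulated state sequence below is invented for illustration:

```python
import time


def wait_for_completion(get_task_state, timeout_s=300, interval_s=2):
    """Poll the task state until it reaches 'completed' or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if get_task_state() == "completed":
            return True
        time.sleep(interval_s)  # avoid hammering the Batch service
    return False


# Simulated run: the task moves active -> running -> completed
states = iter(["active", "running", "completed"])
done = wait_for_completion(lambda: next(states), timeout_s=5, interval_s=0)
```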
Conclusion
When you have short-lived, compute-heavy container workloads that don’t need to be “always-on,” Azure Batch is an ideal solution. It offers a powerful and scalable way to run containerized jobs — from model training and data processing to automated testing — without the overhead of managing container orchestration platforms like AKS or Container Apps.
In my solution, I leveraged Azure Batch to pull Docker images from Azure Container Registry and run tasks on demand via a custom front-end web app. While infrastructure like pools and identities can be provisioned using IaC tools, job and task creation is best handled dynamically via SDKs or the Batch API — just as I’ve done using Python.
👉 I’ve published all the IaC templates and the front-end web app code in my GitHub repo if you’d like to try it out yourself:
https://github.com/azurewithdanidu/docker-with-azure-batch
Hope this will help someone in need :) Feel free to reach out if you have any questions!
Until next time…!
References
- What is Azure Batch? — Microsoft Learn
- Run container workloads on Azure Batch — Microsoft Learn
- Supported VM images for Azure Batch containers — Microsoft Learn
- Azure Batch documentation — Microsoft Learn
- Azure Bicep documentation — Microsoft Learn
- Azure Container Registry documentation — Microsoft Learn
- Full source code — GitHub