How to Adjust Pod Resources for Suspended Kubernetes Jobs (v1.36+)


Introduction

In Kubernetes v1.36, a new beta feature lets you modify the CPU, memory, GPU, and extended resource requests and limits in the pod template of a suspended Job. This matters for batch and machine learning workloads, where the right resource request often depends on current cluster capacity and queue priorities. Previously, changing a Job’s resource spec meant deleting and recreating the Job, which lost its metadata and history. Now you can adjust resources while the Job is suspended and then resume it, without starting from scratch.


This step-by-step guide will walk you through using this feature manually or with a queue controller like Kueue.

What You Need

Step-by-Step Instructions

Step 1: Create or Identify a Suspended Job

If you don’t already have a suspended Job, create one that requests specific resources. The key is to set spec.suspend: true in the Job manifest. Below is an example of a machine learning training Job asking for 4 GPUs, 8 CPUs, and 32 GiB of memory:

apiVersion: batch/v1
kind: Job
metadata:
  name: training-job-example-abcd123
spec:
  suspend: true
  template:
    spec:
      containers:
      - name: trainer
        image: example-registry.example.com/training:2026-04-23T150405.678
        resources:
          requests:
            cpu: "8"
            memory: "32Gi"
            example-hardware-vendor.com/gpu: "4"
          limits:
            cpu: "8"
            memory: "32Gi"
            example-hardware-vendor.com/gpu: "4"
      restartPolicy: Never

Apply this manifest with kubectl apply -f job.yaml.

Step 2: Confirm the Job Is Suspended

Run the following command to verify that the Job is in a suspended state:

kubectl get job training-job-example-abcd123 -o jsonpath='{.spec.suspend}'

It should output true. You can also run kubectl get jobs: a suspended Job that has not run yet shows 0/1 in the COMPLETIONS column, and recent kubectl versions also report Suspended in the STATUS column.
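If you want to script this check, the following is a minimal Python sketch. It assumes you have already captured the output of kubectl get job training-job-example-abcd123 -o json and parsed it into a dictionary. The helper names (is_suspended, active_pods) and the trimmed sample object are hypothetical, and checking for zero active Pods is an extra precaution of this sketch rather than a documented requirement.

```python
import json


def is_suspended(job: dict) -> bool:
    """Return True when the Job's spec.suspend field is set to true."""
    return job.get("spec", {}).get("suspend") is True


def active_pods(job: dict) -> int:
    """Return status.active (pending and running Pods), defaulting to 0."""
    return job.get("status", {}).get("active") or 0


# Hypothetical, trimmed-down Job object for illustration only.
sample_job = json.loads("""
{
  "metadata": {"name": "training-job-example-abcd123"},
  "spec": {"suspend": true},
  "status": {}
}
""")

if is_suspended(sample_job) and active_pods(sample_job) == 0:
    print("Job is suspended and has no active Pods.")
```

In a real script you would replace sample_job with the parsed kubectl output.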

Step 3: Modify the Resource Requests and Limits

While the Job is suspended, you can change its pod template resource fields. For example, if the cluster only has 2 GPUs available, adjust the requests and limits accordingly. Use kubectl patch, kubectl edit, or a direct update via API. Here’s how to patch the resource fields:

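kubectl patch job training-job-example-abcd123 --type='strategic' -p='{"spec":{"template":{"spec":{"containers":[{"name":"trainer","resources":{"requests":{"cpu":"4","memory":"16Gi","example-hardware-vendor.com/gpu":"2"},"limits":{"cpu":"4","memory":"16Gi","example-hardware-vendor.com/gpu":"2"}}}]}}}}'

This updates the Job’s pod template. A strategic merge patch matches containers by name, so fields you leave out of the patch, such as image, are preserved; a JSON merge patch (--type='merge') would instead replace the entire containers list. Because the Job is suspended, the change is allowed: the resource fields of the pod template are otherwise immutable once the Job has been created.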
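Hand-writing the nested JSON for a patch is error-prone, so here is a small Python sketch that generates the patch body and the matching kubectl command. It assumes requests and limits should be identical, as in the example manifest, and it uses a strategic merge patch so that containers are matched by name and unlisted fields such as image are kept. The function names build_resource_patch and kubectl_patch_command are hypothetical helpers, not part of any Kubernetes tooling.

```python
import json
import shlex


def build_resource_patch(container: str, resources: dict) -> dict:
    """Build a patch body that sets identical requests and limits on one container."""
    return {
        "spec": {"template": {"spec": {"containers": [
            {
                "name": container,
                "resources": {
                    "requests": dict(resources),
                    "limits": dict(resources),
                },
            }
        ]}}}
    }


def kubectl_patch_command(job_name: str, patch: dict) -> str:
    """Render a shell-quoted kubectl patch command for the given patch body."""
    body = json.dumps(patch, separators=(",", ":"))
    return "kubectl patch job {} --type=strategic -p {}".format(
        shlex.quote(job_name), shlex.quote(body)
    )


patch = build_resource_patch(
    "trainer",
    {"cpu": "4", "memory": "16Gi", "example-hardware-vendor.com/gpu": "2"},
)
print(kubectl_patch_command("training-job-example-abcd123", patch))
```

The script only prints the command; review it before running it against a cluster.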

Step 4: Verify the Changes

Check that the resources have been updated correctly:

kubectl get job training-job-example-abcd123 -o yaml

Look under spec.template.spec.containers[0].resources — they should now show the adjusted values. No new Pods are created yet because the Job is still suspended.
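Instead of reading the YAML by eye, you can compare the Job’s pod template against the values you intended. The sketch below assumes the Job has been fetched with kubectl get job ... -o json and parsed into a dictionary; container_resources and resource_mismatches are hypothetical helper names, and the sample Job is a trimmed illustration. Quantities are compared as plain strings, so two different spellings of the same amount would be reported as a mismatch.

```python
def container_resources(job: dict, container: str) -> dict:
    """Return the resources block of the named container in the Job's pod template."""
    for c in job["spec"]["template"]["spec"]["containers"]:
        if c["name"] == container:
            return c.get("resources", {})
    raise KeyError("container {!r} not found in Job template".format(container))


def resource_mismatches(actual: dict, expected: dict) -> list:
    """List every request or limit that differs from the expected value."""
    problems = []
    for section in ("requests", "limits"):
        for name, want in expected.items():
            got = actual.get(section, {}).get(name)
            if got != want:
                problems.append(
                    "{}.{}: expected {}, found {}".format(section, name, want, got)
                )
    return problems


# Hypothetical, trimmed-down Job object as it might look after the patch.
patched_job = {
    "spec": {"suspend": True, "template": {"spec": {"containers": [
        {"name": "trainer", "resources": {
            "requests": {"cpu": "4", "memory": "16Gi",
                         "example-hardware-vendor.com/gpu": "2"},
            "limits": {"cpu": "4", "memory": "16Gi",
                       "example-hardware-vendor.com/gpu": "2"},
        }}
    ]}}}
}

expected = {"cpu": "4", "memory": "16Gi", "example-hardware-vendor.com/gpu": "2"}
issues = resource_mismatches(container_resources(patched_job, "trainer"), expected)
print(issues if issues else "Job template matches the intended resources.")
```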

Step 5: Resume the Job

Once you’re satisfied with the resource settings, unsuspend the Job by setting spec.suspend to false:

kubectl patch job training-job-example-abcd123 --type='merge' -p='{"spec":{"suspend":false}}'

Kubernetes will now launch the Pods using the updated resource specifications. You can monitor progress with:

kubectl get pods -l job-name=training-job-example-abcd123

Step 6: Confirm Pod Resources

After the Job resumes, inspect one of the running Pods to ensure the resources are applied. Replace <pod-name> with a Pod name from the previous command:

kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].resources}'

The output should match the new values you set in Step 3. If everything looks good, you’ve successfully adjusted resources for a suspended Job.
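When a Job creates several Pods, checking them one at a time gets tedious. As a rough sketch, the Python below scans a Pod list, assumed to come from kubectl get pods -l job-name=training-job-example-abcd123 -o json, and reports any Pod whose container resources differ from what you expect. The function name pods_with_unexpected_resources and the sample Pod names are hypothetical. The comparison is an exact match on the resources block, so defaults injected by admission plugins would also show up as differences.

```python
def pods_with_unexpected_resources(pod_list: dict, container: str, expected: dict) -> list:
    """Return names of Pods whose named container has resources other than expected."""
    unexpected = []
    for pod in pod_list.get("items", []):
        for c in pod["spec"]["containers"]:
            if c["name"] == container and c.get("resources", {}) != expected:
                unexpected.append(pod["metadata"]["name"])
    return unexpected


wanted = {
    "requests": {"cpu": "4", "memory": "16Gi", "example-hardware-vendor.com/gpu": "2"},
    "limits": {"cpu": "4", "memory": "16Gi", "example-hardware-vendor.com/gpu": "2"},
}

# Hypothetical Pod list for illustration; real output contains many more fields.
sample_pods = {"items": [
    {"metadata": {"name": "training-job-example-abcd123-aaaaa"},
     "spec": {"containers": [{"name": "trainer", "resources": wanted}]}},
    {"metadata": {"name": "training-job-example-abcd123-bbbbb"},
     "spec": {"containers": [{"name": "trainer", "resources": {
         "requests": {"cpu": "8"}, "limits": {"cpu": "8"}}}]}},
]}

print(pods_with_unexpected_resources(sample_pods, "trainer", wanted))
```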

Tips and Best Practices

By following these steps, you can adjust resource allocations for batch and ML Jobs without deleting and recreating them, so their metadata and history are preserved. This makes it easier to schedule workloads in clusters where available capacity changes over time.
