Introduction

I’ve become very wary of the risk of supply chain attacks and rogue “coding agents” compromising my development system. Hence, I currently do all my development inside a Litterbox to minimise this risk.

Since much of my work is in the embedded space, I’ve already had plenty of, uhm, “fun” figuring out how to get various things working inside a container. One challenge I remember was getting access to USB devices. For this I added the “device” function to Litterbox, which essentially just creates a device node inside the Litterbox’s home directory. Similarly, I added functionality for TUN/TAP device creation so that I can run an “emulated” embedded networking stack.
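
For illustration only (this is not Litterbox’s actual implementation, and the path is hypothetical), creating such a device node by hand for a USB-serial adapter would look something like this, using the standard major/minor numbers for the ttyUSB driver:

# Hypothetical sketch: expose /dev/ttyUSB0 (char major 188, minor 0)
# inside a Litterbox home directory; mknod needs elevated privileges.
sudo mknod -m 660 ~/litterbox/home/dev/ttyUSB0 c 188 0
sudo chown "$USER:dialout" ~/litterbox/home/dev/ttyUSB0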

More recently, I have been playing a bit with reinforcement learning, and to this end, I had to get PyTorch working. I’ve never been a huge Nvidia fan (especially not as a Linux user), so I had to get things working on my AMD card. Thankfully, AMD has worked hard to get PyTorch working with ROCm, so I knew this had to be possible. It really is not obvious at first, but it turns out that you do not need any special drivers for ROCm, as everything you need already ships with newer kernels (exactly what I wanted to hear). The challenge is therefore entirely in userspace.
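
A quick way to confirm that the kernel side is indeed in place is to check that the amdgpu driver has created its device nodes on the host (the render node number may differ on your machine):

# The ROCm compute interface (KFD) plus the GPU’s render node
ls -l /dev/kfd /dev/dri/renderD*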

Eventually I figured out that AMD provides container images with all the ROCm stuff pre-installed, such as https://hub.docker.com/r/rocm/dev-ubuntu-24.04. Here it turned out to be important to use the “complete” variant, as the smaller ones seem to be missing important stuff. If you just want ROCm-enabled PyTorch available in your system Python, then a simpler approach is to use the https://hub.docker.com/r/rocm/pytorch images. I prefer to use uv to manage my Python dependencies though, so that was not very useful for me.
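
If you want to poke around in the image before committing to it, pulling the same tag as the Dockerfile below uses is enough (be warned that the “complete” variant is a hefty download):

docker pull rocm/dev-ubuntu-24.04:7.2-complete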

pytorch.Dockerfile

Following is the Dockerfile I ended up with for this Litterbox (it is very close to the standard templates):

# syntax=docker/dockerfile:1.4
FROM rocm/dev-ubuntu-24.04:7.2-complete

# Setup base system (we install weston to easily get all the Wayland deps)
RUN apt-get update && \
    apt-get install -y sudo weston mesa-vulkan-drivers openssh-client git iputils-ping \
    vulkan-tools curl iproute2 rsync wget

# Install the fish shell for a nicer experience
RUN apt-get install -y fish

# Install development tools (ADAPT TO YOUR OWN NEEDS)
RUN wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb && \
    apt-get install -y ./google-chrome-stable_current_amd64.deb && \
    rm google-chrome-stable_current_amd64.deb

# We put these args later to avoid excessive rebuilding
ARG USER
ARG PASSWORD

# Setup non-root user with a password for added security
RUN usermod -l $USER -m -d /home/$USER ubuntu && \
    echo "${USER}:${PASSWORD}" | chpasswd && \
    echo "${USER} ALL=(ALL) ALL" >> /etc/sudoers
WORKDIR /home/$USER

# We do not install things directly into $HOME here as they will get nuked
# once the home directory gets mounted. Instead we use a script that runs
# at start-up to construct the home directory the first time.
#
# A benefit of not installing things directly into home is that they do not
# need to be re-installed when the container gets rebuilt.
RUN <<'EOF'
# Create the script using a nested heredoc
cat <<'EOT' > /prep-home.sh
#!/usr/bin/env fish

set MARKER "$HOME/.home-built"

# If the marker file already exists, exit early
if test -f "$MARKER"
    echo "Home already built; skipping."
    exec $SHELL -l
end

echo "Building home for the first time..."

# ------------------------------
# ADAPT THIS TO YOUR OWN NEEDS
# ------------------------------
curl -f https://zed.dev/install.sh | sh
curl -LsSf https://astral.sh/uv/install.sh | sh
fish_add_path -U "$HOME/.local/bin"

# Create the marker file to prevent re-running
touch "$MARKER"
echo "Done."

# Return to the normal shell
exec $SHELL -l
EOT

chmod +x /prep-home.sh
chown $USER /prep-home.sh
EOF

# Enter the fish shell by default
ENV SHELL=/usr/bin/fish
RUN chsh -s /usr/bin/fish $USER
CMD ["fish", "/prep-home.sh"]

If you’re wondering, Chrome was installed so that I could access TensorBoard without having to expose Litterbox ports to the host. For Ubuntu Litterboxes, this seems to be the easiest way to get a working browser, since everything else in the Ubuntu repos tries to install a snap, which simply does not work inside a Litterbox.
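
In practice, that just means serving TensorBoard inside the Litterbox and pointing Chrome at it locally (the log directory is whatever your training script writes to; “runs” is just a common default):

# Inside the Litterbox: serve the logs, then open the UI locally
tensorboard --logdir runs --port 6006 &
google-chrome http://localhost:6006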

pytorch.ron

Following is a summary of the settings I selected when creating the Litterbox:

(
    version: 1,
    network_mode: Pasta,
    support_ping: false,
    support_tuntap: false,
    packet_forwarding: false,
    enable_kvm: true,
    expose_pipewire: false,
    keep_groups: true,
    expose_kfd: true,
    shm_size_gb: Some(8),
)

The two important things here are expose_kfd: true and keep_groups: true. The first option makes the compute device available inside the Litterbox, whereas the second grants permission to actually access it. As per the instructions at https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/quick-start.html, this also required running the following on the host machine:

sudo usermod -a -G render,video $LOGNAME
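
After logging out and back in so the new group memberships apply, it is worth sanity-checking that the GPU is actually reachable from inside the Litterbox. The “complete” image ships the usual ROCm tooling, so rocminfo should be available for this:

# On the host: the output should now include render and video
groups

# Inside the Litterbox: the GPU should be listed among the agents
rocminfo | grep 'Marketing Name'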

pyproject.toml

Finally, it was time to figure out how to get the ROCm version of PyTorch installed in a venv managed by uv. It was difficult to find guidance online, but eventually I figured out that the easiest solution is to use the wheels that AMD provides at https://repo.radeon.com/rocm/manylinux/. Here it is important to select the wheels that match both the ROCm version of your Litterbox and the version of Python used in your project. This solution is far from ideal, but it is the best I could find for now.

[project]
name = "demo"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.13,<3.14"
dependencies = [
    "torch",
    "triton",
    "torchvision",
    "tensorboard>=2.20.0",
    "pygame>=2.6.1",
    "matplotlib>=3.10.8",
    "pyqt5>=5.15.11",
]

[tool.uv.sources]
torch = { url = "https://repo.radeon.com/rocm/manylinux/rocm-rel-7.2/torch-2.9.1%2Brocm7.2.0.lw.git7e1940d4-cp313-cp313-linux_x86_64.whl" }
triton = { url = "https://repo.radeon.com/rocm/manylinux/rocm-rel-7.2/triton-3.5.1%2Brocm7.2.0.gita272dfa8-cp313-cp313-linux_x86_64.whl" }
torchvision = { url = "https://repo.radeon.com/rocm/manylinux/rocm-rel-7.2/torchvision-0.24.0%2Brocm7.2.0.gitb919bd0c-cp313-cp313-linux_x86_64.whl" }
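
With that in place, a quick smoke test confirms that everything is wired up (the ROCm build of PyTorch reports the GPU through the usual torch.cuda API):

uv sync
uv run python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"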

Next steps

This project reminded me that I’m not the biggest fan of Python, so I’ve already moved on to trying burn as a Rusty alternative to PyTorch :P