“RuntimeError: CUDA failed with error no CUDA-capable device is detected” when running a Docker container in WSL2

I’m new to Docker, so forgive me if this is a dumb question. I have recently been trying to test faster-whisper, a reimplementation of OpenAI’s Whisper. To do this, I built a Docker image based on nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu22.04 from Docker Hub, on Ubuntu under WSL2. The image builds successfully, but when I run it, this error shows up:

== CUDA ==

CUDA Version 11.7.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

Traceback (most recent call last):
  File "/home/user/Documents/experiment/./main.py", line 8, in <module>
    model = WhisperModel(model_size, device="cuda", compute_type="float16")
  File "/usr/local/lib/python3.10/dist-packages/faster_whisper/transcribe.py", line 128, in __init__
    self.model = ctranslate2.models.Whisper(
RuntimeError: CUDA failed with error no CUDA-capable device is detected

I’ve googled and tried a lot of possible solutions, and even fully reinstalled the NVIDIA driver, nvidia-container-toolkit, and Docker, but the error persists.

This is the Dockerfile I used to build the image:

FROM nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu22.04

RUN apt -y update && apt -y install python3.11 python3-pip


WORKDIR /home/user/Documents/experiment

COPY requirements.txt .

RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD [ "python3", "./main.py" ]

And this is the Python script, which I mostly copy-pasted from the official GitHub page:

from faster_whisper import WhisperModel
import os
os.environ['CUDA_VISIBLE_DEVICES'] = "0"

model_size = "large-v2"

# Run on GPU with FP16
model = WhisperModel(model_size, device="cuda", compute_type="float16")

# or run on GPU with INT8
# model = WhisperModel(model_size, device="cuda", compute_type="int8_float16")
# or run on CPU with INT8
# model = WhisperModel(model_size, device="cpu", compute_type="int8")

segments, info = model.transcribe("./audio.wav", beam_size=5)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
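
For reference, here is a minimal sketch of how the script could check whether any GPU is visible before loading the model, instead of failing inside WhisperModel. It assumes the installed CTranslate2 version (which faster-whisper depends on) provides ctranslate2.get_cuda_device_count(). The helper names cuda_device_count and pick_device are hypothetical and only for illustration.

```python
def cuda_device_count() -> int:
    """Return how many CUDA devices CTranslate2 reports, or 0 on any failure."""
    try:
        # Assumption: ctranslate2 is installed as a dependency of faster-whisper.
        import ctranslate2
        return int(ctranslate2.get_cuda_device_count())
    except Exception:
        # Missing package or a driver/runtime error: treat it as "no GPU visible".
        return 0


def pick_device(gpu_count: int) -> tuple[str, str]:
    """Map a GPU count to the (device, compute_type) pairs used in the script."""
    if gpu_count > 0:
        return "cuda", "float16"
    return "cpu", "int8"


if __name__ == "__main__":
    count = cuda_device_count()
    device, compute_type = pick_device(count)
    print(f"CUDA devices visible: {count}")
    print(f"Would use device={device!r}, compute_type={compute_type!r}")
```

If this reports 0 devices inside the container, the problem is most likely in how the container is started rather than in faster-whisper itself.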

This is the command I used to try to run the container:
docker run --gpus all --runtime=nvidia -t nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu22.04

And here is the nvidia-smi output from the command docker run --gpus all --runtime=nvidia -t nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu22.04 nvidia-smi:

Sun Oct  8 22:08:24 2023       
| NVIDIA-SMI 535.112                Driver Version: 537.42       CUDA Version: 11.7     |
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA GeForce RTX 3080 Ti     On  | 00000000:01:00.0  On |                  N/A |
|  0%   39C    P8              21W / 400W |    522MiB / 12288MiB |      1%      Default |
|                                         |                      |                  N/A |
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|    0   N/A  N/A        32      G   /Xwayland                                 N/A      |
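
Since nvidia-smi can see the GPU here, a small standard-library check run inside the container that actually executes main.py might help narrow things down. The sketch below is only an assumption about what is worth inspecting. /dev/dxg and /usr/lib/wsl/lib are the locations WSL2 normally uses to expose the GPU and its driver libraries; whether they show up inside the container is exactly what this would reveal. The function name gpu_environment_report is hypothetical.

```python
import os
import shutil


def gpu_environment_report(env=None, path_exists=os.path.exists, which=shutil.which) -> dict:
    """Collect a few hints about GPU visibility in the current environment.

    The env, path_exists and which parameters exist only so the function
    can be exercised without a real GPU.
    """
    env = os.environ if env is None else env
    return {
        # Assumption: WSL2 exposes the GPU through this device node.
        "/dev/dxg present": path_exists("/dev/dxg"),
        # Assumption: WSL2 keeps the driver's user-space libraries here.
        "/usr/lib/wsl/lib present": path_exists("/usr/lib/wsl/lib"),
        "nvidia-smi on PATH": which("nvidia-smi") is not None,
        "NVIDIA_VISIBLE_DEVICES": env.get("NVIDIA_VISIBLE_DEVICES"),
        "CUDA_VISIBLE_DEVICES": env.get("CUDA_VISIBLE_DEVICES"),
    }


if __name__ == "__main__":
    for key, value in gpu_environment_report().items():
        print(f"{key}: {value}")
```

Comparing this output between the base image and the image built from the Dockerfile above would show whether the two containers see the same GPU setup.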

Any ideas?

