🐛 Bug
The model fails to train on the GPU because of a CUDA compute capability mismatch. The installed PyTorch build only includes kernels for compute capability sm_70 and higher, but the available GPU (Tesla P100-PCIE-16GB) has compute capability sm_60. Here is the traceback:
/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:435: UserWarning:
Found GPU0 Tesla P100-PCIE-16GB which is of cuda capability 6.0.
Minimum and Maximum cuda capability supported by this version of PyTorch is
(7.0) - (12.0)
queued_call()
/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:435: UserWarning:
Please install PyTorch with a following CUDA
configurations: 12.6 following instructions at
https://pytorch.org/get-started/locally/
queued_call()
/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:435: UserWarning:
Tesla P100-PCIE-16GB with CUDA capability sm_60 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_70 sm_75 sm_80 sm_86 sm_90 sm_100 sm_120.
If you want to use the Tesla P100-PCIE-16GB GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
queued_call()
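To confirm this kind of mismatch before starting training, the visible GPU's compute capability can be compared against the architectures compiled into the installed PyTorch build. The sketch below uses `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list`. The helper `kernel_image_available` is hypothetical, and its rules are a simplified assumption of CUDA's compatibility model: an `sm_XY` binary runs on the same major version with an equal or higher minor version, and `compute_XY` PTX can be JIT-compiled for equal or newer devices. Arch-specific entries such as `sm_90a` are ignored.

```python
import re
from typing import Iterable, Tuple

# Matches entries like "sm_60", "sm_120", "compute_70"; last digit is the minor version.
_ARCH_RE = re.compile(r"^(sm|compute)_(\d+)(\d)$")


def kernel_image_available(capability: Tuple[int, int], arch_list: Iterable[str]) -> bool:
    """Hypothetical helper: True if any arch in arch_list can run on `capability`.

    Simplified rules (assumption):
    - sm_XY cubin: same major X, device minor >= Y.
    - compute_XY PTX: device capability >= (X, Y), via JIT compilation.
    Entries with suffixes (e.g. "sm_90a") do not match the regex and are skipped.
    """
    dev_major, dev_minor = capability
    for arch in arch_list:
        m = _ARCH_RE.match(arch)
        if m is None:
            continue
        kind, major, minor = m.group(1), int(m.group(2)), int(m.group(3))
        if kind == "sm" and major == dev_major and minor <= dev_minor:
            return True
        if kind == "compute" and (major, minor) <= (dev_major, dev_minor):
            return True
    return False


def main() -> None:
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
        return
    if not torch.cuda.is_available():
        print("No CUDA device visible to PyTorch")
        return
    cap = torch.cuda.get_device_capability(0)  # e.g. (6, 0) on a P100
    archs = torch.cuda.get_arch_list()         # architectures this build was compiled for
    print(f"{torch.cuda.get_device_name(0)}: sm_{cap[0]}{cap[1]}; build supports {archs}")
    if not kernel_image_available(cap, archs):
        print("No compatible kernel image; expect cudaErrorNoKernelImageForDevice")


if __name__ == "__main__":
    main()
```

On the P100 in this report the device capability is `(6, 0)` and the arch list only contains `sm_70` through `sm_120`, so the check returns `False`.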
To Reproduce
Use a P100 GPU on Kaggle, move a PyTorch neural network model to the GPU, and start the training loop.
Expected behavior
The forward pass and training loop run on the GPU without errors.
Additional context
This is the error that was raised:
AcceleratorError: CUDA error: no kernel image is available for execution on the device
Search for `cudaErrorNoKernelImageForDevice' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
The same code worked a few weeks ago, so I suspect a change in the Kaggle Docker environment (presumably an updated PyTorch build without sm_60 kernels) broke it.