Trying to install this for ComfyUI. It installs fine with pip install llama-cpp-python and works, but it loads GGUF models into system RAM; I'm trying to make it load into GPU VRAM instead.
I asked ChatGPT for help and it wants me to download a prebuilt wheel for my Python version and run this in the ComfyUI venv: pip install path\to\llama_cpp_python-cu128.whl
I also read from another user that to install llama-cpp-python with CUDA support you need to run: CMAKE_ARGS="-DGGML_CUDA=ON -DLLAMA_LLAVA=OFF" pip install llama-cpp-python
but it gives me error: 'CMAKE_ARGS' is not recognized as an internal or external command,
operable program or batch file.
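That error means the command was typed into cmd.exe: the VAR=value prefix before a command is POSIX shell syntax and doesn't exist on Windows, so cmd tries to run "CMAKE_ARGS" as a program. A rough sketch of the Windows equivalent is below (assumptions: the ComfyUI venv is already activated, Visual Studio C++ build tools and the CUDA Toolkit are installed so the source build can compile, and -DGGML_CUDA=on is the flag name used by current llama-cpp-python releases; older versions used -DLLAMA_CUBLAS=on instead):

```shell
REM cmd.exe: set the environment variable on its own line, then install.
REM --force-reinstall / --no-cache-dir make pip rebuild instead of reusing
REM the CPU-only wheel it already downloaded.
set CMAKE_ARGS=-DGGML_CUDA=on
pip install llama-cpp-python --force-reinstall --no-cache-dir

REM PowerShell equivalent:
REM   $env:CMAKE_ARGS = "-DGGML_CUDA=on"
REM   pip install llama-cpp-python --force-reinstall --no-cache-dir
```

One more thing worth checking: even with a CUDA-enabled build, llama.cpp keeps the model in RAM unless layers are offloaded, so whatever ComfyUI node wraps llama-cpp-python also needs its n_gpu_layers option set (e.g. -1 for all layers) or the GGUF will still load to system RAM.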
RTX 2060
CUDA Toolkit 12.8
Python 3.12