# GPU acceleration

This document provides a complete instruction on using GPU acceleration with Concrete ML.

Concrete ML support compiling both built-in and custom models using a CUDA-accelerated backend. However, once\
a model is compiled for CUDA, executing it on a non-CUDA-enabled machine will raise an error.

## Support

| Feature     | Built-in models | Deep NNs and LLMs | Deployment | DataFrame |
| ----------- | --------------- | ----------------- | ---------- | --------- |
| GPU support | ✅               | ✅                 | ✅          | ❌         |
|             |                 |                   |            |           |

{% hint style="warning" %}
When compiling a model for GPU, the model is assigned GPU-specific crypto-system parameters. These parameters are more constrained than the CPU-specific ones.\
As a result, the Concrete compiler may have difficulty finding suitable GPU-compatible crypto-parameters for some models, leading to a `NoParametersFound` error.
{% endhint %}

## Performance

On high-end GPUs like V100, A100, or H100, the performance gains range from 1x to 10x compared to a desktop CPU.

When compared to a high-end server CPUs(64-core or 96-core), the speed-up is typically around 1x to 3x.

On consumer grade GPUs such as GTX40xx or GTX30xx, there may be\
little speedup or even a slowdown compared to execution\
on a desktop CPU.

## Prerequisites

### Built-in models and deep NNs

This section pertains to models that are compiled using the `sklearn`-style built-in model classes or that are compiled using `compile_torch_model` or `compile_brevitas_qat_model`.

To use the CUDA-enabled backend, install the GPU-enabled Concrete compiler:

```bash
pip install --extra-index-url https://pypi.zama.ai/gpu concrete-python
```

If you already have an existing version of `concrete-python` installed, it will not be re-installed automatically. In that case, manually uninstall the current version and then install the GPU-enabled version:

```bash
pip uninstall concrete-python
pip install --extra-index-url https://pypi.zama.ai/gpu concrete-python
```

To switch back to the CPU-only version of the compiler, change the index-url to the CPU-only repository or remove the index-url parameter:

```bash
pip uninstall concrete-python
pip install --extra-index-url https://pypi.zama.ai/cpu concrete-python
```

## Checking GPU can be enabled

To check if the CUDA acceleration is available, use the following helper functions from `concrete-python`:

```python
import concrete.compiler; 
print("GPU enabled: ", concrete.compiler.check_gpu_enabled())
print("GPU available: ", concrete.compiler.check_gpu_available())
```

## Usage

To compile a model for CUDA, simply supply the `device='cuda'` argument to its compilation function:

* For built-in models, use `.compile` function.
* For custom models, use either`compile_torch_model` or `compile_brevitas_qat_model`.

## LLMs

This section pertains to models that are compiled with `HybridFHEModel`.

The models compiled as described in [the LLM section](/concrete-ml/llms/inference.md) will\
use GPU acceleration if a GPU is available on the machine where the models\
are executed. No specific compilation configuration is required to enable GPU\
execution for these models.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.zama.org/concrete-ml/guides/using_gpu.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
