# Changelog

## TFHE-rs Releases

***

{% updates format="full" %}
{% update date="2026-04-22" tags="cpu,gpu" %}

## TFHE-rs v1.6

TFHE-rs v1.6.0 adds re-randomization without keyswitch on both CPU and GPU, improves the performance of several operations, and introduces a new experimental GPU backend for ZK proofs.

### CPU

#### New Features

* Add rerand without keyswitch for improved performance
* Add seeded `(Proven)CompactCiphertextList` encryption
* Add compact list re-randomization APIs
* Expose contains APIs on `FheUint` and `FheInt`
* Add fused mul div entry points
* Add `is_conformant` for `CompressedXofKeySet`

#### Improvements

* Improve compact public key performance: down to 455µs from 3.8ms in a typical 2xFheUint64 rerand
* Improve `leading_zeros` performance (2x on a typical `FheUint64` case)
* Use a dedicated key for OPRF

#### Fixes

* Fixed CompactCiphertextList expansion when parameters are KS32
* Fixed an edge case crash of `ProvenCompactCiphertextList` expand

### GPU

#### New Features

* Add rerand without keyswitch for improved performance in the CUDA backend
* Add Trivium and Kreyvium for transciphering
* Add specialized noise squash for H100 with classical parameters
* Add ZK backend to enable verification on GPUs
* Add specialized PBS to target older GPU models

#### Improvements

* Improved 128-bit FFT performance and f128 precision
* Improved noise squash latency with classical parameters on H100 from 120ms to 43ms
* Improved noise squash latency with multi-bit parameters on H100 from 56ms to 24ms
* Improved verification latency with the new ZK backend from 49ms to 32ms
* Improved rerand on `FheUint64` from 730µs to 390µs

#### Fixes

* Fixed noise handling when batching AES on GPU
* Fixed memory access on expand when the number of LWEs is odd
* Fixed compression GLWE accesses when the number of LWEs per GLWE differs from the polynomial size
* Fixed memory leak when destroying events on multi-GPU execution
* Fixed race condition on programmable bootstrap flavors that run on GPUs with compute capability < 90
* Fixed multi-threaded race conditions when creating parallel mempools
* Fixed missing sync when copying a compact list from GPU to CPU
* Fixed corner cases in memory handling of decompression

#### Resources

* [GitHub release](https://github.com/zama-ai/tfhe-rs/releases/tag/tfhe-rs-1.6.0)
* [Documentation](https://docs.zama.ai/tfhe-rs/1.6)
  {% endupdate %}

{% update date="2026-01-20" tags="cpu,gpu,hpu" %}

## TFHE-rs v1.5

TFHE-rs v1.5.0 adds new features, performance improvements, and fixes across backends.

#### Highlights

* **CPU:** Friendlier parameter selection APIs, Multi-Bit decompression support, wider OPRF ranges, and 42% faster ZK verification in the typical case
* **GPU:** 2.4x speedup on H100 for classical PBS with `MESSAGE_2_CARRY_2` and 10x faster `match_value` on 256-value lists
* **HPU:** New DOp firmware for shift and rotation with almost 70% lower latency

### CPU

#### New features

* OPRF now supports uniform random values over any 64-bit range
* Decompression now supports Multi-Bit blind rotation
* Added `MetaParameterFinder` for easier parameter selection
* `CompressedXofKeySet` can now be generated with its paired `ClientKey`
* Added the ability to recreate an `LweCiphertext` from a modswitched `LweCiphertext`
* Added blind rotation using Karatsuba multiplication for easier implementation checks

#### Improvements

* ZK verify latency improved by 42% for 256 bits of encrypted data with a 2048-bit CRS

#### Fixes

* Fixed `CompactCiphertextList` conformance crash
* Fixed `StaticUnsignedBigInt` cast into `u128`
* Fixed an edge case in the decomposition algorithm
* Fixed `BorrowMut` errors with thread-local `ShortintEngine`
* Added the missing compressed proof version for the ZK crate
* Fixed tag propagation in `XofKeySet`
* Fixed `par_encrypt_and_prove` using sequential encryption
* Fixed JS API handling for undefined variants with `Option<>`

### GPU

#### New features

* Added support for encrypted AES-256
* Implemented a GEMM-based keyswitch with better throughput and used it in AES
* Added support for the re-randomization technique
* Added support for custom powers-of-two ranges for OPRF in the integer API

#### Improvements

* Added `1_1` classical PBS parameters for the specialized version
* Extended the specialized version to classical PBS
* Set a specific threshold for multi-GPU with classical PBS
* Added `InternalCudaStreams` to improve internal stream management
* Moved `vector_comparison` functions to the backend for better performance
* Moved `cast_to_signed` to the CUDA backend
* Moved `unchecked_index_of_clear` to the CUDA backend
* Moved `vector_find` functions to the CUDA backend
* Moved `unchecked_match_value_or` to the CUDA backend
* Moved `match_value` to the CUDA backend
* Removed all `_async` functions from the integer API

#### Fixes

* Returned to 64 registers in multi-bit PBS
* Fixed CPU memory leak in expand and rerand
* Fixed GPU memory leak in rerand
* Fixed CPU memory leaks in several integer operations
* Fixed GPU memory leak in decompression
* Used only thread-block-clusters for classical 64-bit PBS on H100
* Added missing sync before free in OPRF
* Fixed full propagate so it stays on a single GPU
* Forced `uint64` when calculating LWE chunk size to avoid overflow
* Fixed PBS128 selection for small numbers of LWEs
* Fixed decomposition algorithm mismatch with theory
* Added an upper bound to LWE chunk size calculation
* Fixed `are_all_comparison_blocks_true` when the number of blocks is 0

### HPU

#### Improvements

* Added HPU v2.2 improvements with 2x ALU bandwidth and improved key caches
* Faster shift and rotation operations
* HPU now uses an interrupt for IOp Ack from Instruction Scheduler to RPU

#### Fixes

* Fixed the HPU accumulator memory arbiter

#### Resources

* [GitHub release](https://github.com/zama-ai/tfhe-rs/releases/tag/tfhe-rs-1.5.0)
* [Documentation](https://docs.zama.ai/tfhe-rs/1.5)
  {% endupdate %}

{% update date="2025-10-20" tags="cpu,gpu,hpu" %}

## TFHE-rs v1.4

TFHE-rs v1.4.1 improves performance, adds new cryptographic capabilities, and enhances hardware support across CPU, GPU, and HPU backends.

### CPU <a href="#cpu" id="cpu"></a>

#### Highlights <a href="#highlights" id="highlights"></a>

The CPU backend introduces new APIs for additional security guarantees, extended atomic pattern support, and new encrypted data handling capabilities:

* **Security**: Introduces the `ReRand` feature to ensure security under the sIND-CPAᴰ model.
* **Extended KS32 AP support**: The keyswitch 32 atomic pattern (KS32 AP) now supports compact public key encryption, keyswitching, compression, and noise squashing.
* **Performance**: KS32 AP provides a 10–19% speedup on 64-bit integer operations.
* **Encrypted data handling**: Adds KVStore, which manipulates hashmaps blindly to update encrypted values.
* **Parameter clarity**: Parameter sets are now standardized and exposed as `MetaParameters`.

#### New Features <a href="#new-features" id="new-features"></a>

* Add MetaParameters
* Add multi-bit PBS support to noise squashing
* Add noise squashing support for the KS32 AP
* Add ciphertext compression support for the KS32 AP
* Add compact public key encryption support for the KS32 AP
* Add quasi-uniform OPRF over any range for `tfhe::integer`
* Add KVStore for blind encrypted key-value updates
* Add flip operation
* Add ReRand primitives for sIND-CPAᴰ security
* Add XOF keyset
* Make `FheUint`/`FheInt`/`FheBool` compatible with AP params for conformance
* Add missing `safe_deser` for ServerKey in the C API
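
The "quasi-uniform" wording for the range OPRF in the list above reflects a general property of mapping a fixed number of random bits onto an arbitrary range. One common way to do this on clear values is the multiply-and-shift map `(r * bound) >> n` for an `n`-bit random `r`: each output then occurs either `floor(2^n / bound)` or `ceil(2^n / bound)` times, so the distribution gets closer to uniform as `n` grows. The sketch below is plain Rust on clear integers. It illustrates that property under this assumption and is not the TFHE-rs implementation; `map_to_range` and `histogram` are hypothetical helpers.

```rust
/// Map an `n_bits`-bit random value `r` into `0..bound` with multiply-and-shift.
/// Hypothetical helper working on clear values, for illustration only.
fn map_to_range(r: u64, n_bits: u32, bound: u64) -> u64 {
    assert!(n_bits >= 1 && n_bits <= 64 && bound > 0);
    // `r` must fit in `n_bits` bits (always true when n_bits == 64).
    assert!(n_bits == 64 || r < (1u64 << n_bits));
    // The product is below 2^n_bits * bound, so the shifted result is below `bound`.
    ((r as u128 * bound as u128) >> n_bits) as u64
}

/// Count how often each output occurs when `r` takes every `n_bits`-bit value.
fn histogram(n_bits: u32, bound: u64) -> Vec<u64> {
    assert!(n_bits <= 20, "exhaustive enumeration is only practical for small n_bits");
    let mut counts = vec![0u64; bound as usize];
    for r in 0..(1u64 << n_bits) {
        counts[map_to_range(r, n_bits, bound) as usize] += 1;
    }
    counts
}
```

For example, `histogram(8, 3)` splits the 256 possible inputs into counts of 86, 85, and 85, which shows the small residual bias that more input bits would shrink.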

#### Improvements <a href="#improvements" id="improvements"></a>

* Improve FFT and NTT plan cache locking

#### Fixes <a href="#fixes" id="fixes"></a>

* Set correct degree for noise squashed decompressed ciphertext
* Avoid potential overflow for GLWE encryption on 32-bit platforms
* Fix NTT plan yielding incorrect results for a class of primes
* Fix scalar size check before ZK public key encryption

### GPU <a href="#gpu" id="gpu"></a>

The GPU backend receives major performance upgrades, improved PBS techniques, and new compression and benchmarking capabilities:

* **Performance**: All operations see 2× speedup on H100 GPUs, with certain primitives (multiplication, division, OPRF, ilog2, scalar division and multiplication) reaching 3–10× acceleration.
* **PBS enhancements**: A new technique called "mean reduction" replaces the previous technique "drift" for classical PBS, to keep the same cryptographic parameters without the need for an additional key.
* **Noise squashing**: Multi-bit noise squashing is introduced, providing up to 4× faster execution compared to classical PBS.
* **Compression**: Adds support for 128-bit compression.
* **New benchmark**: A new benchmark on GPU is introduced to perform AES encryption using FHE (in counter mode).
* **Parameter clarity**: Parameter sets are now standardized and exposed as `MetaParameters`.

#### New Features <a href="#new-features-1" id="new-features-1"></a>

* Add 128-bit multi-bit PBS for noise squashing
* Add 128-bit compression
* Add the centered modulus switch technique to reduce noise in the classical PBS
* FHE encryption of AES 128 in counter mode on GPU (available in the integer API)

#### Improvements <a href="#improvements-1" id="improvements-1"></a>

* Create a specialized version of multi-bit PBS using thread block clusters, which makes all operations about 2x faster on H100
* Improve the multi-GPU communication scheme
* Use CUDA mempools to optimize memory reuse
* Improve division performance on nodes with 4 GPUs or more: overall division is 4x faster than in the previous release
* Improve encrypted random generation (OPRF) performance by implementing it in CUDA/C++ instead of Rust (results in 10x faster OPRF)
* Improve ilog2 performance by implementing it in CUDA/C++ instead of Rust
* Enable LUT generation with preallocated CPU buffers to avoid some synchronizations with the CPU in comparisons
* Add an assert to ensure the carry part has the correct size in expand
* Create the message extract LUT only when needed for carry propagation
* Internal refactors to enhance the C++/Rust interface (pass streams and gpu indexes in a struct, pass compression data via a struct)

#### Fixes <a href="#fixes-1" id="fixes-1"></a>

* Fix memory leak in multi-GPU calculations
* Fix PBS128 multi-GPU bug
* Fix some wrong indexes used in `cuda_set_device()`
* Fix inconsistent types to avoid overflows
* Add missing syncs when releasing scalar ops and returning trivial radix
* Fix the decompression function signature in the CUDA backend

### HPU <a href="#hpu" id="hpu"></a>

The HPU backend improves overall latency and execution throughput:

* **Latency reduction**: Overall execution latency is reduced across all HPU operations.
* **Throughput increase**: New SIMD operations further increase the throughput of the HPU on a single V80 FPGA.

#### New Features <a href="#new-features-2" id="new-features-2"></a>

* Add 400 MHz HPU v2.1 bitstream
* Add ERC20\_SIMD & ADD\_SIMD operations
* Add support for servers with multiple V80 boards (only one is used)

#### Improvements <a href="#improvements-2" id="improvements-2"></a>

* Improve latency & throughput benches (HLAPI & integer) to execute some new operations and be more stable
* Improve scheduling of MUL operation
* Slightly reduce software latency when pushing an IOp and receiving its acknowledgment
* In HPU v2.1 bitstream:
  * Compiled with Vivado 2025.1
  * Improved place & route (especially on reset) to reach 400 MHz
  * Increase bandwidth to load BSK & KSK
  * Improved accumulator (MMACC) structure to match PBS batch size (12)

#### Fixes <a href="#fixes-2" id="fixes-2"></a>

* Stabilize HPU IOp queue
* Fix a few operations (ilog2, trail0/1, ovf\_mul...)

### Resources <a href="#resources" id="resources"></a>

* [GitHub release](https://github.com/zama-ai/tfhe-rs/releases/tag/tfhe-rs-1.4.1)
* [Documentation](https://docs.zama.ai/tfhe-rs/1.4)
  {% endupdate %}

{% update date="2025-07-16" tags="cpu,gpu,hpu" %}

## TFHE-rs v1.3

TFHE-rs v1.3.0 adds new features focused on performance and usability.

The HPU now supports more operations and matches CPU and GPU error probability targets.

### CPU

#### New features

* Added chunked generation for `LweKeyswitchKey`
* Added multi-bit PBS for 128-bit moduli
* Added Atomic Pattern support at the `ClientKey` level
* Added `OverflowingNeg` in the high-level API
* Added compression support after noise squashing
* Added modulus-switch noise compensation and centering
* Added a different hashing mode for ZK v2 for faster verification
* Added a more granular conformance check for ZK proofs
* Added a key chain mechanism to update old ciphertext parameters

#### Improvements

* Added a new division algorithm with a 36% improvement for 64-bit division with default parameters

### GPU

#### New features

* Added GPU memory query helpers for integer, boolean, compression, decompression, and encrypted random generation operations
* Added support for GPU-accelerated expand in the high-level API
* Added support for custom multi-GPU selection
* Added squash noise in the high-level API
* Added GPU-accelerated expand support to `CompactCiphertextList`
* Added a CUDA debug target for integer tests through a Cargo feature
* Added `move_to_current_device` for booleans

#### Improvements

* Used cooperative-groups-based PBS on H100s when possible on large batches
* Optimized `sum_ciphertexts` in the CUDA backend
* Increased keyswitch occupancy to 100%

#### Fixes

* Fixed degrees after `abs`
* Allowed building with both GPU and HPU features enabled
* Added indexes to modulus-switch noise reduction
* Added missing error checks after some kernels
* Fixed a linking problem on Hopper GPUs
* Fixed hardcoded message modulus usage in some operations
* Fixed degrees after `bitxor`
* Prevented `nvToolsExt` inclusion when not profiling
* Fixed degrees after scalar `bitxor`
* Fixed a race condition on expand with multi-GPU
* Fixed packing keyswitch buffer allocation on large parameter sets

### HPU

#### New features

* Added modulus-switch noise reduction with centered binary
* Updated the HPU parameter set to reach a `2^-128` probability of failure
* Added support for most previously missing operations, including division, max and min, shift, rotation, and leading and trailing zeros and ones
* Simplified and accelerated FPGA loading through PCIe

#### Resources

* [GitHub release](https://github.com/zama-ai/tfhe-rs/releases/tag/tfhe-rs-1.3.0)
* [Documentation](https://docs.zama.ai/tfhe-rs/1.3)
  {% endupdate %}

{% update date="2025-05-20" tags="cpu,gpu,hpu" %}

## TFHE-rs v1.2

TFHE-rs v1.2.0 introduces the new **HPU** backend. The HPU (Homomorphic Processing Unit) is a hardware accelerator for FHE operations.

{% hint style="danger" %}
**Breaking changes**

* The shortint `ServerKey` no longer holds the bootstrapping and keyswitch keys directly. Instead, they are stored inside a generic `AtomicPatternServerKey` object, which makes it possible to customize the content of the key material.
* The conformance parameters for the integer `ServerKey` are now wrapped inside `AtomicPatternParameters`.
  {% endhint %}

### CPU <a href="#cpu" id="cpu"></a>

#### New features <a href="#new-features" id="new-features"></a>

* Add back\&forth NTT implementation
* Add support for dynamic atomic patterns at the shortint level, which make it possible to customize how lookup tables are evaluated
* Add the KeySwitch32 atomic pattern
* Enable custom modulus generation for TUniform
* Add AsRef implementation on ServerKey to access NoiseSquashingKey
* Run ZK verification inside dedicated thread pools to reduce latency

#### Fixes <a href="#fixes" id="fixes"></a>

* Fix success probability for Ternary Uniform generation
* Remove additional body coeff in multi bit ms compression
* Check that the CRS group element at index n is 0

### GPU <a href="#gpu" id="gpu"></a>

#### New features <a href="#new-features-1" id="new-features-1"></a>

* Implement ZK's expand
* Implement 128 bit classic CG PBS
* Add memory tracking functions for add, subtract, scalar add and scalar subtract
* Add necessary entry points for 128 bit compression
* Add circulant matrix for one vs many poly product

#### Fixes <a href="#fixes-1" id="fixes-1"></a>

* Update the panic condition on the upper bound for the number of CUDA blocks to apply only to Thread Block Clusters
* Fix multi device execution with drift

### HPU <a href="#hpu" id="hpu"></a>

#### New features <a href="#new-features-2" id="new-features-2"></a>

* Add HPU backend implementation

#### Resources <a href="#resources" id="resources"></a>

* [GitHub release](https://github.com/zama-ai/tfhe-rs/releases/tag/tfhe-rs-1.2.0)
* [Documentation](https://docs.zama.ai/tfhe-rs/1.2)
  {% endupdate %}

{% update date="2025-04-10" tags="cpu,gpu" %}

## TFHE-rs v1.1

TFHE-rs v1.1.0 adds new features and improvements across the CPU and GPU backends.

{% hint style="danger" %}

#### Breaking changes

* Integer block rotations and block shift primitive directions are inverted to fix their meaning
* The NTT for the prime $$2^{64} - 2^{32} + 1$$ now uses new twiddle factors, making older NTT keys incompatible
  {% endhint %}

### CPU

#### New features

* Added scalar subtraction with the scalar as the left operand in the integer and high-level API
* Added scalar `Select` in the integer and high-level API
* Added dot product between vectors of `FheBool`
* Added trivial encrypt and decrypt support for string types
* Added chunked `LweBootstrapKey` and `SeededLweBootstrapKey` generation for memory-constrained systems
* Added a noise squashing API in the integer and high-level API
* Added the `extended-types` feature for more static typing in the high-level API
* Added GLWE keyswitch primitives

#### Improvements

* Updated the NTT for the Solinas prime $$2^{64} - 2^{32} + 1$$ to use twiddles that enable bit shifts instead of costly multiplications
* Removed `unwrap` usage in various conformance checks
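
The twiddle change in the list above builds on a well-known property of the prime $$p = 2^{64} - 2^{32} + 1$$: since $$2^{96} \equiv -1 \pmod p$$, the value 2 is a root of unity of order 192, so multiplying by a power-of-two twiddle amounts to a bit shift followed by a modular reduction. The sketch below checks that property in plain Rust with `u128` arithmetic. It is independent of TFHE-rs, and the function names are hypothetical rather than part of the library's API.

```rust
/// The Solinas prime 2^64 - 2^32 + 1.
const P: u128 = (1u128 << 64) - (1u128 << 32) + 1;

/// Generic modular multiplication through a full 128-bit product.
fn mul_mod(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P) as u64
}

/// Multiply `a` by 2^k modulo P using a shift and a reduction, for k < 64.
fn shift_mod(a: u64, k: u32) -> u64 {
    assert!(k < 64);
    (((a as u128) << k) % P) as u64
}

/// Compute 2^e modulo P by repeated doubling.
fn pow2_mod(e: u32) -> u64 {
    let mut acc: u128 = 1;
    for _ in 0..e {
        acc = (acc << 1) % P;
    }
    acc as u64
}
```

With these helpers, `pow2_mod(96)` equals `P - 1` and `pow2_mod(192)` equals 1, while `shift_mod(a, 5)` agrees with `mul_mod(a, 1 << 5)` for any `a`. A production NTT would replace the generic `%` with a reduction specialized for this prime; that part is left out of the sketch.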

#### Fixes

* Fixed a corner case where negative values were sometimes not sign-extended during encryption

### GPU

#### New features

* Implemented `fft128` in the CUDA backend
* Implemented 128-bit classic PBS

#### Improvements

* Added modulus-switch noise reduction on GPU for classical PBS
* Updated GPU cryptographic parameters to reach a `2^-128` probability of failure
* Used hex values to initialize twiddles for 64-bit FFT
* Refactored `double2` operators to use CUDA intrinsics
* Tracked degree and noise level in all integer operations in the CUDA backend
* Fixed block comparison logic with zero to match CPU behavior
* Retained LUT indexes on the CPU for each LUT application
* Added an alias for GPU compression parameters
* Detected first and last iteration of split-kernel multi-bit and classical PBS through template arguments
* Detected first and last iteration of 128-bit PBS through template arguments
* Updated integer and ERC-20 throughput benchmarks for better multi-GPU performance

#### Fixes

* Fixed the max shared memory bug for cooperative-groups PBS

#### Resources

* [GitHub release](https://github.com/zama-ai/tfhe-rs/releases/tag/tfhe-rs-1.1.0)
* [Documentation](https://docs.zama.ai/tfhe-rs/1.1)
  {% endupdate %}

{% update date="2025-02-26" tags="cpu,gpu,hpu" %}

## TFHE-rs v1.0

TFHE-rs v1.0.0 is the first official stable release of the TFHE-rs library.

It stabilizes the high-level API for the x86 CPU backend and adds classic PBS parameters with an error probability lower than `2^-128`.

{% hint style="danger" %}

#### Breaking changes

* `HlCompactable` is now required for types used in `CompactCiphertextList`
* `GpuIndex` is refactored and its internal field is no longer public
* Conformance parameter names now follow the `StructConformanceParam` naming scheme
  {% endhint %}

### CPU

#### New features

* Added a modulus-switch noise reduction technique for lower error probabilities
* Added `Abs` to the high-level C API binding
* Added a named implementation for integer compression and decompression for safe serialization
* Made strings compatible with compact and compressed lists
* Added classic PBS parameters in shortint with a probability of failure below `2^-128`

#### Improvements

* Used destructuring in more places to ensure exhaustive field checks

#### Fixes

* Fixed deserialization of old renamed structures that are still supported
* Fixed compression crash when output compute parameters were Multi Bit
* Fixed decompression of ciphertext lists after safe deserialization for various device selections
* Fixed trivial ciphertexts crashing compression because of an invalid noise check
* Fixed rotations and shifts on fewer than 2 blocks

### GPU

#### New features

* Added encrypted pseudo-random generation
* Added GPU selection in the high-level API

#### Improvements

* Optimized packing keyswitch
* `GpuIndex` now enforces validity at creation time
* Enabled more samples in the keyswitch
* Enabled more samples in PBS with the TBC variant

#### Fixes

* Fixed corner cases in the match value function
* Fixed scalar multiplication with 1 block
* Fixed internal indices for multi-GPU contexts
* Fixed several noise and degree bugs
* Fixed degree after shift and rotate
* Fixed wrong degree after decompression, which degraded performance
* Fixed compressed ciphertext list conversions between CPU and GPU

#### Resources

* [GitHub release](https://github.com/zama-ai/tfhe-rs/releases/tag/tfhe-rs-1.0.0)
* [Documentation](https://docs.zama.ai/tfhe-rs/1.0)
  {% endupdate %}
  {% endupdates %}


