Programmable bootstrapping
This document details the HPU performance benchmarks of programmable bootstrapping and keyswitch operations using TFHE-rs.
All HPU benchmarks were launched on AMD Alveo v80 FPGAs.
The cryptographic parameters HPU_PARAM_MESSAGE_2_CARRY_2_KS32_PBS_TUNIFORM_2M128 were used.
The HPU interface is based on IOp (Integer Operation) execution and is not designed to execute a single PBS. For this reason, the following measurements were made by building custom IOps containing only PBS operations. The HPU executes PBS in batches in order to share key elements between several ciphertexts and to optimize processing pipeline usage. It also executes the keyswitch in parallel with the blind rotation of the PBS, so the two operations cannot be measured separately. The following table shows the execution time of batches of 12, 9 and 2 KS-PBS.
P-fail: 2^-128
Note that maximizing the batch size maximizes PBS throughput, but it does not minimize PBS latency.
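The trade-off between throughput and latency can be illustrated with a minimal Rust sketch. It is not part of TFHE-rs: the `BatchTiming` type, the `summarize` helper and every timing value are hypothetical placeholders chosen for the arithmetic, not measured HPU results. The sketch assumes that throughput is simply the batch size divided by the batch execution time, and that every PBS in a batch completes when the batch does.

```rust
/// Timing of one batch of KS-PBS operations.
/// Hypothetical type for illustration only; it does not exist in TFHE-rs.
struct BatchTiming {
    /// Number of KS-PBS operations executed together in the batch.
    batch_size: u32,
    /// Execution time of the whole batch, in microseconds.
    batch_latency_us: f64,
}

impl BatchTiming {
    /// Throughput in PBS per second: batch size divided by batch time.
    fn throughput_pbs_per_s(&self) -> f64 {
        f64::from(self.batch_size) * 1e6 / self.batch_latency_us
    }
}

/// Formats one line per batch configuration.
fn summarize(timings: &[BatchTiming]) -> Vec<String> {
    timings
        .iter()
        .map(|t| {
            format!(
                "batch of {:>2}: latency {:.0} us, throughput {:.0} PBS/s",
                t.batch_size,
                t.batch_latency_us,
                t.throughput_pbs_per_s()
            )
        })
        .collect()
}

/// Placeholder values (NOT benchmark results): the larger batch takes
/// longer to complete but processes more PBS per second.
fn example_timings() -> [BatchTiming; 2] {
    [
        BatchTiming { batch_size: 2, batch_latency_us: 1_000.0 },
        BatchTiming { batch_size: 12, batch_latency_us: 2_000.0 },
    ]
}
```

Calling `summarize(&example_timings())` from a `main` function and printing each line shows the pattern described above: with these placeholder numbers, the batch of 12 has the higher throughput, while the batch of 2 returns its results sooner.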
Reproducing TFHE-rs benchmarks
TFHE-rs benchmarks can be easily reproduced from the source.
To obtain the numbers listed above, we measure the latency of custom operations that execute 10k batches of PBS, then check these values against the HPU internal trace system, which measures the number of clock cycles taken by each batch of PBS. The internal trace system is described in the HPU debug IOp documentation. The following example shows how to reproduce HPU PBS measurements: