bgtrees.finite_gpufields.cuda_operators package
Submodules
bgtrees.finite_gpufields.cuda_operators.check_inverse module
- bgtrees.finite_gpufields.cuda_operators.check_inverse.check_galois(x, pmod=2147483629, nmax=1000)
- bgtrees.finite_gpufields.cuda_operators.check_inverse.wrapper_inverse(x)
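For reference, a minimal sketch of the property these helpers presumably verify: for a prime modulus, Fermat's little theorem gives the multiplicative inverse. The `reference_inverse` helper below is hypothetical; only `check_galois` and `wrapper_inverse` come from this module.

```python
import numpy as np

P = 2147483629  # default pmod above; a prime below 2**31

def reference_inverse(x, pmod=P):
    # Hypothetical reference (not part of the package): for prime pmod,
    # Fermat's little theorem gives x**(-1) == x**(pmod - 2) (mod pmod).
    return pow(int(x), pmod - 2, pmod)

# In the spirit of check_galois: an element times its inverse must
# reduce to 1 in the field.
for x in np.random.default_rng(0).integers(1, P, size=1000):
    assert int(x) * reference_inverse(x) % P == 1
```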
bgtrees.finite_gpufields.cuda_operators.dot_product module
bgtrees.finite_gpufields.cuda_operators.inverse module
bgtrees.finite_gpufields.cuda_operators.py_dotproduct module
Script to test and benchmark the dot_product kernels.
When NumPy uses the @ operator, the C++ version is competitive (both are probably doing the same thing under the hood). When NumPy uses einsum instead, the C++ version is about 3 to 4 times faster.
The CUDA version is faster than NumPy, with the speed-up growing with the number of elements:
- 1e5 elements: 2 times faster
- 1e6 elements: 10 times faster
- 1e7 elements: 60 times faster
Note that the comparison is unfair to us, since NumPy applies the % reduction only once at the end, while our kernels reduce modulo p throughout. When the NumPy baseline uses einsum, the numbers above can be multiplied by the same factor of 3 to 4.
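To make the deferred-reduction point concrete, here is a minimal NumPy sketch (not taken from the package) contrasting the two strategies; with a contraction length of 2 the int64 intermediates cannot overflow, so both agree:

```python
import numpy as np

P = 2147483629
rng = np.random.default_rng(0)
x = rng.integers(0, P, size=(1000, 2), dtype=np.int64)
y = rng.integers(0, P, size=(1000, 2), dtype=np.int64)

# NumPy-style baseline: contract first, apply % once at the end.
# Cheaper, but longer contractions can overflow int64.
deferred = np.einsum("bi,bi->b", x, y) % P

# Field-safe strategy (what a finite-field kernel must do): reduce
# after every elementary multiply and accumulate.
eager = np.zeros(1000, dtype=np.int64)
for i in range(x.shape[1]):
    eager = (eager + x[:, i] * y[:, i] % P) % P

assert (deferred == eager).all()
```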
Our operations carry some fixed overhead, roughly the cost of computing ~1e4 events. It is unclear how this scales when there are _many_ operations. If the overhead is not paid per operation (i.e., once the events are on-device they stay there), it should not be a problem.
- bgtrees.finite_gpufields.cuda_operators.py_dotproduct.check_galois(x, y, pmod=2147483629, nmax=1000)
- bgtrees.finite_gpufields.cuda_operators.py_dotproduct.fully_python_dot_product(x, y)
- bgtrees.finite_gpufields.cuda_operators.py_dotproduct.wrapper_dot_product(x, y)
- bgtrees.finite_gpufields.cuda_operators.py_dotproduct.wrapper_dot_product_single_batch(x, y)
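As a rough sketch of the reference computation that `fully_python_dot_product` and `check_galois` are presumably checked against (the batched contraction pattern is an assumption; the real functions may batch differently):

```python
P = 2147483629  # default pmod in check_galois above

def reference_dot_product(x, y, pmod=P):
    # Hypothetical pure-Python reference: contract the last axis of each
    # batch entry, reducing mod pmod after every multiply-accumulate so
    # intermediates never leave the field (Python ints cannot overflow).
    out = []
    for xb, yb in zip(x, y):
        acc = 0
        for xi, yi in zip(xb, yb):
            acc = (acc + xi * yi) % pmod
        out.append(acc)
    return out
```

A kernel result for the same inputs should match this entry by entry, which is the kind of agreement the nmax argument of check_galois presumably bounds.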
Module contents
- bgtrees.finite_gpufields.cuda_operators.wrapper_dot_product(x, y)
- bgtrees.finite_gpufields.cuda_operators.wrapper_dot_product_single_batch(x, y)
- bgtrees.finite_gpufields.cuda_operators.wrapper_inverse(x)
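A hedged usage sketch of the re-exported wrappers (the import path matches this page, but the accepted shapes, dtypes, and return types are assumptions):

```python
import numpy as np
from bgtrees.finite_gpufields.cuda_operators import wrapper_inverse

P = 2147483629
rng = np.random.default_rng(1)
x = rng.integers(1, P, size=(8, 3), dtype=np.int64)

# Assuming wrapper_inverse acts elementwise and returns an array-like:
inv = np.asarray(wrapper_inverse(x))

# Each element times its inverse should reduce to 1 mod P; a single
# elementwise product of two values below 2**31 fits in int64.
assert ((x * inv) % P == 1).all()
```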