Breakdown of H100s for Transformer Inferencing
Nvidia's new H100 GPU just dropped! This post analyses what it offers for transformer inference.
specs
Here's a spec table to start. The "16-bit format" refers to BFLOAT16 and FLOAT16, while the "8-bit format" refers to FP8 or INT8. INT8 operations aren't actually flops, because the "fl" stands for floating point, but I'll continue referring to them as flops because we d…

