jerryzh168 committed · Commit b3326ff · verified · 1 Parent(s): 78936cd

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -292,9 +292,9 @@ print(f"Peak Memory Usage: {mem:.02f} GB")
  ## Results (H100 machine)
  | Benchmark | | |
  |----------------------------------|----------------|---------------------------------|
- | | Phi-4 mini-Ins | Phi-4-mini-instruct-FP8 |
- | latency (batch_size=1) | 1.64s | 1.41s (1.16x speedup) |
- | latency (batch_size=128) | 3.1s | 2.72s (1.14x speedup) |
+ | | Phi-4 mini-Ins | Phi-4-mini-instruct-FP8 |
+ | latency (batch_size=1) | 1.61s | 1.25s (1.29x speedup) |
+ | latency (batch_size=256) | 5.16s | 4.89s (1.05x speedup) |
  | serving (num_prompts=1) | 1.35 req/s | 1.57 req/s (1.16x speedup) |
  | serving (num_prompts=1000) | 66.68 req/s | 80.53 req/s (1.21x speedup) |
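
For reviewers who want to sanity-check the speedup column, here is a minimal sketch that recomputes the ratios from the updated table. It assumes that latency speedup is baseline divided by FP8 (lower is better) and that serving speedup is FP8 divided by baseline (higher is better). The `rows` list and the `speedup` helper are illustrative names and are not part of the README. Because the table values are already rounded, a recomputed ratio may differ from the published one in the last digit.

```python
# Values copied from the updated table in this commit (H100 machine).
# Each entry: (metric, baseline value, FP8 value, higher_is_better)
rows = [
    ("latency (batch_size=1)", 1.61, 1.25, False),      # seconds
    ("latency (batch_size=256)", 5.16, 4.89, False),    # seconds
    ("serving (num_prompts=1)", 1.35, 1.57, True),      # req/s
    ("serving (num_prompts=1000)", 66.68, 80.53, True), # req/s
]


def speedup(baseline: float, fp8: float, higher_is_better: bool) -> float:
    """Return the FP8 speedup over the baseline as a ratio >= 1 when FP8 wins."""
    # Throughput improves when it goes up; latency improves when it goes down.
    return fp8 / baseline if higher_is_better else baseline / fp8


for name, baseline, fp8, higher_is_better in rows:
    print(f"{name}: {speedup(baseline, fp8, higher_is_better):.2f}x")
```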