Update README.md
README.md
@@ -292,9 +292,9 @@ print(f"Peak Memory Usage: {mem:.02f} GB")
 ## Results (H100 machine)
 | Benchmark                        |                |                                 |
 |----------------------------------|----------------|---------------------------------|
-|                                  | Phi-4 mini-Ins | Phi-4-mini-instruct-FP8
-| latency (batch_size=1)           | 1.
-| latency (batch_size=
+|                                  | Phi-4 mini-Ins | Phi-4-mini-instruct-FP8         |
+| latency (batch_size=1)           | 1.61s          | 1.25s (1.29x speedup)           |
+| latency (batch_size=256)         | 5.16s          | 4.89s (1.05x speedup)           |
 | serving (num_prompts=1)          | 1.35 req/s     | 1.57 req/s (1.16x speedup)      |
 | serving (num_prompts=1000)       | 66.68 req/s    | 80.53 req/s (1.21x speedup)     |
 
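
The speedup figures in the table can be sanity-checked from the two measurement columns. The sketch below is illustrative only and is not part of the README: the helper name `speedup` and its `higher_is_better` flag are hypothetical, and the inputs are the rounded values shown in the table. For latency (lower is better) the ratio is baseline over FP8; for serving throughput (higher is better) it is FP8 over baseline. Note that the rounded values for `batch_size=256` give 5.16 / 4.89 ≈ 1.055, so the reported 1.05x may have been computed from unrounded timings.

```python
def speedup(baseline: float, quantized: float, higher_is_better: bool) -> float:
    """Speedup of the quantized model over the baseline (hypothetical helper)."""
    # Throughput improves when it goes up; latency improves when it goes down.
    ratio = quantized / baseline if higher_is_better else baseline / quantized
    return round(ratio, 2)


# Rounded values copied from the Results table in the diff above.
rows = [
    ("latency (batch_size=1)", 1.61, 1.25, False),
    ("latency (batch_size=256)", 5.16, 4.89, False),
    ("serving (num_prompts=1)", 1.35, 1.57, True),
    ("serving (num_prompts=1000)", 66.68, 80.53, True),
]

for name, baseline, fp8, higher_is_better in rows:
    print(f"{name}: {speedup(baseline, fp8, higher_is_better):.2f}x")
```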