# tmesh
A benchmarking tool for LLM inference endpoints with a focus on testing KV cache offloading capabilities in TensorMesh deployments.
## Overview
The `tmesh-cli` command measures performance metrics for OpenAI-compatible API endpoints. It automatically discovers the model configuration exposed by the endpoint and runs continuous benchmarks to stress-test KV cache offloading buffers.
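The discovery step can be sketched as follows, assuming the endpoint exposes the standard OpenAI-compatible `/v1/models` route (the function names here are illustrative, not tmesh's actual internals):

```python
import json
from urllib.request import Request, urlopen

def pick_model(payload: dict) -> str:
    """Return the first model id from an OpenAI-compatible /v1/models response."""
    models = payload.get("data", [])
    if not models:
        raise ValueError("endpoint reported no models")
    return models[0]["id"]

def discover_model(endpoint: str, api_key: str) -> str:
    """Query the endpoint's /v1/models route and pick a model to benchmark."""
    req = Request(
        f"{endpoint}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urlopen(req) as resp:
        return pick_model(json.load(resp))
```

Because the route is part of the OpenAI-compatible surface, the same discovery logic works against vLLM, TensorMesh, or any other compatible server.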
## Features
- Automatic model discovery from OpenAI-compatible endpoints
- Continuous benchmarking with real-time metrics
- Support for major LLM models (Qwen, GPT-OSS, etc.)
- KV cache offloading stress testing
- Detailed performance metrics (TTFT, ITL, throughput, QPS)
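For reference, the latency metrics above are typically derived from per-token arrival times observed during a streaming completion. A minimal sketch of that computation (the function and field names are illustrative, not part of tmesh's API):

```python
def latency_metrics(request_start: float, token_times: list[float]) -> dict:
    """Compute TTFT, mean ITL, and throughput from token arrival timestamps.

    request_start: wall-clock time the request was sent (seconds)
    token_times:   wall-clock arrival time of each generated token (seconds)
    """
    if not token_times:
        raise ValueError("no tokens received")
    ttft = token_times[0] - request_start           # time to first token
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    itl = sum(gaps) / len(gaps) if gaps else 0.0    # mean inter-token latency
    total = token_times[-1] - request_start
    return {
        "ttft_s": ttft,
        "itl_s": itl,
        "throughput_tok_s": len(token_times) / total,
    }
```

QPS is then simply completed requests divided by elapsed wall-clock time, aggregated across all concurrent benchmark workers.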
## Quick Start
Install tmesh:

```bash
pip install tmesh
```
Run a benchmark:

```bash
tmesh-cli benchmark --endpoint "http://localhost:8000" --api-key "your-api-key"
```
For detailed usage instructions, see the Getting Started guide.
## Documentation
- Getting Started - Complete user guide with installation, usage, and troubleshooting