# tmesh
A benchmarking tool for LLM inference endpoints with a focus on testing KV cache offloading capabilities in TensorMesh deployments.
## Overview
The `tmesh-cli` command measures performance metrics for OpenAI-compatible API endpoints. It automatically discovers the model configuration exposed by the endpoint and runs continuous benchmarks to stress-test KV cache offloading buffers.
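The discovery step can be sketched as follows, assuming the endpoint exposes the standard OpenAI-compatible `/v1/models` route (the function names here are illustrative, not tmesh's actual internals):

```python
import json
from urllib.request import Request, urlopen

def pick_model(payload: dict) -> str:
    """Return the first model id from an OpenAI-compatible /v1/models response."""
    models = payload.get("data", [])
    if not models:
        raise ValueError("endpoint reported no models")
    return models[0]["id"]

def discover_model(endpoint: str, api_key: str) -> str:
    """Query the endpoint's /v1/models route and pick a model to benchmark."""
    req = Request(
        f"{endpoint}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urlopen(req) as resp:
        return pick_model(json.load(resp))
```

Because the route is part of the OpenAI-compatible surface, the same discovery logic works against vLLM, TensorMesh, or any other compatible server.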
## Features
- Automatic model discovery from OpenAI-compatible endpoints
- Continuous benchmarking with real-time metrics
- Support for major LLM models (Qwen, GPT-OSS, etc.)
- KV cache offloading stress testing
- Detailed performance metrics (TTFT, ITL, throughput, QPS)
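For reference, the latency metrics above are typically derived from per-token arrival times observed during a streaming completion. A minimal sketch of that computation (the function and field names are illustrative, not part of tmesh's API):

```python
def latency_metrics(request_start: float, token_times: list[float]) -> dict:
    """Compute TTFT, mean ITL, and throughput from token arrival timestamps.

    request_start: wall-clock time the request was sent (seconds)
    token_times:   wall-clock arrival time of each generated token (seconds)
    """
    if not token_times:
        raise ValueError("no tokens received")
    ttft = token_times[0] - request_start           # time to first token
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    itl = sum(gaps) / len(gaps) if gaps else 0.0    # mean inter-token latency
    total = token_times[-1] - request_start
    return {
        "ttft_s": ttft,
        "itl_s": itl,
        "throughput_tok_s": len(token_times) / total,
    }
```

QPS is then simply completed requests divided by elapsed wall-clock time, aggregated across all concurrent benchmark workers.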
## Quick Start
Install tmesh:

```bash
pip install tmesh
```
Run a benchmark:

```bash
tmesh-cli benchmark --endpoint "http://localhost:8000" --api-key "your-api-key"
```
For detailed usage instructions, see the Getting Started guide.
## Documentation
- Getting Started - Complete user guide with installation, usage, and troubleshooting