tmesh

A benchmarking tool for LLM inference endpoints with a focus on testing KV cache offloading capabilities in TensorMesh deployments.

Overview

tmesh-cli measures performance metrics for OpenAI-compatible API endpoints. It automatically discovers the served model configuration and runs continuous benchmarks to stress-test KV cache offloading buffers.
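tmesh-cli's own discovery code isn't shown here, but discovery against an OpenAI-compatible endpoint generally means querying the standard /v1/models route and reading a served model id. A minimal sketch of that idea (the discover_model helper and the requests usage are illustrative, not part of tmesh):

import requests

def discover_model(endpoint: str, api_key: str) -> str:
    """Return the id of the first model served by an OpenAI-compatible API."""
    resp = requests.get(
        f"{endpoint}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=10,
    )
    resp.raise_for_status()
    models = resp.json()["data"]  # OpenAI-style list response: {"data": [{"id": ...}, ...]}
    return models[0]["id"]

print(discover_model("http://localhost:8000", "your-api-key"))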

Features

  • Automatic model discovery from OpenAI-compatible endpoints
  • Continuous benchmarking with real-time metrics
  • Support for major LLMs (Qwen, GPT-OSS, etc.)
  • KV cache offloading stress testing
  • Detailed performance metrics (TTFT, ITL, throughput, QPS; see the sketch after this list)
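
To make the first two metrics concrete: TTFT (time to first token) and ITL (inter-token latency) can be measured from a streaming chat completion. The sketch below uses the official openai Python client; the base URL, API key, and model id are placeholders, and this is not tmesh's actual measurement code:

import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="your-api-key")

start = time.perf_counter()
token_times = []
stream = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",  # placeholder model id
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    # Record the arrival time of each chunk that carries content;
    # some chunks (e.g. the final usage chunk) have no choices.
    if chunk.choices and chunk.choices[0].delta.content:
        token_times.append(time.perf_counter())

ttft = token_times[0] - start  # time to first token
itls = [b - a for a, b in zip(token_times, token_times[1:])]  # inter-token gaps
if itls:
    print(f"TTFT: {ttft:.3f}s, mean ITL: {sum(itls) / len(itls):.4f}s")

Throughput and QPS then fall out of the same loop run at scale: tokens generated per second across all requests, and completed requests per second.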

Quick Start

Install tmesh:

pip install tmesh

Run a benchmark:

tmesh-cli benchmark --endpoint "http://localhost:8000" --api-key "your-api-key"

For detailed usage instructions, see the Getting Started guide.

Documentation

  • Getting Started - Complete user guide with installation, usage, and troubleshooting