Documentation ¶
Overview ¶
Package llm registers `k6/x/llm`, a k6 extension for LLM-aware load testing. See the project README for the metric set and per-request semantics.
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Client ¶
type Client struct {
// contains filtered or unexported fields
}
Client is the JS-facing OpenAI-compatible chat client.
func (*Client) Chat ¶
Chat sends a streaming chat completion request. It returns a Promise that resolves to a result object (see chatResult.toJSObject for the full shape) or rejects with a categorized error.
JS:
client.chat({
  messages: [...],
  max_tokens: 256,
  // control fields (peeled off before the upstream POST):
  slo: { ttft_ms: 500, tpot_ms: 50, e2el_ms: 5000 },
  cache_state: "cold",
  tags: { region: "us-east", shape: "short" },
})
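The "peeled off" step in the comment above can be pictured as splitting one request object into an upstream body and a set of control fields. The following is a minimal Go sketch of that idea, not the extension's actual code: `peelControl` and `controlKeys` are hypothetical names, and the only control fields assumed are the three shown in the example (`slo`, `cache_state`, `tags`).

```go
package main

import "fmt"

// controlKeys lists the control fields named in the example above.
var controlKeys = []string{"slo", "cache_state", "tags"}

// peelControl is a hypothetical helper: it returns the upstream body
// (the request minus control fields) and the control fields on their own.
// The input map is left unmodified.
func peelControl(req map[string]any) (body, control map[string]any) {
	body = make(map[string]any, len(req))
	control = make(map[string]any)
	for k, v := range req {
		body[k] = v
	}
	for _, k := range controlKeys {
		if v, ok := body[k]; ok {
			control[k] = v
			delete(body, k)
		}
	}
	return body, control
}

func main() {
	req := map[string]any{
		"max_tokens":  256,
		"cache_state": "cold",
		"tags":        map[string]string{"region": "us-east", "shape": "short"},
	}
	body, control := peelControl(req)
	fmt.Println("upstream fields:", len(body), "control fields:", len(control))
}
```

Copying the map before deleting keys keeps the caller's object intact, which matters if the same request is reused across iterations.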
type CostModel ¶
CostModel parameterises a per-request USD cost estimate, computed from server-reported token counts:
usd = prompt_tokens * usd_per_million_input_tokens / 1e6
+ completion_tokens * usd_per_million_output_tokens / 1e6
Use hosted-API published rates directly; for self-hosted inference, compute your effective $/M-token rate offline (idle GPU $/hr divided by sustained throughput, plus marginal electricity) and plug it in here.
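As a worked illustration of the formula above, here is a small Go sketch. The `costModel` struct and its field names are hypothetical stand-ins, since CostModel's fields are not listed in this documentation, and the rates are made-up numbers rather than any provider's prices.

```go
package main

import "fmt"

// costModel is a hypothetical stand-in for CostModel; the real field
// names are not shown in this documentation.
type costModel struct {
	USDPerMillionInputTokens  float64
	USDPerMillionOutputTokens float64
}

// estimateUSD applies the per-request formula to server-reported token counts.
func estimateUSD(m costModel, promptTokens, completionTokens int) float64 {
	return float64(promptTokens)*m.USDPerMillionInputTokens/1e6 +
		float64(completionTokens)*m.USDPerMillionOutputTokens/1e6
}

func main() {
	// Illustrative rates only.
	m := costModel{USDPerMillionInputTokens: 2, USDPerMillionOutputTokens: 8}
	fmt.Printf("%.6f USD\n", estimateUSD(m, 1000, 500))
}
```

With these sample rates, 1000 prompt tokens contribute 0.002 USD and 500 completion tokens contribute 0.004 USD.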
type Dataset ¶
type Dataset struct {
// contains filtered or unexported fields
}
Dataset is a deterministic, replayable corpus of chat requests. The corpus is loaded once per process from a JSONL file and shared across VUs via dsCache. Each Dataset instance carries its own cursor and shuffle permutation, so two VUs reading from the same file do not see the same order unless they share a (path, seed) pair.
func (*Dataset) At ¶
At returns the i-th request (modulo dataset size, negative-safe) without advancing the cursor. Use this when the caller wants to derive the index from __VU and __ITER for fully reproducible workloads.
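The "modulo dataset size, negative-safe" behaviour can be sketched in Go as follows. Both helpers are hypothetical illustrations rather than the package's implementation, and the stride-by-VU-count layout in `indexFor` is just one possible way to derive an index from a VU id and iteration number (k6's `__VU` is 1-based during test execution, hence the `vu - 1`).

```go
package main

import "fmt"

// wrapIndex maps any integer i onto [0, n) so that negative values wrap
// around from the end. It assumes n > 0.
func wrapIndex(i, n int) int {
	return ((i % n) + n) % n
}

// indexFor derives a reproducible index from a 1-based VU id, an iteration
// number, and the total VU count. This layout is an assumption made for
// illustration: each iteration hands every VU a distinct slot.
func indexFor(vu, iter, vus int) int {
	return iter*vus + (vu - 1)
}

func main() {
	const size = 5
	fmt.Println(wrapIndex(-1, size))                // last element
	fmt.Println(wrapIndex(7, size))                 // wraps past the end
	fmt.Println(wrapIndex(indexFor(3, 2, 4), size)) // VU 3, iteration 2, 4 VUs
}
```

Because the index depends only on the VU id and iteration number, two runs with the same configuration replay the same requests in the same slots.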
func (*Dataset) Next ¶
Next advances the internal cursor and returns the next request, wrapping at the end. Concurrency-safe across VUs (within a single process) when the same Dataset instance is reused; in k6 each VU constructs its own instance, so "wrap" semantics apply per VU.
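One way to get a wrapping, concurrency-safe cursor over a seeded shuffle is an atomic counter indexing into a fixed permutation. The sketch below assumes that design for illustration; the `cursor` type and its methods are hypothetical and say nothing about how Dataset is actually implemented.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync/atomic"
)

// cursor is a hypothetical stand-in for a Dataset's iteration state.
type cursor struct {
	perm []int         // shuffle permutation of item indices, fixed by the seed
	pos  atomic.Uint64 // number of items handed out so far
}

// newCursor builds a permutation of n indices from the given seed, so the
// same (n, seed) pair always yields the same order.
func newCursor(n int, seed int64) *cursor {
	return &cursor{perm: rand.New(rand.NewSource(seed)).Perm(n)}
}

// next returns the next item index and wraps at the end. The atomic
// increment makes it safe to call from several goroutines.
func (c *cursor) next() int {
	i := c.pos.Add(1) - 1
	return c.perm[i%uint64(len(c.perm))]
}

func main() {
	a, b := newCursor(4, 42), newCursor(4, 42)
	for i := 0; i < 5; i++ {
		fmt.Println(a.next() == b.next()) // same seed, same order
	}
}
```

Two cursors built with different seeds would generally walk the same items in different orders, which matches the (path, seed) behaviour described for Dataset.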
type EnergyModel ¶
EnergyModel parameterises a per-request energy estimate. The math, per request:
dynamic_j = prompt_tokens * j_per_input_token + completion_tokens * j_per_output_token
static_j = idle_w * duration_s
total_j = dynamic_j + static_j
Coefficients must be measured for your (GPU, model, batch regime) tuple. Under concurrent load the static term over-attributes idle power; divide idle_w by your expected per-VU concurrency for wall-plug accuracy. This is a budgeting metric, not a measurement.
func (*EnergyModel) Empty ¶
func (e *EnergyModel) Empty() bool
Empty reports whether the model would produce a zero estimate for every request.
type Options ¶
type Options struct {
BaseURL string
APIKey string
Model string
Timeout time.Duration
IgnoreEOS bool
// Headers are sent on every request. Use for custom auth schemes, gateway
// routing keys (e.g. OpenRouter "HTTP-Referer"), or observability headers.
Headers map[string]string
// DefaultSLO applies to every chat() call that doesn't supply its own.
DefaultSLO *SLOPredicate
// Energy, when set, enables per-request energy estimation. See EnergyModel.
Energy *EnergyModel
// Cost, when set, enables per-request USD estimation. See CostModel.
Cost *CostModel
}
Options configures an llm.Client.
type SLOPredicate ¶
SLOPredicate is the per-request SLO used to compute goodput and per-SLO attainment Rates. A zero field disables that SLO (it always passes).
Semantics match vLLM's `--goodput ttft:X tpot:Y e2el:Z` flag (PR #9338, shipped v0.6.4).
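The "zero field disables that SLO" rule can be sketched in Go as below. The `sloPredicate` struct, its field names, and the `met` method are hypothetical stand-ins for illustration; the sketch also assumes a request meets a limit when the observed value is less than or equal to it.

```go
package main

import "fmt"

// sloPredicate is a hypothetical stand-in for SLOPredicate, with limits in
// milliseconds. A zero limit means that SLO is disabled.
type sloPredicate struct {
	TTFTMs float64 // time to first token
	TPOTMs float64 // time per output token
	E2ELMs float64 // end-to-end latency
}

// within reports whether one observation satisfies one limit.
// A zero limit always passes.
func within(limit, observed float64) bool {
	return limit == 0 || observed <= limit
}

// met reports whether a request satisfies every enabled SLO, which is the
// condition for counting it toward goodput in this sketch.
func (s sloPredicate) met(ttftMs, tpotMs, e2elMs float64) bool {
	return within(s.TTFTMs, ttftMs) &&
		within(s.TPOTMs, tpotMs) &&
		within(s.E2ELMs, e2elMs)
}

func main() {
	slo := sloPredicate{TTFTMs: 500, E2ELMs: 5000} // TPOT left at zero: disabled
	fmt.Println(slo.met(420, 80, 3100))            // TTFT and E2EL within limits
	fmt.Println(slo.met(650, 80, 3100))            // TTFT over its limit
}
```

A predicate with all three fields at zero passes every request, which is the no-op case the Empty method describes.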
func (*SLOPredicate) Empty ¶
func (s *SLOPredicate) Empty() bool
Empty reports whether the predicate is functionally a no-op.

