CVE-2026-44223
Published 2026-05-12
Priority: P3 · 36 · Medium · 6.5 (CVSS 3.1)
Vector: AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
EPSS: 0.37% (28.5th percentile)
vLLM is an inference and serving engine for large language models (LLMs). From 0.18.0 to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., "repetition_penalty": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.
Affected
3 ranges
| Vendor | Product | Version range | Fixed in |
|---|---|---|---|
| vllm-project | vllm | — | — |
| vllm | vllm | >= 0.18.0 < 0.20.0 | 0.20.0 |
| vllm | vllm | >= 0.18.0 < 0.20.0 | 0.20.0 |
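The affected range in the table can be checked mechanically. The following is a minimal sketch, not an official detection rule: the helper names `parse_release`, `is_affected`, and `installed_vllm_is_affected` are hypothetical, and the parser only compares the leading `X.Y.Z` numbers, so pre-release or local suffixes (for example `rc1` or `+cu121`) are ignored. That is an assumption that may misclassify release candidates of 0.20.0.

```python
import re
from importlib import metadata

# Bounds taken from the "Affected" table: >= 0.18.0 and < 0.20.0.
AFFECTED_FROM = (0, 18, 0)
FIXED_IN = (0, 20, 0)


def parse_release(version: str) -> tuple[int, int, int]:
    """Return the leading X.Y.Z numbers of a version string.

    Simplification (assumption): suffixes such as 'rc1' or '+cu121'
    are ignored, and a missing patch component is treated as 0.
    """
    match = re.match(r"v?(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def is_affected(version: str) -> bool:
    """True if the version falls inside the advisory's affected range."""
    return AFFECTED_FROM <= parse_release(version) < FIXED_IN


def installed_vllm_is_affected():
    """Check the locally installed vllm package; None if it is not installed."""
    try:
        return is_affected(metadata.version("vllm"))
    except metadata.PackageNotFoundError:
        return None
```

A version inside the range only matters in practice if the deployment enables the `extract_hidden_states` speculative decoding proposer. This sketch does not inspect the server configuration.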
VulDB
vllm-project vllm up to 0.19.x extract_hidden_states buffer size (GHSA-83vm-p52w-f9pw)
vuldb·2026-05-12
CVE-2026-44223 [LOW] vllm-project vllm up to 0.19.x extract_hidden_states buffer size (GHSA-83vm-p52w-f9pw)
A vulnerability rated as problematic was found in vllm-project vllm up to 0.19.x. It affects the function extract_hidden_states: supplying the argument repetition_penalty, frequency_penalty, or presence_penalty leads to an incorrect calculation of buffer size.
This vulnerability is known as CVE-2026-44223. The attack can be launched remotely. No exploit is available.
Upgrading the affected component is advised.
GHSA
vLLM: extract_hidden_states speculative decoding crashes server on any request with penalty parameters
ghsa·2026-05-06
CVE-2026-44223 [MEDIUM] CWE-131 vLLM: extract_hidden_states speculative decoding crashes server on any request with penalty parameters
### Summary
The `extract_hidden_states` speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a `RuntimeError` that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (`repetition_penalty`, `frequency_penalty`, or `presence_penalty`).
A single request with a penalty parameter (e.g., `"repetition_penalty": 1.1`) is sufficient to crash the server. The crash is deterministic and immediate — no concurrency, race condition, or special workload is required.
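Because the trigger is the presence of a penalty parameter in a request, one possible stopgap for deployments that cannot upgrade to 0.20.0 immediately is to drop those fields before requests reach the server. The sketch below is illustrative only. It assumes the parameters arrive as top-level keys of a JSON request body (as the `"repetition_penalty": 1.1` example suggests). The `strip_penalties` helper and its placement in a proxy or gateway are hypothetical and not part of vLLM. Removing the fields changes sampling behavior, so this is not a substitute for the fix.

```python
# The three sampling penalty parameters named in the advisory.
PENALTY_KEYS = ("repetition_penalty", "frequency_penalty", "presence_penalty")


def strip_penalties(payload: dict) -> tuple[dict, list[str]]:
    """Return a copy of a request body without penalty parameters.

    Also returns the names of the removed keys so the caller can log
    or reject the request instead of silently altering it.
    Assumption: penalties are top-level keys of the JSON body.
    """
    removed = [key for key in PENALTY_KEYS if key in payload]
    cleaned = {k: v for k, v in payload.items() if k not in PENALTY_KEYS}
    return cleaned, removed


# Example request body with hypothetical model and prompt values.
request_body = {
    "model": "example-model",
    "prompt": "Hello",
    "repetition_penalty": 1.1,
}
cleaned_body, removed_keys = strip_penalties(request_body)
```

Returning the removed key names lets an operator choose between rewriting the request and rejecting it with an explicit error, which may be preferable when clients depend on penalty-based sampling.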
### Details
In vLLM v0.17.0, the `extract_hidden_states` proposer's `propose()` method returned `sampled_to
No detection rules found.
No public exploits indexed.
No writeups or analysis indexed.