CVE-2026-22773
Published 2026-01-10
Priority: P3 · 43 · high · 7.5 (CVSS 3.1)
AV:N / AC:L / PR:N / UI:N / S:U / C:N / I:N / A:H
EPSS: 0.40% (32.2th percentile)
vLLM is an inference and serving engine for large language models (LLMs). In versions from 0.6.4 to before 0.12.0, users can crash the vLLM engine serving multimodal models that use the Idefics3 vision model implementation by sending a specially crafted 1x1 pixel image. This causes a tensor dimension mismatch that results in an unhandled runtime error, leading to complete server termination. This issue has been patched in version 0.12.0.
Affected
3 ranges
| Vendor | Product | Version range | Fixed in |
|---|---|---|---|
| vllm-project | vllm | — | — |
| vllm | vllm | >= 0.6.4 < 0.12.0 | 0.12.0 |
| vllm | vllm | >= 0.6.4 < 0.12.0 | 0.12.0 |
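The affected range above (>= 0.6.4, < 0.12.0) can be checked against a version string with a few lines of Python. This is a minimal sketch, not an official tool: `parse_version` and `is_affected` are hypothetical helper names, and the parser keeps only the first three numeric components, so a pre-release such as `0.12.0rc1` is treated as `0.12.0`.

```python
import re

# Range taken from the table above: >= 0.6.4 and < 0.12.0.
FIRST_AFFECTED = (0, 6, 4)
FIRST_FIXED = (0, 12, 0)


def parse_version(text: str) -> tuple:
    """Reduce a version string to (major, minor, patch).

    Local tags ("+cu121") and non-numeric suffixes ("rc1", "post1")
    are ignored; missing components are padded with zeros.
    """
    numbers = []
    for piece in text.split("+")[0].split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def is_affected(version: str) -> bool:
    """True if the version falls inside the affected range."""
    return FIRST_AFFECTED <= parse_version(version) < FIRST_FIXED


if __name__ == "__main__":
    for candidate in ("0.6.3", "0.6.4", "0.11.2", "0.12.0"):
        print(candidate, is_affected(candidate))
```

The installed version can be read with `importlib.metadata.version("vllm")` from the standard library. For strict PEP 440 ordering, including pre-releases, `packaging.version.Version` is a better fit than the tuple comparison used here.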
CVSS provenance
nvd · v3.1 · 7.5 HIGH · CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H
vendor_redhat · 6.5 MEDIUM
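To show how the NVD vector maps to 7.5, the sketch below applies the CVSS 3.1 base-score equations for the Scope: Unchanged case. The metric weights and the round-up rule follow the published CVSS 3.1 specification; the function names are illustrative, and Scope: Changed vectors are deliberately rejected to keep the example short.

```python
import math

# CVSS 3.1 metric weights (Privileges Required values are for Scope: Unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the CVSS 3.1 spec."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Base score for a CVSS 3.1 vector with Scope: Unchanged."""
    # Skip the leading "CVSS:3.1" prefix, then split "AV:N" into key and value.
    fields = dict(item.split(":") for item in vector.split("/")[1:])
    if fields["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {name: table[fields[name]] for name, table in WEIGHTS.items()}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H"))  # 7.5
```

With only availability affected (A:H), the impact sub-score is 6.42 × 0.56 ≈ 3.60 and the exploitability sub-score is about 3.89; their sum, 7.48, rounds up to 7.5.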
Red Hat
vllm: vLLM: Denial of Service via specially crafted image in multimodal model serving
vendor_redhat·2026-01-10·CVSS 6.5
CVE-2026-22773 [MEDIUM] CWE-770 vllm: vLLM: Denial of Service via specially crafted image in multimodal model serving
A flaw was found in vLLM, an inference and serving engine for large language models (LLMs). A remote attacker can exploit this vulnerability by sending a specially crafted 1x1 pixel image to a vLLM engine serving multimodal models that use the Idefics3 vision model impl
OSV
vLLM is vulnerable to DoS in Idefics3 vision models via image payload with ambiguous dimensions
osv·2026-01-13
CVE-2026-22773 [MEDIUM] vLLM is vulnerable to DoS in Idefics3 vision models via image payload with ambiguous dimensions
### Summary
Users can crash the vLLM engine serving multimodal models that use the _Idefics3_ vision model implementation by sending a specially crafted 1x1 pixel image. This causes a tensor dimension mismatch that results in an unhandled runtime error, leading to complete server termination.
### Details
The vulnerability is triggered when the image processor encounters a 1x1 pixel image with shape (1, 1, 3) in HWC (Height, Width, Channel) format. Due to the ambiguous dimensions, the processor incorrectly assumes the image is in CHW (Channel, Height, Width) format with shape (3, H, W). This misinterpretation causes an incorrect calculation of the number of image patches, resulting in a fatal t
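The ambiguity described above can be reproduced with a shape-only heuristic. The sketch below is illustrative and is not the vLLM or Idefics3 processor code: `guess_channel_layout` and `image_size` are hypothetical names, and the rule "check the leading axis first" is an assumption chosen to mirror the misclassification the advisory describes.

```python
def guess_channel_layout(shape, channel_counts=(1, 3)):
    """Guess the layout of a 3-D image from its shape alone.

    Assumed heuristic: the leading axis is tested before the trailing
    one, so any shape starting with 1 or 3 is classified as CHW.
    """
    if shape[0] in channel_counts:
        return "CHW"
    if shape[-1] in channel_counts:
        return "HWC"
    raise ValueError(f"cannot infer channel axis from shape {shape}")


def image_size(shape, layout):
    """Return (height, width) under the given layout."""
    if layout == "CHW":
        return shape[1], shape[2]
    return shape[0], shape[1]


if __name__ == "__main__":
    ordinary = (224, 224, 3)  # HWC, unambiguous
    tiny = (1, 1, 3)          # HWC 1x1 RGB image, but also a valid CHW shape

    for shape in (ordinary, tiny):
        layout = guess_channel_layout(shape)
        print(shape, layout, image_size(shape, layout))
    # The 1x1 image is read as CHW with size (1, 3): one channel, three
    # pixels wide. Any patch count derived from that size is wrong.
```

Because the shape `(1, 1, 3)` satisfies both tests, no shape-only heuristic can classify it reliably. Passing the layout explicitly from the decoder, rather than inferring it, removes the ambiguity in this sketch.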
GHSA
vLLM is vulnerable to DoS in Idefics3 vision models via image payload with ambiguous dimensions
ghsa·2026-01-13
CVE-2026-22773 [MEDIUM] CWE-770 vLLM is vulnerable to DoS in Idefics3 vision models via image payload with ambiguous dimensions
No detection rules found.
No public exploits indexed.
Published: 2026-01-10