cvebase

vllm-project/vllm vulnerabilities

38 known vulnerabilities affecting vllm-project/vllm.

Total CVEs: 38
CISA KEV: 0
Public exploits: 1
Exploited in wild: 0
Severity breakdown: CRITICAL 7, HIGH 15, MEDIUM 14, LOW 2

Vulnerabilities

Page 2 of 2
CVE-2025-30202 | P3 | HIGH | CVSS 7.5 | versions >= 0.5.2, < 0.8.5 | 2025-04-30
CWE-770: vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Versions starting from 0.5.2 and prior to 0.8.5 are vulnerable to denial of service and data exposure via ZeroMQ in multi-node vLLM deployments. In a multi-node vLLM deployment, vLLM uses ZeroMQ for some multi-node communication purposes. The primary vLLM host opens a
nvd
CVE-2025-62426 | P3 | MEDIUM | CVSS 6.5 | versions >= 0.5.5, < 0.11.1 | 2025-11-21
CWE-770: vLLM is an inference and serving engine for large language models (LLMs). From version 0.5.5 to before 0.11.1, the /v1/chat/completions and /tokenize endpoints allow a chat_template_kwargs request parameter that is used in the code before it is properly validated against the chat template. With the right chat_template_kwargs parameters, it is possib
nvd
CVE-2026-34756 | P3 | MEDIUM | CVSS 6.5 | versions >= 0.1.0, < 0.19.0 | 2026-04-06
CWE-770: vLLM is an inference and serving engine for large language models (LLMs). From 0.1.0 to before 0.19.0, a Denial of Service vulnerability exists in the vLLM OpenAI-compatible API server. Due to the lack of an upper bound validation on the n parameter in the ChatCompletionRequest and CompletionRequest Pydantic models, an unauthenticated attacker can s
nvd
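The missing upper bound on n described above can be illustrated with a small standalone check. This is only a sketch: `validate_n` and the cap `MAX_N = 16` are hypothetical names and values chosen for illustration, not vLLM's actual fix or limit.

```python
MAX_N = 16  # hypothetical cap; a real deployment would pick its own limit


def validate_n(n: int, max_n: int = MAX_N) -> int:
    """Return n if it is an acceptable number of completions, else raise ValueError."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(n, bool) or not isinstance(n, int):
        raise ValueError("n must be an integer")
    # Enforce both a lower and an upper bound before any work is scheduled.
    if not 1 <= n <= max_n:
        raise ValueError(f"n must be between 1 and {max_n}, got {n}")
    return n
```

Rejecting the request at validation time means an oversized n never reaches the scheduler, which is the general idea behind bounding a client-controlled fan-out parameter.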
CVE-2026-34760 | P3 | HIGH | CVSS 7.1 | versions >= 0.5.5, < 0.18.0 | 2026-04-02
CWE-20: vLLM is an inference and serving engine for large language models (LLMs). From version 0.5.5 to before version 0.18.0, Librosa defaults to using numpy.mean for mono downmixing (to_mono), while the international standard ITU-R BS.775-4 specifies a weighted downmixing algorithm. This discrepancy results in inconsistency between audio heard by humans (e.g
nvd
CVE-2026-34755 | P3 | MEDIUM | CVSS 6.5 | versions >= 0.7.0, < 0.19.0 | 2026-04-06
CWE-770: vLLM is an inference and serving engine for large language models (LLMs). From 0.7.0 to before 0.19.0, the VideoMediaIO.load_base64() method at vllm/multimodal/media/video.py splits video/jpeg data URLs by comma to extract individual JPEG frames, but does not enforce a frame count limit. The num_frames parameter (default: 32), which is enforced by t
nvd
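One way to picture the missing frame limit is a comma split that refuses to produce more pieces than allowed. The sketch below is an assumption-laden illustration: `split_frames` is a hypothetical helper, not the body of VideoMediaIO.load_base64(), and the default of 32 simply mirrors the num_frames default quoted in the entry.

```python
def split_frames(payload: str, max_frames: int = 32) -> list[str]:
    """Split a comma-separated frame payload, rejecting more than max_frames frames."""
    # Splitting at most max_frames times yields at most max_frames + 1 parts,
    # so an oversized payload is detected without creating one string per frame.
    parts = payload.split(",", max_frames)
    if len(parts) > max_frames:
        raise ValueError(f"payload contains more than {max_frames} frames")
    return parts
```

Passing the limit as the `maxsplit` argument keeps the amount of work proportional to the limit rather than to the attacker-controlled payload.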
CVE-2026-47155 | P3 | MEDIUM | CVSS 6.5 | fixed in 0.22.0 | 2026-06-22
CWE-345: vLLM is an inference and serving engine for large language models (LLMs). Prior to 0.22.0, vLLM's revision pinning controls do not consistently apply to all artifacts loaded for a model. A deployment that supplies --revision or --code-revision can still load dynamic code, GGUF files, image processors, retrieval side weights, or same-repository subfo
nvd
CVE-2026-44223 | P3 | MEDIUM | CVSS 6.5 | versions >= 0.18.0, < 0.20.0 | 2026-05-12
CWE-131: vLLM is an inference and serving engine for large language models (LLMs). From 0.18.0 to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch
nvd
CVE-2025-29770 | P3 | MEDIUM | CVSS 6.5 | fixed in 0.8.0 | 2025-03-19
CWE-770: vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. The outlines library is one of the backends used by vLLM to support structured output (a.k.a. guided decoding). Outlines provides an optional cache for its compiled grammars on the local filesystem. This cache has been on by default in vLLM. Outlines is also availa
nvd
CVE-2025-62372 | P3 | MEDIUM | CVSS 6.5 | versions >= 0.5.5, < 0.11.1 | 2025-11-21
CWE-129: vLLM is an inference and serving engine for large language models (LLMs). From version 0.5.5 to before 0.11.1, users can crash the vLLM engine serving multimodal models by passing multimodal embedding inputs with correct ndim but incorrect shape (e.g. hidden dimension is wrong), regardless of whether the model is intended to support such inputs (as
nvd
CVE-2025-48944 | P4 | MEDIUM | CVSS 6.5 | versions >= 0.8.0, < 0.9.0 | 2025-05-30
CWE-20: vLLM is an inference and serving engine for large language models (LLMs). In version 0.8.0 up to but excluding 0.9.0, the vLLM backend used with the /v1/chat/completions OpenAPI endpoint fails to validate unexpected or malformed input in the "pattern" and "type" fields when the tools functionality is invoked. These inputs are not validated before bei
nvd
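A rough illustration of validating the "pattern" and "type" fields before they are used is given below. It is a sketch under assumptions: `validate_property` is a hypothetical helper, only string-valued "type" is handled, and Python's `re` module stands in for whatever regex dialect the actual backend compiles.

```python
import re

# The seven primitive type names defined by JSON Schema.
JSON_SCHEMA_TYPES = {"string", "number", "integer", "boolean", "object", "array", "null"}


def validate_property(prop: dict) -> None:
    """Raise ValueError if a tool parameter schema has a bad "type" or "pattern"."""
    type_ = prop.get("type")
    if type_ is not None and (not isinstance(type_, str) or type_ not in JSON_SCHEMA_TYPES):
        raise ValueError(f"unsupported type: {type_!r}")

    pattern = prop.get("pattern")
    if pattern is not None:
        if not isinstance(pattern, str):
            raise ValueError("pattern must be a string")
        try:
            # Compiling up front turns a malformed regex into a client error
            # instead of an unhandled exception deeper in the server.
            re.compile(pattern)
        except re.error as exc:
            raise ValueError(f"invalid pattern: {exc}") from exc
```

The point of the sketch is the ordering: malformed input is converted into a validation error at the API boundary rather than surfacing later in the request pipeline.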
CVE-2026-54235 | P3 | MEDIUM | CVSS 6.5 | fixed in 0.23.1rc0 | 2026-06-22
CWE-1287: vLLM is an inference and serving engine for large language models (LLMs). Prior to 0.23.1rc0, all temperature validation gates use comparison operators (), which silently evaluate to False for NaN and for positive Infinity in Python's IEEE 754 float semantics. Both values pass every guard and propagate to GPU sampling kernels, where they produce und
nvd
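The float behaviour described above is easy to reproduce. In the sketch below, `naive_check` assumes the guard has the form `temperature < 0`, which is a guess at the shape of the check and not vLLM's actual code; `strict_check` shows one possible stricter alternative using `math.isfinite`.

```python
import math


def naive_check(temperature: float) -> bool:
    """Comparison-only guard: accepts anything that is not less than zero."""
    # NaN < 0.0 and inf < 0.0 are both False, so both values are accepted.
    return not (temperature < 0.0)


def strict_check(temperature: float) -> bool:
    """Accept only finite, non-negative temperatures."""
    return math.isfinite(temperature) and temperature >= 0.0
```

Checking finiteness first matters because every ordered comparison involving NaN evaluates to False, so no combination of `<` and `>` guards alone can reject it.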
CVE-2026-54233 | P3 | MEDIUM | CVSS 6.5 | fixed in 0.23.1rc0 | 2026-06-22
CWE-409: vLLM is an inference and serving engine for large language models (LLMs). Prior to 0.23.1rc0, vLLM's /v1/audio/transcriptions endpoint limits compressed upload size but not decoded PCM output. A 25 MB OPUS file expands to ~14.9 GB of float32 PCM at decode time. This vulnerability is fixed in 0.23.1rc0.
nvd
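The gap between compressed and decoded size can be made concrete with a little arithmetic on float32 PCM, which uses 4 bytes per sample per channel. The helpers below are hypothetical, and the 1 GiB cap is an arbitrary illustrative value; how the duration and sample rate would be obtained before decoding is left open.

```python
BYTES_PER_FLOAT32 = 4


def decoded_pcm_bytes(duration_s: float, sample_rate_hz: int, channels: int = 1) -> int:
    """Estimate the size of float32 PCM for a clip of the given length."""
    return int(duration_s * sample_rate_hz * channels * BYTES_PER_FLOAT32)


def check_decoded_size(duration_s: float, sample_rate_hz: int, channels: int = 1,
                       max_bytes: int = 1 << 30) -> int:
    """Return the estimated decoded size, or raise ValueError if it exceeds max_bytes."""
    size = decoded_pcm_bytes(duration_s, sample_rate_hz, channels)
    if size > max_bytes:
        raise ValueError(f"decoded audio would need {size} bytes, limit is {max_bytes}")
    return size
```

A limit expressed in decoded bytes bounds memory use directly, whereas a limit on upload size only bounds it indirectly through the codec's compression ratio.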
CVE-2026-34753 | P4 | MEDIUM | CVSS 5.4 | versions >= 0.16.0, < 0.19.0 | 2026-04-06
CWE-918: vLLM is an inference and serving engine for large language models (LLMs). From 0.16.0 to before 0.19.0, a server-side request forgery (SSRF) vulnerability in download_bytes_from_url allows any actor who can control batch input JSON to make the vLLM batch runner issue arbitrary HTTP/HTTPS requests from the server, without any URL validation or domain
nvd
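As a hedged sketch of the kind of URL validation the entry says is absent, the function below checks the scheme and compares the host against an allowlist. `is_url_allowed` and the host `media.example.com` are hypothetical, and the sketch does not address redirects or DNS rebinding, which a real defence would also need to consider.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_HOSTS = frozenset({"media.example.com"})  # hypothetical allowlist


def is_url_allowed(url: str, allowed_hosts: frozenset = ALLOWED_HOSTS) -> bool:
    """Return True only for http(s) URLs whose host is on the allowlist."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False  # malformed URL, e.g. a broken IPv6 literal
    if parts.scheme not in ("http", "https") or host is None:
        return False
    try:
        ipaddress.ip_address(host)
        return False  # literal IP addresses are rejected outright
    except ValueError:
        pass  # not an IP literal, fall through to the allowlist check
    return host in allowed_hosts
```

An allowlist is used here instead of a blocklist because enumerating every internal or metadata address is error-prone, while the set of legitimate media hosts is usually small.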
CVE-2025-48942 | P4 | MEDIUM | CVSS 6.5 | versions >= 0.8.0, < 0.9.0 | 2025-05-30
CWE-248: vLLM is an inference and serving engine for large language models (LLMs). In versions 0.8.0 up to but excluding 0.9.0, hitting the /v1/completions API with an invalid json_schema as a Guided Param kills the vLLM server. This vulnerability is similar to GHSA-9hcf-v7m4-6m2j/CVE-2025-48943, which concerns a regex instead of a JSON schema. Version 0.9.0 fixes the is
nvd
CVE-2025-48887 | P4 | MEDIUM | CVSS 6.5 | versions >= 0.6.4, < 0.9.0 | 2025-05-30
CWE-1333: vLLM, an inference and serving engine for large language models (LLMs), has a Regular Expression Denial of Service (ReDoS) vulnerability in the file `vllm/entrypoints/openai/tool_parsers/pythonic_tool_parser.py` of versions 0.6.4 up to but excluding 0.9.0. The root cause is the use of a highly complex and nested regular expression for tool call det
nvd
CVE-2026-9540 | P4 | MEDIUM | CVSS 5.3 | version 0.19.0 | 2026-05-26
CWE-404: A vulnerability was identified in vllm-project vllm 0.19.0. This issue affects some unknown processing of the component OpenAI-compatible Serving Path. Such manipulation leads to denial of service. It is possible to launch the attack remotely. The exploit is publicly available and might be used. The pull request to fix this issue awaits acceptance.
nvd
CVE-2025-25183 | P4 | LOW | CVSS 2.6 | fixed in 0.7.2 | 2025-02-07
CWE-354: vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Maliciously constructed statements can lead to hash collisions, resulting in cache reuse, which can interfere with subsequent responses and cause unintended behavior. Prefix caching makes use of Python's built-in hash() function. As of Python 3.12, the behavior of has
nvd
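To illustrate why the built-in hash() is a weak basis for cache keys, the sketch below derives a key for a block of token IDs with SHA-256 instead. `block_key` is a hypothetical helper for illustration only; it is not vLLM's prefix-caching code, and the entry above does not state that this is how the issue was fixed.

```python
import hashlib
import struct


def block_key(parent_key: bytes, token_ids: list[int]) -> bytes:
    """Derive a collision-resistant cache key from a parent key and a block of token IDs."""
    h = hashlib.sha256()
    h.update(parent_key)
    # Pack each ID as a fixed-width little-endian signed 64-bit integer so that
    # different ID sequences can never serialise to the same bytes.
    h.update(struct.pack(f"<{len(token_ids)}q", *token_ids))
    return h.digest()


# In CPython, hash(-1) == hash(-2), a simple reminder that the built-in
# hash() is designed for dict lookups, not for collision resistance.
```

Chaining the parent key into each digest makes a block's key depend on the whole prefix, so two prompts share a key only when every preceding block matches.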
CVE-2025-46570 | P4 | LOW | CVSS 2.6 | fixed in 0.9.0 | 2025-05-29
CWE-208: vLLM is an inference and serving engine for large language models (LLMs). Prior to version 0.9.0, when a new prompt is processed, if the PagedAttention mechanism finds a matching prefix chunk, the prefill process speeds up, which is reflected in the TTFT (Time to First Token). These timing differences caused by matching chunks are significant enough to
nvd