vllm-project/vllm vulnerabilities

38 known vulnerabilities affecting vllm-project/vllm.

Severity breakdown

CRITICAL 7 | HIGH 15 | MEDIUM 14 | LOW 2

Vulnerabilities

CVE-2026-22778 | P2 | CRITICAL | CVSS 9.8 | PoC | fixed in 0.23.1rc0 | 2026-02-02

CVE-2026-22778 [CRITICAL] CWE-532: vLLM is an inference and serving engine for large language models (LLMs). From 0.8.3 to before 0.14.1, when an invalid image is sent to vLLM's multimodal endpoint, PIL throws an error. vLLM returns this error to the client, leaking a heap address. With this leak, we reduce ASLR from 4 billion guesses to ~8 guesses. This vulnerability can be chaine

nvd
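
The CVE-2026-22778 excerpt describes a decoder error being echoed to the client. As a minimal sketch of the general mitigation (not vLLM's actual fix), the handler below returns a fixed message and keeps details server-side. The logger name, the message text, and `client_safe_error` are hypothetical names chosen for illustration.

```python
import io
import logging

logger = logging.getLogger("media_errors")  # hypothetical logger name

GENERIC_MESSAGE = "Invalid image input."  # hypothetical client-facing text


def client_safe_error(exc: Exception) -> str:
    """Return a fixed message so exception text never reaches the client."""
    # Record only the exception type; the raw message may embed addresses.
    logger.warning("image decode failed: %s", type(exc).__name__)
    return GENERIC_MESSAGE


# Simulated decoder failure whose message embeds an object repr,
# which in CPython includes a memory address such as "at 0x7f...".
leaky = ValueError(f"cannot identify image file {io.BytesIO()!r}")
```

Here `str(leaky)` contains a hexadecimal address, while the string returned by `client_safe_error(leaky)` does not.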

CVE-2026-22807 | P2 | CRITICAL | CVSS 9.8 | versions >= 0.10.1, < 0.14.0 | 2026-01-21

CVE-2026-22807 [CRITICAL] CWE-94: vLLM is an inference and serving engine for large language models (LLMs). Starting in version 0.10.1 and prior to version 0.14.0, vLLM loads Hugging Face `auto_map` dynamic modules during model resolution without gating on `trust_remote_code`, allowing attacker-controlled Python code in a model repo/path to execute at server startup. An attacker wh

nvd

CVE-2025-32444 | P2 | CRITICAL | CVSS 9.8 | versions >= 0.6.5, < 0.8.5 | 2025-04-30

CVE-2025-32444 [CRITICAL] CWE-502: vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Versions starting from 0.6.5 and prior to 0.8.5, having vLLM integration with mooncake, are vulnerable to remote code execution due to using pickle based serialization over unsecured ZeroMQ sockets. The vulnerable sockets were set to listen on all network interfa

nvd

CVE-2026-48746 | P2 | CRITICAL | CVSS 9.1 | versions >= 0.3.0, < 0.22.0 | 2026-06-22

CVE-2026-48746 [CRITICAL] CWE-444: vLLM is an inference and serving engine for large language models (LLMs). From 0.3.0 until 0.22.0, a vulnerability in ASGI web servers and starlette's trust in those web servers enables an authentication bypass of the OpenAI API AuthenticationMiddleware. It allows use of the API without providing the configured VLLM_API_KEY or --api-key. This vuln

nvd

CVE-2024-9053 | P2 | CRITICAL | CVSS 9.8 | version 0.6.0 | 2025-03-20

CVE-2024-9053 [CRITICAL] CWE-502: vllm-project vllm version 0.6.0 contains a vulnerability in the AsyncEngineRPCServer() RPC server entrypoints. The core functionality run_server_loop() calls the function _make_handler_coro(), which directly uses cloudpickle.loads() on received messages without any sanitization. This can result in remote code execution by deserializing malicious pic

nvd
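
Several entries on this page (CVE-2024-9053 among them) come down to unpickling data from an untrusted peer. The sketch below follows the "restricting globals" pattern from the Python `pickle` documentation to show how a payload that references a callable can be rejected. It is an illustration under stated assumptions, not the fix vLLM shipped: `SAFE_GLOBALS`, `restricted_loads`, and `Exploit` are hypothetical names, and `os.getpid` stands in for a harmful call.

```python
import io
import os
import pickle

# Hypothetical allowlist of (module, name) pairs the unpickler may resolve.
SAFE_GLOBALS = {("builtins", "set"), ("builtins", "frozenset")}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream asks to import a global.
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Like pickle.loads, but refuses globals outside SAFE_GLOBALS."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Exploit:
    # On a plain pickle.loads, __reduce__ makes the loader call the callable.
    def __reduce__(self):
        return (os.getpid, ())  # harmless stand-in for an attacker's call
```

With this in place, `restricted_loads(pickle.dumps({"a": [1, 2]}))` round-trips plain data, while `restricted_loads(pickle.dumps(Exploit()))` raises `pickle.UnpicklingError`. An allowlist only narrows the attack surface; a data-only format such as JSON avoids the problem class entirely.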

CVE-2026-27893 | P3 | HIGH | CVSS 8.8 | versions >= 0.10.1, < 0.18.0 | 2026-03-27

CVE-2026-27893 [HIGH] CWE-693: vLLM is an inference and serving engine for large language models (LLMs). Starting in version 0.10.1 and prior to version 0.18.0, two model implementation files hardcode `trust_remote_code=True` when loading sub-components, bypassing the user's explicit `--trust-remote-code=False` security opt-out. This enables remote code execution via malicious mode

nvd

CVE-2025-47277 | P2 | CRITICAL | CVSS 9.8 | versions >= 0.6.5, < 0.8.5 | 2025-05-20

CVE-2025-47277 [CRITICAL] CWE-502: vLLM, an inference and serving engine for large language models (LLMs), has an issue in versions 0.6.5 through 0.8.4 that ONLY impacts environments using the `PyNcclPipe` KV cache transfer integration with the V0 engine. No other configurations are affected. vLLM supports the use of the `PyNcclPipe` class to establish a peer-to-peer communication

nvd

CVE-2025-29783 | P3 | CRITICAL | CVSS 9.0 | versions >= 0.6.5, < 0.8.0 | 2025-03-19

CVE-2025-29783 [CRITICAL] CWE-502: vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. When vLLM is configured to use Mooncake, unsafe deserialization exposed directly over ZMQ/TCP on all network interfaces will allow attackers to execute remote code on distributed hosts. This is a remote code execution vulnerability impacting any deployments using

nvd

CVE-2026-54232 | P3 | HIGH | CVSS 8.8 | fixed in 0.22.1 | 2026-06-22

CVE-2026-54232 [HIGH] CWE-427: vLLM is an inference and serving engine for large language models (LLMs). Prior to 0.22.1, the vLLM Dockerfile is vulnerable to a dependency confusion attack through the flashinfer-jit-cache package. The package is installed from a custom index (flashinfer.ai/whl/) using --extra-index-url, but the package name was not registered on PyPI, and UV_INDEX_S

nvd

CVE-2025-59425 | P3 | HIGH | CVSS 7.5 | fixed in 0.11.0rc2 | 2025-10-07

CVE-2025-59425 [HIGH] CWE-385: vLLM is an inference and serving engine for large language models (LLMs). Before version 0.11.0rc2, the API key support in vLLM performs validation using a method that was vulnerable to a timing attack. API key validation uses a string comparison that takes longer the more characters the provided API key gets correct. Data analysis across many attempts

nvd
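
The timing issue described for CVE-2025-59425 is the classic early-exit string comparison. A minimal sketch of the standard remedy uses `hmac.compare_digest` from the Python standard library, whose running time is designed not to depend on where the inputs differ. The function name `api_key_matches` is hypothetical, and this is not claimed to be the patch vLLM applied.

```python
import hmac


def api_key_matches(provided: str, expected: str) -> bool:
    """Compare two API keys without an early exit on the first mismatch."""
    # Encode to bytes so non-ASCII input is handled uniformly.
    return hmac.compare_digest(
        provided.encode("utf-8"), expected.encode("utf-8")
    )
```

A caller would use `api_key_matches(request_key, configured_key)` in place of `request_key == configured_key`.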

CVE-2026-41523 | P3 | HIGH | CVSS 7.5 | fixed in 0.22.0 | 2026-06-22

CVE-2026-41523 [HIGH] CWE-94: vLLM is an inference and serving engine for large language models (LLMs). Prior to 0.22.0, an assert-based security check in vLLM's activation function loading allows any unauthenticated attacker to achieve arbitrary code execution on the server by publishing a malicious HuggingFace model, when vLLM runs in Python optimized mode (python -O or PYTHONOPT

nvd
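
CVE-2026-41523 hinges on the fact that Python removes `assert` statements when optimization is enabled. The sketch below reproduces that effect deterministically by compiling the same source at two optimization levels with the built-in `compile`, then shows an explicit `raise` that survives either way. The allowlist and all function names are hypothetical and are not taken from the vLLM code base.

```python
ALLOWED_ACTIVATIONS = {"gelu", "relu", "silu"}  # hypothetical allowlist

ASSERT_CHECK_SRC = (
    "def check(name):\n"
    "    assert name in ALLOWED, 'activation not allowed'\n"
    "    return name\n"
)


def load_assert_check(optimize: int):
    """Compile the assert-based check at a given optimization level."""
    namespace = {"ALLOWED": ALLOWED_ACTIVATIONS}
    code = compile(ASSERT_CHECK_SRC, "<demo>", "exec", optimize=optimize)
    exec(code, namespace)
    return namespace["check"]


def check_with_raise(name: str) -> str:
    """Validation that is not stripped under python -O."""
    if name not in ALLOWED_ACTIVATIONS:
        raise ValueError(f"unsupported activation: {name!r}")
    return name
```

At `optimize=0` the assert-based check rejects an unknown name with `AssertionError`; at `optimize=1` (the level used by `python -O`) the same check returns the unknown name unchanged. `check_with_raise` raises `ValueError` in both modes.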

CVE-2025-24357 | P3 | HIGH | CVSS 8.8 | fixed in 0.7.0 | 2025-01-27

CVE-2025-24357 [HIGH] CWE-502: vLLM is a library for LLM inference and serving. vllm/model_executor/weight_utils.py implements hf_model_weights_iterator to load the model checkpoint, which is downloaded from huggingface. It uses the torch.load function and the weights_only parameter defaults to False. When torch.load loads malicious pickle data, it will execute arbitrary code durin

nvd

CVE-2025-30165 | P3 | HIGH | CVSS 8.0 | versions >= 0.5.2, <= 0.8.5.post1 | 2025-05-06

CVE-2025-30165 [HIGH] CWE-502: vLLM is an inference and serving engine for large language models. In a multi-node vLLM deployment using the V0 engine, vLLM uses ZeroMQ for some multi-node communication purposes. The secondary vLLM hosts open a `SUB` ZeroMQ socket and connect to an `XPUB` socket on the primary vLLM host. When data is received on this `SUB` socket, it is deserialized

nvd

CVE-2026-44222 | P3 | HIGH | CVSS 7.5 | versions >= 0.6.1, < 0.20.0 | 2026-05-12

CVE-2026-44222 [HIGH] CWE-129: vLLM is an inference and serving engine for large language models (LLMs). From 0.6.1 to before 0.20.0, there is a Token Injection vulnerability in vLLM’s multimodal processing. Unauthenticated, text-only prompts that spell special tokens are interpreted as control. Image and video placeholder sequences supplied without matching data cause vLLM to in

nvd

CVE-2026-24779 | P3 | HIGH | CVSS 7.1 | versions >= 0.15.1, < 0.17.0 | 2026-01-27

CVE-2026-24779 [HIGH] CWE-918: vLLM is an inference and serving engine for large language models (LLMs). Prior to version 0.14.1, a Server-Side Request Forgery (SSRF) vulnerability exists in the `MediaConnector` class within the vLLM project's multimodal feature set. The load_from_url and load_from_url_async methods obtain and process media from URLs provided by users, using differ

nvd
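
The CVE-2026-24779 excerpt is cut off before it states the root cause, so the following is only a generic sketch of one common SSRF mitigation: parse a user-supplied URL once and check its scheme and host against an allowlist before fetching. `ALLOWED_HOSTS` and `is_allowed_media_url` are hypothetical names, and nothing here reflects the actual `MediaConnector` code.

```python
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"images.example.com"}  # hypothetical allowlist


def is_allowed_media_url(url: str) -> bool:
    """Accept only http(s) URLs whose host is explicitly allowlisted."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit rejects some malformed inputs, e.g. bad IPv6 brackets.
        return False
    if parts.scheme not in ("http", "https"):
        return False
    # .hostname drops userinfo and port and is already lowercased.
    host = parts.hostname
    return host is not None and host in ALLOWED_HOSTS
```

The check rejects `file:` URLs, link-local metadata addresses, and URLs of the form `https://images.example.com@evil.test/x`, where the real host is `evil.test`. The fetch should then use the same parsed components that were validated, so that validation and request cannot disagree about the target.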

CVE-2025-48956 | P3 | HIGH | CVSS 7.5 | versions >= 0.1.0, < 0.10.1.1 | 2025-08-21

CVE-2025-48956 [HIGH] CWE-400: vLLM is an inference and serving engine for large language models (LLMs). From 0.1.0 to before 0.10.1.1, a Denial of Service (DoS) vulnerability can be triggered by sending a single HTTP GET request with an extremely large header to an HTTP endpoint. This results in server memory exhaustion, potentially leading to a crash or unresponsiveness. The atta

nvd

CVE-2026-22773 | P3 | HIGH | CVSS 7.5 | versions >= 0.6.4, < 0.12.0 | 2026-01-10

CVE-2026-22773 [HIGH] CWE-770: vLLM is an inference and serving engine for large language models (LLMs). In versions from 0.6.4 to before 0.12.0, users can crash the vLLM engine serving multimodal models that use the Idefics3 vision model implementation by sending a specially crafted 1x1 pixel image. This causes a tensor dimension mismatch that results in an unhandled runtime error,

nvd

CVE-2025-46560 | P3 | HIGH | CVSS 7.5 | versions >= 0.8.0, < 0.8.5 | 2025-04-30

CVE-2025-46560 [HIGH] CWE-1333: vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Versions starting from 0.8.0 and prior to 0.8.5 are affected by a critical performance vulnerability in the input preprocessing logic of the multimodal tokenizer. The code dynamically replaces placeholder tokens (e.g., , ) with repeated tokens based on precomputed l

nvd

CVE-2026-53923 | P3 | HIGH | CVSS 7.5 | versions >= 0.5.5, < 0.23.1rc0 | 2026-06-22

CVE-2026-53923 [HIGH] CWE-200: vLLM is an inference and serving engine for large language models (LLMs). From 0.5.5 until 0.23.1rc0, integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu) causes partial tensor processing. The output tensor is allocated at full size via torch::empty (uninitialized memory), but the dequantize

nvd

CVE-2025-46722 | P3 | HIGH | CVSS 7.3 | versions >= 0.7.0, < 0.9.0 | 2025-05-29

CVE-2025-46722 [HIGH] CWE-1023: vLLM is an inference and serving engine for large language models (LLMs). In versions starting from 0.7.0 to before 0.9.0, in the file vllm/multimodal/hasher.py, the MultiModalHasher class has a security and data integrity issue in its image hashing method. Currently, it serializes PIL.Image.Image objects using only obj.tobytes(), which returns only t

nvd
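
For CVE-2025-46722, the concern is a hash computed from raw pixel bytes alone. The sketch below uses only `hashlib` and plain `bytes` to show why that can collide: the same byte string can describe images of different shape or mode. `hash_pixels_only` and `hash_with_metadata` are hypothetical helpers for illustration, not the `MultiModalHasher` implementation or its fix.

```python
import hashlib


def hash_pixels_only(data: bytes) -> str:
    """Hash the raw buffer alone; shape and mode are not represented."""
    return hashlib.sha256(data).hexdigest()


def hash_with_metadata(mode: str, size: tuple[int, int], data: bytes) -> str:
    """Hash the mode and dimensions together with the raw buffer."""
    h = hashlib.sha256()
    # A delimited header keeps (mode, width, height) unambiguous.
    h.update(f"{mode}:{size[0]}x{size[1]}:".encode("ascii"))
    h.update(data)
    return h.hexdigest()


# Twelve bytes can be a 4x3 or 3x4 grayscale image, or a 2x2 RGB image.
pixels = bytes(range(12))
```

`hash_pixels_only(pixels)` is identical for all three interpretations, while `hash_with_metadata("L", (4, 3), pixels)`, `hash_with_metadata("L", (3, 4), pixels)`, and `hash_with_metadata("RGB", (2, 2), pixels)` all differ.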
