vllm-project/vllm vulnerabilities
30 known vulnerabilities affecting vllm-project/vllm.
Total CVEs: 30
CISA KEV: 0
Public exploits: 0
Exploited in wild: 0
Severity breakdown: CRITICAL 6 · HIGH 13 · MEDIUM 9 · LOW 2
Vulnerabilities
Page 2 of 2
CVE-2025-47277 · CRITICAL · CVSS 9.8 · CWE-502 · affected: >= 0.6.5, < 0.8.5 · published 2025-05-20
vLLM, an inference and serving engine for large language models (LLMs), has an issue in versions 0.6.5 through 0.8.4 that ONLY impacts environments using the `PyNcclPipe` KV cache transfer integration with the V0 engine. No other configurations are affected. vLLM supports the use of the `PyNcclPipe` class to establish a peer-to-peer communication…
Source: NVD
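The root-cause class here (CWE-502) is easiest to see in isolation. The sketch below is not vLLM's `PyNcclPipe` code; it is a minimal, self-contained illustration of why calling `pickle.loads()` on bytes received from an untrusted peer amounts to remote code execution: unpickling can invoke arbitrary callables chosen by the sender.

```python
import os
import pickle

class Exploit:
    """Stand-in for an attacker-crafted payload object."""
    def __reduce__(self):
        # Unpickling reconstructs objects by calling the returned callable,
        # so the receiver runs os.system("id") the moment it deserializes.
        return (os.system, ("id",))

payload = pickle.dumps(Exploit())

# The vulnerable receiver side would do something equivalent to this:
pickle.loads(payload)  # executes "id" before returning any object
```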
CVE-2025-30165 · HIGH · CVSS 8.0 · CWE-502 · affected: >= 0.5.2, <= 0.8.5.post1 · published 2025-05-06
vLLM is an inference and serving engine for large language models. In a multi-node vLLM deployment using the V0 engine, vLLM uses ZeroMQ for some multi-node communication purposes. The secondary vLLM hosts open a `SUB` ZeroMQ socket and connect to an `XPUB` socket on the primary vLLM host. When data is received on this `SUB` socket, it is deserialized…
Source: NVD
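For readers unfamiliar with the topology described above, the following pyzmq sketch (hypothetical host name and port, not vLLM's actual module) shows a secondary host's `SUB` socket connecting to the primary host's `XPUB` socket; the dangerous step is deserializing whatever arrives.

```python
import pickle
import zmq

ctx = zmq.Context.instance()

# Secondary vLLM host: SUB socket connected to the primary host's XPUB socket.
sub = ctx.socket(zmq.SUB)
sub.connect("tcp://primary-host:5557")   # hypothetical address and port
sub.setsockopt(zmq.SUBSCRIBE, b"")       # subscribe to every topic

raw = sub.recv()         # bytes from the network, sender not authenticated
obj = pickle.loads(raw)  # CWE-502: anyone who can publish to (or spoof)
                         # this socket gains code execution on subscribers
```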
CVE-2025-32444 · CRITICAL · CVSS 9.8 · CWE-502 · affected: >= 0.6.5, < 0.8.5 · published 2025-04-30
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Versions starting from 0.6.5 and prior to 0.8.5, having vLLM integration with mooncake, are vulnerable to remote code execution due to using pickle-based serialization over unsecured ZeroMQ sockets. The vulnerable sockets were set to listen on all network interfaces…
Source: NVD
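The "listening on all network interfaces" detail is what turns an unsafe serializer into a remotely reachable one. Below is a hedged sketch with an assumed port, contrasting the exposed bind with a loopback-only bind:

```python
import zmq

ctx = zmq.Context.instance()
sock = ctx.socket(zmq.REP)

# Vulnerable posture: reachable from any host that can route to this machine.
sock.bind("tcp://*:5558")            # port number assumed for illustration

# Safer posture: only local processes can connect.
# sock.bind("tcp://127.0.0.1:5558")

# Combining the wide-open bind with pickle framing, e.g.
#     pickle.loads(sock.recv())
# is what makes this class of bug remotely exploitable.
```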
CVE-2025-30202 · HIGH · CVSS 7.5 · CWE-770 · affected: >= 0.5.2, < 0.8.5 · published 2025-04-30
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Versions starting from 0.5.2 and prior to 0.8.5 are vulnerable to denial of service and data exposure via ZeroMQ in multi-node vLLM deployments. In a multi-node vLLM deployment, vLLM uses ZeroMQ for some multi-node communication purposes. The primary vLLM host opens an `XPUB` ZeroMQ socket…
Source: NVD
CVE-2025-46560 · HIGH · CVSS 7.5 · CWE-1333 · affected: >= 0.8.0, < 0.8.5 · published 2025-04-30
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Versions starting from 0.8.0 and prior to 0.8.5 are affected by a critical performance vulnerability in the input preprocessing logic of the multimodal tokenizer. The code dynamically replaces placeholder tokens (e.g., image and audio placeholders) with repeated tokens based on precomputed lengths…
Source: NVD
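The complexity blowup described above is a general Python pitfall, shown here as a hedged sketch rather than the actual tokenizer code: rebuilding a list with `+` inside a loop copies the accumulated prefix on every iteration, while `extend` appends in place.

```python
def expand_quadratic(tokens, repeat):
    out = []
    for t in tokens:
        out = out + [t] * repeat   # copies all of `out` each time -> O(n^2)
    return out

def expand_linear(tokens, repeat):
    out = []
    for t in tokens:
        out.extend([t] * repeat)   # amortized O(1) appends -> O(n) overall
    return out

# With large, attacker-controlled inputs, the quadratic variant becomes a
# denial-of-service vector, which is the failure mode this CVE describes.
```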
CVE-2024-9053 · CRITICAL · CVSS 9.8 · CWE-502 · affected: 0.6.0 · published 2025-03-20
vllm-project vllm version 0.6.0 contains a vulnerability in the AsyncEngineRPCServer() RPC server entrypoints. The core functionality run_server_loop() calls the function _make_handler_coro(), which directly uses cloudpickle.loads() on received messages without any sanitization. This can result in remote code execution by deserializing malicious pickle data…
Source: NVD
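`cloudpickle.loads()` has the same trust model as `pickle.loads()`, so the usual hardening pattern is to parse a data-only format instead. A minimal hedged contrast (handler names are hypothetical, not vLLM's RPC entrypoints):

```python
import json

def unsafe_handler(raw: bytes):
    import cloudpickle
    # Same failure mode as pickle: deserializing can execute attacker code.
    return cloudpickle.loads(raw)

def safer_handler(raw: bytes):
    msg = json.loads(raw)  # data-only format: parsing never executes code
    if not isinstance(msg, dict) or "method" not in msg:
        raise ValueError("malformed RPC message")
    return msg
```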
CVE-2025-29783 · CRITICAL · CVSS 9.0 · CWE-502 · affected: >= 0.6.5, < 0.8.0 · published 2025-03-19
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. When vLLM is configured to use Mooncake, unsafe deserialization exposed directly over ZMQ/TCP on all network interfaces allows attackers to execute remote code on distributed hosts. This is a remote code execution vulnerability impacting any deployments using Mooncake…
Source: NVD
CVE-2025-29770 · MEDIUM · CVSS 6.5 · CWE-770 · fixed in 0.8.0 · published 2025-03-19
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. The outlines library is one of the backends used by vLLM to support structured output (a.k.a. guided decoding). Outlines provides an optional cache for its compiled grammars on the local filesystem. This cache has been on by default in vLLM. Outlines is also available…
Source: NVD
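The resource-exhaustion pattern (CWE-770) behind this advisory is a cache keyed by request-controlled input with no size bound. The sketch below is hypothetical (assumed cache directory, stand-in compile step), not the outlines implementation:

```python
import hashlib
from pathlib import Path

CACHE_DIR = Path("/tmp/grammar-cache")   # assumed location for illustration
CACHE_DIR.mkdir(parents=True, exist_ok=True)

def compile_grammar(schema: str) -> bytes:
    return schema.encode()               # stand-in for expensive compilation

def cached_compile(schema: str) -> bytes:
    key = hashlib.sha256(schema.encode()).hexdigest()
    path = CACHE_DIR / key
    if not path.exists():
        # No eviction and no quota: every unique schema a client submits
        # permanently consumes additional disk space on the server.
        path.write_bytes(compile_grammar(schema))
    return path.read_bytes()
```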
CVE-2025-25183 · LOW · CVSS 2.6 · CWE-354 · fixed in 0.7.2 · published 2025-02-07
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Maliciously constructed statements can lead to hash collisions, resulting in cache reuse, which can interfere with subsequent responses and cause unintended behavior. Prefix caching makes use of Python's built-in hash() function. As of Python 3.12, the behavior of hash()…
Source: NVD
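The distinction that matters here: Python's built-in `hash()` is fast but not collision resistant, so distinct prefixes can be made to share a cache key. A hedged sketch of the two approaches (not vLLM's actual cache-key code):

```python
import hashlib

def weak_key(token_ids: tuple) -> int:
    # Built-in hash(): collisions are feasible to construct, and for some
    # input types the result varies across Python versions and with hash
    # randomization settings.
    return hash(token_ids)

def strong_key(token_ids: tuple) -> str:
    # Content digest of the actual token sequence: collision-resistant and
    # stable across processes and Python versions.
    data = ",".join(map(str, token_ids)).encode()
    return hashlib.sha256(data).hexdigest()
```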
CVE-2025-24357 · HIGH · CVSS 8.8 · CWE-502 · fixed in 0.7.0 · published 2025-01-27
vLLM is a library for LLM inference and serving. vllm/model_executor/weight_utils.py implements hf_model_weights_iterator to load the model checkpoint, which is downloaded from Hugging Face. It uses the torch.load function, and the weights_only parameter defaults to False. When torch.load loads malicious pickle data, it will execute arbitrary code during deserialization…
Source: NVD
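`weights_only` is a real `torch.load` parameter; with it set to `True`, unpickling is restricted to tensors and primitive containers rather than arbitrary objects. A minimal sketch with a hypothetical checkpoint path:

```python
import torch

# Hardened load: refuses pickle payloads that try to construct arbitrary
# Python objects, which is the attack this CVE describes.
state = torch.load("model.bin", map_location="cpu", weights_only=True)

# The old default, weights_only=False, would execute code embedded in a
# malicious checkpoint during deserialization.
```

Recent PyTorch releases (2.6 and later) have since made `weights_only=True` the default for `torch.load`.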