CVE-2026-53923
published 2026-06-22CVE-2026-53923: vLLM is an inference and serving engine for large language models (LLMs). From 0.5.5 until 0.23.1rc0, integer truncation of tensor dimensions in vLLM's GGUF…
PriorityP342high7.5CVSS 3.1
AVNACLPRNUINSUCHINAN
EPSS
0.28%
19.8th percentile
vLLM is an inference and serving engine for large language models (LLMs). From 0.5.5 until 0.23.1rc0, integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu) causes partial tensor processing. The output tensor is allocated at full size via torch::empty (uninitialized memory), but the dequantize CUDA kernel processes only a truncated number of elements. The unfilled portion of the output tensor retains whatever was previously in GPU memory. In multi-tenant inference deployments, this residual GPU memory may contain tensor data from other users' inference requests, constituting information disclosure. This vulnerability is fixed in 0.23.1rc0.
Affected
3 ranges
| Vendor | Product | Version range | Fixed in |
|---|---|---|---|
| vllm-project | vllm | — | — |
| vllm | vllm | >= 0.5.5 < 0.23.1 | 0.23.1 |
| vllm | vllm | 0.5.5 – 0.23.0 | — |
CVSS provenance
nvdv3.17.5HIGHCVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N
nvdv4.05.3MEDIUMCVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:P/VC:L/VI:L/VA:N/SC:N/SI:N/SA:N/E:X/CR:X/IR:X/AR:X/MAV:X/MAC:X/MAT:X/MPR:X/MUI:X/MVC:X/MVI:X/MVA:X/MSC:X/MSI:X/MSA:X/S:X/AU:X/R:X/V:X/RE:X/U:X
vendor_redhat7.5HIGH
Stop checking back — get the weekly exploitation signal.
Every Monday: what got weaponized or added to CISA KEV in the last seven days — each CVE cross-linked to its PoC, Nuclei template, and detection rule. Free, one email a week, unsubscribe in one click.
Red Hat
vllm: vLLM: Information disclosure via integer truncation
vendor_redhat·2026-06-22·CVSS 7.5
CVE-2026-53923 [HIGH] CWE-824 vllm: vLLM: Information disclosure via integer truncation
vllm: vLLM: Information disclosure via integer truncation
vLLM is an inference and serving engine for large language models (LLMs). From 0.5.5 until 0.23.1rc0, integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu) causes partial tensor processing. The output tensor is allocated at full size via torch::empty (uninitialized memory), but the dequantize CUDA kernel processes only a truncated number of elements. The unfilled portion of the output tensor retains whatever was previously in GPU memory. In multi-tenant inference deployments, this residual GPU memory may contain tensor data from other users' inference requests, constituting information disclosure. This vulnerability is fixed in 0.23.1rc0.
A flaw was found in vLLM. Integer
GHSA
vLLM: GGUF dequantize kernel int truncation exposes uninitialized GPU memory in multi-tenant serving
ghsa·2026-06-17
CVE-2026-53923 [MEDIUM] CWE-200 vLLM: GGUF dequantize kernel int truncation exposes uninitialized GPU memory in multi-tenant serving
vLLM: GGUF dequantize kernel int truncation exposes uninitialized GPU memory in multi-tenant serving
## Summary
Integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels (`csrc/quantization/gguf/gguf_kernel.cu`) causes partial tensor processing. The output tensor is allocated at full size via `torch::empty` (uninitialized memory), but the dequantize CUDA kernel processes only a truncated number of elements. The unfilled portion of the output tensor retains whatever was previously in GPU memory. In multi-tenant inference deployments, this residual GPU memory may contain tensor data from other users' inference requests, constituting information disclosure.
## Root Cause
The `to_cuda_ggml_t` function pointer type at `ggml-common.h:1067` declares its element count parameter
No detection rules found.
No public exploits indexed.
2026-06-22
Published