CVE-2025-23311
Published 2025-08-06
Priority: P2 (62) · critical · 9.8 (CVSS 3.1)
AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
EPSS: 2.46% (82.5th percentile)
NVIDIA Triton Inference Server contains a vulnerability where an attacker could cause a stack overflow through specially crafted HTTP requests. A successful exploit of this vulnerability might lead to remote code execution, denial of service, information disclosure, or data tampering.
Affected
2 ranges
| Vendor | Product | Version range | Fixed in |
|---|---|---|---|
| nvidia | triton_inference_server | < 25.07 | 25.07 |
| nvidia | triton_inference_server | — | — |
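The version range in the table can be checked mechanically. The following is a minimal sketch, assuming the deployed version is reported as a `YY.MM` release string such as `25.06`; the helper names `parse_release` and `is_affected` are hypothetical, and other version formats are not handled.

```python
# Sketch only: assumes release strings of the form "YY.MM" (for example "25.06").
FIXED_RELEASE = (25, 7)  # "Fixed in" column of the table above


def parse_release(version: str) -> tuple[int, int]:
    """Split a 'YY.MM' release string into a comparable (year, month) tuple."""
    year, month = version.strip().split(".")[:2]
    return int(year), int(month)


def is_affected(version: str) -> bool:
    """True when the release sorts before the fixed release (range '< 25.07')."""
    return parse_release(version) < FIXED_RELEASE


if __name__ == "__main__":
    for release in ("24.12", "25.06", "25.07"):
        status = "affected" if is_affected(release) else "not affected"
        print(release, status)
```

Comparing integer tuples rather than raw strings keeps the check correct even if a release string is not zero-padded.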
Detection & IOCs (extracted from sources)
- Detect HTTP requests that use chunked transfer encoding (Transfer-Encoding: chunked) sent to Triton Inference Server API endpoints, particularly requests made up of thousands of tiny chunks; this is the attack primitive used to amplify the alloca stack allocation.
- Alert on HTTP requests to Triton endpoints (inference, /v2/repository/index, model load/unload, trace settings, logging config, shared memory registration) that use chunked transfer encoding with a total request body approaching or exceeding 3MB, as this is the minimum size needed to trigger the segmentation fault.
- Monitor for Triton Inference Server process crashes (segmentation faults / SIGSEGV) following inbound HTTP requests, which may indicate exploitation attempts against the alloca stack overflow vulnerability.
- Flag chunked HTTP requests to Triton API routes in which each individual chunk is approximately 6 bytes in size; the PoC uses this chunk size because each 6-byte chunk results in 16 bytes of stack allocation.
- Triton deployments running version 25.06 or earlier are vulnerable; prioritize detection and patching for these versions, as the fix was released in version 25.07.
- The vulnerable attack surface includes endpoints that are unauthenticated by default; monitor all of the following routes for anomalous chunked requests: inference endpoint, /v2/repository/index, model load, model unload, trace setting updates, logging configuration updates, system shared memory registration, CUDA shared memory registration.
- Authentication is disabled by default on most Triton API routes, meaning the vulnerability is exploitable without credentials in default deployments.
- The alloca-based stack overflow is triggered via HTTP chunked transfer encoding; reverse proxies that reassemble chunked requests before forwarding to Triton may mitigate the attack by reducing the number of evbuffer segments.
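The chunk-based indicators above can be turned into a rough heuristic. The sketch below walks a raw chunked request body, counts its chunks, and flags bodies made of many tiny chunks or approaching the 3MB figure. The 6-byte, 16-byte, and 3MB constants come from the bullets above; the function names, the `min_chunks` and `max_mean_payload` thresholds, and the reading of "3MB" as 3 MiB are assumptions for illustration, not values taken from the advisory or the PoC.

```python
from dataclasses import dataclass

# Figures quoted in the indicators above.
WIRE_BYTES_PER_CHUNK = 6            # approximate size of each chunk in the PoC
STACK_BYTES_PER_CHUNK = 16          # stack allocation attributed to each chunk
BODY_THRESHOLD = 3 * 1024 * 1024    # assumption: "3MB" read as 3 MiB


@dataclass
class ChunkStats:
    chunk_count: int     # number of non-empty chunks
    payload_bytes: int   # sum of the chunk data sizes
    wire_bytes: int      # bytes consumed through the terminating "0" chunk line


def estimated_stack_bytes(body_bytes: int) -> int:
    """Rough stack usage implied by the 6-byte -> 16-byte ratio quoted above."""
    return (body_bytes // WIRE_BYTES_PER_CHUNK) * STACK_BYTES_PER_CHUNK


def chunk_stats(raw_body: bytes) -> ChunkStats:
    """Tally the chunks of a chunked-encoded body; trailer fields are ignored."""
    pos = count = payload = 0
    while True:
        eol = raw_body.find(b"\r\n", pos)
        if eol == -1:
            raise ValueError("truncated chunk header")
        # Drop any chunk extensions after ';' before parsing the hex size.
        size_field = raw_body[pos:eol].split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        pos = eol + 2
        if size == 0:
            break
        if raw_body[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("chunk data not terminated by CRLF")
        payload += size
        count += 1
        pos += size + 2
    return ChunkStats(count, payload, pos)


def is_suspicious(stats: ChunkStats, min_chunks: int = 1000,
                  max_mean_payload: float = 8.0) -> bool:
    """Flag many tiny chunks, or a body at or above the size threshold."""
    if stats.wire_bytes >= BODY_THRESHOLD:
        return True
    if stats.chunk_count >= min_chunks:
        return stats.payload_bytes / stats.chunk_count <= max_mean_payload
    return False


if __name__ == "__main__":
    tiny = b"1\r\nA\r\n" * 2000 + b"0\r\n\r\n"   # 2000 one-byte chunks
    print(is_suspicious(chunk_stats(tiny)))
    print(estimated_stack_bytes(BODY_THRESHOLD))
```

Under these assumptions, a 3 MiB body of 6-byte chunks corresponds to 524,288 chunks and roughly 8 MiB of stack allocation. A one-byte chunk occupies 6 bytes on the wire (`1\r\nA\r\n`), which is one plausible reading of the 6-byte figure. The thresholds should be tuned against legitimate traffic before alerting on them.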
No detection rules found.
No public exploits indexed.
Trail of Bits
Uncovering memory corruption in NVIDIA Triton (as a new hire)
blogs_trailofbits · 2025-08-05 · CVSS 9.8 (critical)
In my first month at Trail of Bits as an AI/ML security engineer, I found two remotely accessible memory corruption bugs in NVIDIA’s Triton Inference Server during a routine onboarding practice. The bugs result from the way HTTP requests are handled by a number of the API routes, including the inference endpoint.
Like all new hires, my first 30 days involved shadowing the team, getting familiar with our processes, and practicing using static analysis tools by running them against an open-source project of my choosing. I chose to focus on AI software that was in scope for Pwn2Own 2025. While the automated tools flagged potential issues, it took manual analysis to demonstrate exploitability, and required an alternate angle (in this case, chunked transfer encoding) to prove why a bug/unsafe