CVE-2025-23311: NVIDIA Triton Inference Server stack overflow via specially crafted HTTP requests

Priority: P262 · critical · 9.8 (CVSS 3.1)

CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H

EPSS: 2.46% (82.5th percentile)

NVIDIA Triton Inference Server contains a vulnerability where an attacker could cause a stack overflow through specially crafted HTTP requests. A successful exploit of this vulnerability might lead to remote code execution, denial of service, information disclosure, or data tampering.

Affected

2 ranges

Vendor	Product	Version range	Fixed in
nvidia	triton_inference_server	< 25.07	25.07
nvidia	triton_inference_server	—	—
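
The version range above can be turned into a quick inventory check. The following is a minimal sketch, assuming deployments are identified by a release tag in the `YY.MM` form used in the table (for example `25.06`); the helper names `parse_release` and `is_affected` are hypothetical and not part of any Triton tooling.

```python
# Sketch only: compares a release tag against the "Fixed in" value from the table.
# Assumes tags look like "25.06" or "25.06.1"; other tag formats are not handled.

FIXED_IN = (25, 7)  # "Fixed in 25.07" per the Affected table


def parse_release(version: str) -> tuple[int, int]:
    """Parse a release tag such as '25.06' into a (year, month) tuple."""
    year, month = version.strip().split(".")[:2]
    return int(year), int(month)


def is_affected(version: str) -> bool:
    """Return True when the release is older than the fixed release."""
    return parse_release(version) < FIXED_IN


if __name__ == "__main__":
    for tag in ("24.12", "25.06", "25.07"):
        print(tag, "affected" if is_affected(tag) else "not affected")
```

Comparing tuples rather than strings or floats keeps the ordering correct across year boundaries.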

Detection & IOCs (extracted from sources)

path: http_server.cc

path: sagemaker_server.cc

url: /v2/repository/index

→ Detect HTTP requests to Triton Inference Server API endpoints that use chunked transfer encoding (Transfer-Encoding: chunked), particularly those carrying thousands of tiny chunks. This is the attack primitive used to amplify the alloca stack allocation.
→ Alert on chunked requests to Triton endpoints (inference, /v2/repository/index, model load/unload, trace settings, logging config, shared memory registration) whose total body approaches or exceeds 3MB, the minimum size needed to trigger the segmentation fault.
→ Monitor for Triton Inference Server process crashes (segmentation faults / SIGSEGV) following inbound HTTP requests, which may indicate exploitation attempts against the alloca stack overflow.
→ Flag chunked HTTP requests to Triton API routes in which each chunk is approximately 6 bytes. In the PoC, each 6-byte chunk results in 16 bytes of stack allocation, which is the amplification the attack relies on.
→ Triton deployments running version 25.06 or earlier are vulnerable. The fix was released in version 25.07, so prioritize detection and patching for older versions.
→ The attack surface includes endpoints that are unauthenticated by default. Monitor all of the following routes for anomalous chunked requests: inference endpoint, /v2/repository/index, model load, model unload, trace setting updates, logging configuration updates, system shared memory registration, CUDA shared memory registration.
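
The chunk-count and chunk-size signals above could be checked on a captured request body roughly as follows. This is a minimal sketch, assuming the raw chunked body is available as bytes (for example from a proxy log or packet capture); `chunk_stats`, `looks_suspicious`, and both thresholds are hypothetical illustrations rather than values taken from the advisory, and trailer fields are ignored.

```python
from dataclasses import dataclass


@dataclass
class ChunkStats:
    chunk_count: int    # number of non-empty chunks
    payload_bytes: int  # sum of chunk data sizes
    wire_bytes: int     # length of the raw chunked body as captured


def chunk_stats(body: bytes) -> ChunkStats:
    """Walk an HTTP/1.1 chunked body and tally its chunks."""
    pos = count = payload = 0
    while True:
        eol = body.find(b"\r\n", pos)
        if eol == -1:
            raise ValueError("truncated chunk header")
        # The chunk size is hexadecimal; anything after ';' is a chunk extension.
        size_field = body[pos:eol].split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        if size == 0:  # last-chunk marker ends the body
            return ChunkStats(count, payload, len(body))
        end = eol + 2 + size
        if body[end:end + 2] != b"\r\n":
            raise ValueError("chunk data not terminated by CRLF")
        count += 1
        payload += size
        pos = end + 2


def looks_suspicious(body: bytes, min_chunks: int = 1000,
                     max_avg_payload: float = 16.0) -> bool:
    """Flag bodies made of many tiny chunks (illustrative thresholds)."""
    stats = chunk_stats(body)
    if stats.chunk_count < min_chunks:
        return False
    return stats.payload_bytes / stats.chunk_count <= max_avg_payload


if __name__ == "__main__":
    tiny = b"1\r\nA\r\n" * 2000 + b"0\r\n\r\n"  # 6 wire bytes per chunk
    print(chunk_stats(tiny), looks_suspicious(tiny))
```

Thresholds would need tuning against real traffic, since legitimate streaming clients can also emit many small chunks.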

· Authentication is disabled by default on most Triton API routes, meaning the vulnerability is exploitable without credentials in default deployments.
· The alloca-based stack overflow is triggered via HTTP chunked transfer encoding; reverse proxies that reassemble chunked requests before forwarding to Triton may mitigate the attack by reducing the number of evbuffer segments.
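
The figures quoted above (6 bytes per chunk, 16 bytes of stack per chunk, roughly 3MB of request body) can be related with simple arithmetic. The sketch below assumes stack use grows linearly with the number of segments and that "3MB" means 3 MiB; both are assumptions, as is the comparison with an 8 MiB stack limit, which is a common Linux default but not something the sources state about Triton. The function names are hypothetical.

```python
# Sketch of the amplification arithmetic; constants come from the PoC description.
WIRE_BYTES_PER_CHUNK = 6    # approximate on-the-wire size of each tiny chunk
STACK_BYTES_PER_CHUNK = 16  # stack allocated per chunk, per the PoC description


def stack_bytes_for_segments(segments: int) -> int:
    """Estimated stack allocation for a given number of buffer segments."""
    return segments * STACK_BYTES_PER_CHUNK


def stack_bytes_for_body(wire_bytes: int) -> int:
    """Estimated stack allocation for a body made entirely of tiny chunks."""
    return stack_bytes_for_segments(wire_bytes // WIRE_BYTES_PER_CHUNK)


if __name__ == "__main__":
    body = 3 * 1024 * 1024                        # 3 MiB body (assumed reading of "3MB")
    print(body // WIRE_BYTES_PER_CHUNK)           # 524288 chunks
    print(stack_bytes_for_body(body))             # 8388608 bytes, i.e. 8 MiB
    # If a proxy reassembled the same body into, say, 64 segments:
    print(stack_bytes_for_segments(64))           # 1024 bytes
```

Under these assumptions a 3 MiB body of 6-byte chunks maps to about 8 MiB of stack, which would be consistent with the reported crash threshold, and it illustrates why reducing the segment count at a proxy shrinks the allocation.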
