cbcvebase.
CVE-2025-23311
published 2025-08-06

Priority: P262 · Severity: critical · CVSS 3.1 score: 9.8
Vector: AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
EPSS: 2.46% (82.5th percentile)
NVIDIA Triton Inference Server contains a vulnerability where an attacker could cause a stack overflow through specially crafted HTTP requests. A successful exploit of this vulnerability might lead to remote code execution, denial of service, information disclosure, or data tampering.

Affected

2 ranges
Vendor  Product                  Version range  Fixed in
nvidia  triton_inference_server  < 25.07        25.07
nvidia  triton_inference_server
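
The affected range above (< 25.07, fixed in 25.07) can be checked mechanically when triaging an inventory. The following is a minimal sketch, assuming release tags follow a two-part `YY.MM` form such as `25.06`; the helper names `parse_release` and `is_affected` are hypothetical and are not part of any NVIDIA tooling.

```python
import re

# "Fixed in 25.07" per the affected-versions table.
FIXED_RELEASE = (25, 7)


def parse_release(tag: str) -> tuple[int, int]:
    """Parse a release tag such as '25.06' into (year, month).

    Assumption: tags are exactly two digits, a dot, two digits.
    """
    match = re.fullmatch(r"(\d{2})\.(\d{2})", tag.strip())
    if match is None:
        raise ValueError(f"unrecognized release tag: {tag!r}")
    return int(match.group(1)), int(match.group(2))


def is_affected(tag: str) -> bool:
    """Return True when the release predates the fixed release."""
    return parse_release(tag) < FIXED_RELEASE
```

Comparing integer tuples rather than strings keeps the ordering correct across year boundaries and avoids surprises if a tag is ever written without a leading zero elsewhere in a pipeline.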

Detection & IOCs (extracted from sources)

path: http_server.cc
path: sagemaker_server.cc
url: /v2/repository/index
  • Detect HTTP requests using chunked transfer encoding (Transfer-Encoding: chunked) sent to Triton Inference Server API endpoints, particularly requests split into thousands of tiny chunks, which is the attack primitive used to amplify the alloca stack allocation.
  • Alert on HTTP requests to Triton endpoints (inference, /v2/repository/index, model load/unload, trace settings, logging config, shared memory registration) that use chunked transfer encoding with a total request body approaching or exceeding 3 MB, the minimum size reported to trigger the segmentation fault.
  • Monitor for Triton Inference Server process crashes (segmentation faults / SIGSEGV) following inbound HTTP requests, which may indicate exploitation attempts against the alloca stack overflow vulnerability.
  • Flag chunked HTTP requests where each individual chunk is approximately 6 bytes in size sent to Triton API routes; this chunk size is the specific amplification ratio (6 bytes per chunk → 16 bytes of stack allocation) used in the PoC.
  • Triton deployments running version 25.06 or earlier are vulnerable; prioritize detection and patching for these versions as the fix was released in version 25.07.
  • The vulnerable endpoints are unauthenticated by default; monitor all of the following routes for anomalous chunked requests: inference endpoint, /v2/repository/index, model load, model unload, trace setting updates, logging configuration updates, system shared memory registration, CUDA shared memory registration.
  • Authentication is disabled by default on most Triton API routes, meaning the vulnerability is exploitable without credentials in default deployments.
  • The alloca-based stack overflow is triggered via HTTP chunked transfer encoding; reverse proxies that reassemble chunked requests before forwarding to Triton may mitigate the attack by reducing the number of evbuffer segments.
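
The chunk-count heuristics in the list above can be sketched as a small offline analyzer. This is a minimal sketch, not a production detection rule. It assumes the raw chunked request body has already been captured, for example from a proxy log or packet capture. The names `ChunkStats`, `parse_chunked_body`, and `is_suspicious`, and the thresholds `MIN_CHUNKS_TO_FLAG` and `SMALL_CHUNK_WIRE_BYTES`, are hypothetical. Only the 6-byte chunk size, the 16 bytes of stack per chunk, and the 3 MB body threshold come from the notes above, and reading "3 MB" as 3 MiB is an assumption. The notes do not say whether "6 bytes" counts payload or on-wire framing, so the sketch tracks both; a one-byte chunk occupies exactly 6 bytes on the wire (`1\r\nA\r\n`). Under the on-wire reading, 3 MiB / 6 × 16 = 8 MiB, which matches the common default stack limit on Linux. That link is an inference, not something the source states.

```python
from dataclasses import dataclass

# Figure quoted in the detection notes.
STACK_BYTES_PER_CHUNK = 16          # stack bytes allocated per chunk
# Assumption: the "3 MB" threshold is read as 3 MiB.
BODY_ALERT_BYTES = 3 * 1024 * 1024

# Hypothetical tuning knobs, not taken from the advisory.
MIN_CHUNKS_TO_FLAG = 1000           # "thousands of tiny chunks"
SMALL_CHUNK_WIRE_BYTES = 64         # generous bound around the 6-byte PoC chunk


@dataclass
class ChunkStats:
    chunk_count: int     # number of non-empty data chunks
    payload_bytes: int   # sum of declared chunk sizes
    wire_bytes: int      # payload plus size lines and CRLF framing

    @property
    def mean_wire_bytes(self) -> float:
        return self.wire_bytes / self.chunk_count if self.chunk_count else 0.0

    @property
    def estimated_stack_bytes(self) -> int:
        # Applies the 16-bytes-per-chunk ratio from the notes.
        return self.chunk_count * STACK_BYTES_PER_CHUNK


def parse_chunked_body(raw: bytes) -> ChunkStats:
    """Walk an HTTP/1.1 chunked body and tally its data chunks.

    Stops at the zero-size last chunk; any trailer section is ignored.
    """
    pos = count = payload = wire = 0
    while True:
        eol = raw.find(b"\r\n", pos)
        if eol == -1:
            raise ValueError("truncated chunk-size line")
        # Chunk extensions (after ';') are dropped before parsing the hex size.
        size_field = raw[pos:eol].split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        data_start = eol + 2
        if size == 0:
            break
        data_end = data_start + size
        if raw[data_end:data_end + 2] != b"\r\n":
            raise ValueError("chunk data not terminated by CRLF")
        count += 1
        payload += size
        wire += (data_end + 2) - pos
        pos = data_end + 2
    return ChunkStats(count, payload, wire)


def is_suspicious(stats: ChunkStats) -> bool:
    """Flag bodies made of many tiny chunks, or chunked bodies near 3 MiB."""
    many_small = (
        stats.chunk_count >= MIN_CHUNKS_TO_FLAG
        and stats.mean_wire_bytes <= SMALL_CHUNK_WIRE_BYTES
    )
    large_body = stats.wire_bytes >= BODY_ALERT_BYTES
    return many_small or large_body
```

The two conditions are kept separate so that each can be tuned or disabled independently. Legitimate clients that stream large tensors in big chunks would trip only the size check, while the many-small-chunks check targets the amplification pattern specifically.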