CVE-2024-34359
published 2024-05-14

Priority: P266 | Severity: critical | CVSS 3.1: 9.6
Vector: AV:N/AC:L/PR:N/UI:R/S:C/C:H/I:H/A:H
EPSS: 28.42% (97.9th percentile)
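
The 9.6 score can be cross-checked against the vector, read here as AV:N/AC:L/PR:N/UI:R/S:C/C:H/I:H/A:H. The sketch below applies the CVSS v3.1 base-score equations with the metric weights from the FIRST specification. The names `base_score` and `roundup` are illustrative, and only base metrics are handled (no temporal or environmental metrics).

```python
import math

# CVSS v3.1 base metric weights (FIRST specification).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# The Privileges Required weight depends on whether Scope is changed.
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place using the integer-based method from the spec."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score for a vector such as 'AV:N/AC:L/...'."""
    # Skip an optional leading 'CVSS:3.1' component.
    parts = [p for p in vector.split("/") if not p.startswith("CVSS")]
    m = dict(p.split(":") for p in parts)

    changed = m["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]

    # Impact sub-score.
    iss = 1 - (
        (1 - WEIGHTS["C"][m["C"]])
        * (1 - WEIGHTS["I"][m["I"]])
        * (1 - WEIGHTS["A"][m["A"]])
    )
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    exploitability = (
        8.22 * WEIGHTS["AV"][m["AV"]] * WEIGHTS["AC"][m["AC"]] * pr * WEIGHTS["UI"][m["UI"]]
    )

    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


print(base_score("AV:N/AC:L/PR:N/UI:R/S:C/C:H/I:H/A:H"))
```

For this vector the changed scope (S:C) applies the 1.08 multiplier, and the required user interaction (UI:R) lowers exploitability. Together these put the score at 9.6, below the 10.0 that the same vector would reach with UI:N.
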
llama-cpp-python provides the Python bindings for llama.cpp. It relies on the `Llama` class in `llama.py` to load `.gguf` llama.cpp or Latency Machine Learning Models. The `Llama.__init__` constructor takes several parameters that configure how the model is loaded and run. Besides NUMA and LoRa settings, tokenizer loading, and hardware settings, `__init__` also loads the chat template from the target `.gguf` file's metadata and passes it to `llama_chat_format.Jinja2ChatFormatter.to_chat_handler()` to construct `self.chat_handler` for the model. However, `Jinja2ChatFormatter` parses the chat template from that metadata with an unsandboxed `jinja2.Environment`, and the template is later rendered in `__call__` to build the interaction prompt. This allows Jinja2 server-side template injection, so a carefully constructed payload in the template leads to remote code execution.
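
One way to look for the pattern described above in a Python codebase is a small static check built on the standard-library `ast` module. The sketch below reports the lines where `jinja2.Environment(...)` appears to be constructed directly. The function name `find_unsandboxed_jinja_envs` is hypothetical, and the check is a heuristic: it only follows plain `import jinja2` and `from jinja2 import Environment` forms, so aliased re-exports or dynamic imports would be missed.

```python
import ast


def find_unsandboxed_jinja_envs(source: str) -> list[int]:
    """Return line numbers where jinja2.Environment(...) appears to be called."""
    tree = ast.parse(source)

    direct_names = set()    # local names bound to jinja2.Environment
    module_aliases = set()  # local names bound to the jinja2 module

    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom) and node.module == "jinja2":
            for alias in node.names:
                if alias.name == "Environment":
                    direct_names.add(alias.asname or alias.name)
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == "jinja2":
                    module_aliases.add(alias.asname or alias.name)

    hits = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id in direct_names:
            # e.g. Environment(...) after 'from jinja2 import Environment'
            hits.append(node.lineno)
        elif (
            isinstance(func, ast.Attribute)
            and func.attr == "Environment"
            and isinstance(func.value, ast.Name)
            and func.value.id in module_aliases
        ):
            # e.g. jinja2.Environment(...) after 'import jinja2'
            hits.append(node.lineno)
    return sorted(hits)
```

A hit is only a prompt for review, not proof of a vulnerability: it matters when the template text comes from an untrusted source such as model metadata. Jinja2 ships sandboxed alternatives in `jinja2.sandbox` (`SandboxedEnvironment` and `ImmutableSandboxedEnvironment`), which this check deliberately does not flag.
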

Affected

2 ranges
Vendor    Product                Version range         Fixed in
lollms    lollms_web_ui          < 9.8                 9.8
parisneo  parisneo_lollms-webui  unspecified – latest  -

Detection & IOCs (extracted from sources)

version: llama_cpp_python-0.2.61+cpuavx2-cp311-cp311-manylinux_2_31_x86_64
hash: b454f40a
path: entrypoints/openai/serving_rerank.py
url: /v1/rerank
  • Detect exploitation of CVE-2024-34359 via lollms-webui's 'bindings_zoo' feature by monitoring for uploads or interactions with GGUF format model files sourced from Hugging Face, particularly through the 'binding_zoo' feature endpoint.
  • Flag use of jinja2.Environment() (unsandboxed) for rendering chat templates in LLM inference servers; the attack class relies on Jinja2 SSTI via malicious tokenizer.chat_template fields in GGUF model files.
  • Monitor GGUF model files for anomalous or executable Jinja2 content in the tokenizer.chat_template metadata field, which is stored alongside model weights and executes at every inference call.
  • Alert on requests to the /v1/rerank endpoint in SGLang when combined with recently loaded GGUF models from untrusted or third-party Hugging Face repositories, as this is the trigger path for RCE.
  • Scan GGUF files distributed via Hugging Face for tampered chat templates; poisoned templates evade automated security scans on the platform and are bundled with model weights in a single artifact.
  • CVE-2024-34359 affects llama_cpp_python; lollms-webui remained unpatched as of commit b454f40a, meaning deployments pinned to that commit or earlier are still vulnerable regardless of upstream fixes.
  • The same Jinja2 SSTI attack class (CVE-2024-34359 / Llama Drama) has been confirmed to affect multiple LLM serving frameworks; SGLang (CVE-2026-5760) received no patch during CERT/CC coordination, so no fix may be available.
  • GGUF templates execute on every inference call before user input is processed and are not subject to input-level guardrails, making template-embedded payloads invisible to standard runtime input filtering.
  • Over 6,000 models on Hugging Face were reportedly affected by CVE-2024-34359 through the llama_cpp_python supply chain, indicating broad ecosystem exposure.
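
The GGUF monitoring and scanning items above can be approximated with a short standard-library script that pulls `tokenizer.chat_template` out of a GGUF file's metadata and flags suspicious constructs. This is a sketch under stated assumptions: the header layout (magic `GGUF`, uint32 version, uint64 tensor count, uint64 key-value count, length-prefixed UTF-8 strings, little-endian) follows the GGUF v2/v3 format as commonly documented, and the marker list in `SUSPICIOUS` is illustrative rather than exhaustive. The names `read_chat_template` and `suspicious_markers` are hypothetical.

```python
import re
import struct
from typing import BinaryIO, Optional

# Fixed-width GGUF metadata value types mapped to little-endian struct formats.
SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
           6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
STRING, ARRAY = 8, 9

# Assumed heuristic markers: dunder attribute traversal and process/OS access.
SUSPICIOUS = re.compile(r"__\w+__|\bpopen\b|\bsubprocess\b|\bos\.system\b|\beval\b|\bexec\b")


def _read(f: BinaryIO, fmt: str):
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("truncated GGUF metadata")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    length = _read(f, "<Q")
    return f.read(length).decode("utf-8", errors="replace")


def _read_value(f: BinaryIO, vtype: int):
    if vtype in SCALARS:
        return _read(f, SCALARS[vtype])
    if vtype == STRING:
        return _read_string(f)
    if vtype == ARRAY:
        item_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_chat_template(f: BinaryIO) -> Optional[str]:
    """Return the tokenizer.chat_template string, or None if the key is absent."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    if _read(f, "<I") < 2:
        raise ValueError("GGUF v1 layout is not handled by this sketch")
    _read(f, "<Q")  # tensor count (unused here)
    kv_count = _read(f, "<Q")
    for _ in range(kv_count):
        key = _read_string(f)
        value = _read_value(f, _read(f, "<I"))
        if key == "tokenizer.chat_template":
            return value
    return None


def suspicious_markers(template: str) -> list[str]:
    """Return the sorted, de-duplicated heuristic markers found in a template."""
    return sorted(set(SUSPICIOUS.findall(template)))
```

In use, a file would be opened with `open(path, "rb")`, passed to `read_chat_template`, and any returned template passed to `suspicious_markers`. An empty result does not prove a template is safe, and a non-empty one needs manual review. Pattern matching of this kind complements rendering templates in a sandboxed environment; it does not replace it.
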