CVE-2024-34359
published 2024-05-14CVE-2024-34359: llama-cpp-python is the Python bindings for llama.cpp. `llama-cpp-python` depends on class `Llama` in `llama.py` to load `.gguf` llama.cpp or Latency Machine…
PriorityP266critical9.6CVSS 3.1
AVNACLPRNUIRSCCHIHAH
EPSS
28.42%
97.9th percentile
llama-cpp-python is the Python bindings for llama.cpp. `llama-cpp-python` depends on class `Llama` in `llama.py` to load `.gguf` llama.cpp or Latency Machine Learning Models. The `__init__` constructor built in the `Llama` takes several parameters to configure the loading and running of the model. Other than `NUMA, LoRa settings`, `loading tokenizers,` and `hardware settings`, `__init__` also loads the `chat template` from targeted `.gguf` 's Metadata and furtherly parses it to `llama_chat_format.Jinja2ChatFormatter.to_chat_handler()` to construct the `self.chat_handler` for this model. Nevertheless, `Jinja2ChatFormatter` parse the `chat template` within the Metadate with sandbox-less `jinja2.Environment`, which is furthermore rendered in `__call__` to construct the `prompt` of interaction. This allows `jinja2` Server Side Template Injection which leads to remote code execution by a carefully constructed payload.
Affected
2 ranges
| Vendor | Product | Version range | Fixed in |
|---|---|---|---|
| lollms | lollms_web_ui | < 9.8 | 9.8 |
| parisneo | parisneo_lollms-webui | unspecified – latest | — |
Detection & IOCsextracted from sources · hover to see the quote
- →Detect exploitation of CVE-2024-34359 via lollms-webui's 'bindings_zoo' feature by monitoring for uploads or interactions with GGUF format model files sourced from Hugging Face, particularly through the 'binding_zoo' feature endpoint. ↗
- →Flag use of jinja2.Environment() (unsandboxed) for rendering chat templates in LLM inference servers; the attack class relies on Jinja2 SSTI via malicious tokenizer.chat_template fields in GGUF model files. ↗
- →Monitor GGUF model files for anomalous or executable Jinja2 content in the tokenizer.chat_template metadata field, which is stored alongside model weights and executes at every inference call. ↗
- →Alert on requests to the /v1/rerank endpoint in SGLang when combined with recently loaded GGUF models from untrusted or third-party Hugging Face repositories, as this is the trigger path for RCE. ↗
- →Scan GGUF files distributed via Hugging Face for tampered chat templates; poisoned templates evade automated security scans on the platform and are bundled with model weights in a single artifact. ↗
- ·CVE-2024-34359 affects llama_cpp_python; lollms-webui remained unpatched as of commit b454f40a, meaning deployments pinned to that commit or earlier are still vulnerable regardless of upstream fixes. ↗
- ·The same Jinja2 SSTI attack class (CVE-2024-34359 / Llama Drama) has been confirmed to affect multiple LLM serving frameworks; SGLang (CVE-2026-5760) received no patch during CERT/CC coordination, so no fix may be available. ↗
- ·GGUF templates execute on every inference call before user input is processed and are not subject to input-level guardrails, making template-embedded payloads invisible to standard runtime input filtering. ↗
- ·Over 6,000 models on Hugging Face were reportedly impaired by CVE-2024-34359 through the llama_cpp_python supply chain, indicating broad ecosystem exposure. ↗
Stop checking back — get the weekly exploitation signal.
Every Monday: what got weaponized or added to CISA KEV in the last seven days — each CVE cross-linked to its PoC, Nuclei template, and detection rule. Free, one email a week, unsubscribe in one click.
GHSA
GHSA-325v-4jhh-rp36: parisneo/lollms-webui, in its latest version, is vulnerable to remote code execution due to an insecure dependency on llama-cpp-python version llama_c
ghsa_unreviewed·2024-07-02·CVSS 9.6
CVE-2024-4897 [CRITICAL] CWE-76 GHSA-325v-4jhh-rp36: parisneo/lollms-webui, in its latest version, is vulnerable to remote code execution due to an insecure dependency on llama-cpp-python version llama_c
parisneo/lollms-webui, in its latest version, is vulnerable to remote code execution due to an insecure dependency on llama-cpp-python version llama_cpp_python-0.2.61+cpuavx2-cp311-cp311-manylinux_2_31_x86_64. The vulnerability arises from the application's 'binding_zoo' feature, which allows attackers to upload and interact with a malicious model file hosted on hugging-face, leading to remote code execution. The issue is linked to a known vulnerability in llama-cpp-python, CVE-2024-34359, which has not been patched in lollms-webui as of commit b454f40a. The vulnerability is exploitable through the application's handling of model files in the 'bindings_zoo' feature, specifically when processing gguf format model files.
GHSA
llama-cpp-python vulnerable to Remote Code Execution by Server-Side Template Injection in Model Metadata
ghsa·2024-05-13
CVE-2024-34359 [CRITICAL] CWE-76 llama-cpp-python vulnerable to Remote Code Execution by Server-Side Template Injection in Model Metadata
llama-cpp-python vulnerable to Remote Code Execution by Server-Side Template Injection in Model Metadata
## Description
`llama-cpp-python` depends on class `Llama` in `llama.py` to load `.gguf` llama.cpp or Latency Machine Learning Models. The `__init__` constructor built in the `Llama` takes several parameters to configure the loading and running of the model. Other than `NUMA, LoRa settings`, `loading tokenizers,` and `hardware settings`, `__init__` also loads the `chat template` from targeted `.gguf` 's Metadata and furtherly parses it to `llama_chat_format.Jinja2ChatFormatter.to_chat_handler()` to construct the `self.chat_handler` for this model. Nevertheless, `Jinja2ChatFormatter` parse the `chat template` within the Metadate with sandbox-less `jinja2.Environment`, which is furtherm
OSV
llama-cpp-python vulnerable to Remote Code Execution by Server-Side Template Injection in Model Metadata
osv·2024-05-13
CVE-2024-34359 [CRITICAL] llama-cpp-python vulnerable to Remote Code Execution by Server-Side Template Injection in Model Metadata
llama-cpp-python vulnerable to Remote Code Execution by Server-Side Template Injection in Model Metadata
## Description
`llama-cpp-python` depends on class `Llama` in `llama.py` to load `.gguf` llama.cpp or Latency Machine Learning Models. The `__init__` constructor built in the `Llama` takes several parameters to configure the loading and running of the model. Other than `NUMA, LoRa settings`, `loading tokenizers,` and `hardware settings`, `__init__` also loads the `chat template` from targeted `.gguf` 's Metadata and furtherly parses it to `llama_chat_format.Jinja2ChatFormatter.to_chat_handler()` to construct the `self.chat_handler` for this model. Nevertheless, `Jinja2ChatFormatter` parse the `chat template` within the Metadate with sandbox-less `jinja2.Environment`, which is furtherm
No detection rules found.
No public exploits indexed.
arXiv
Inference-Time Backdoors via Hidden Instructions in LLM Chat Templates
arxiv_fulltext·2026-03-09
Inference-Time Backdoors via Hidden Instructions in LLM Chat Templates
0pt18pt
[ ][c]
tabularcc
Ariel Fogel* &
Omer Hofman*
[email protected] &
[email protected]
&
Eilon Cohen &
Roman Vainshtein
[email protected] &
[email protected]
tabular
\@makefnmark
Security Research of Europe *Equal contribution.
## Abstract
Open-weight language models are increasingly used in production settings, raising new security challenges.
One prominent threat in this context is backdoor attacks, in which adversaries embed hidden behaviors in language models that activate under specific conditions. Previous work has assumed that adversaries have access to training pipelines or deployment infrastructure. We propose a novel attack surface requiring neither, which utilizes
the chat template. Chat templates are executable Jinja2 programs invoked at ev
arXiv
SASER: Stego attacks on open-source LLMs
arxiv_fulltext·2025-10-12
SASER: Stego attacks on open-source LLMs
SASER: Stego attacks on open-source LLMs
Ming Tan*,1, Wei Li*,2, Hu Tao ^ ,1, Hailong Ma1, Aodi Liu1, Qian Chen2, and Zilong Wang2
*These authors contributed equally to this work.
^ Corresponding author.
1Information Engineering University, Zhengzhou, China;
2Xidian University, Xi'an, China.
Email: [email protected]; [email protected]; [email protected]; [email protected]; [email protected];\ [email protected]; [email protected].
Tan et al.
## Abstract
Open-source large language models (LLMs) have demonstrated considerable dominance over proprietary LLMs in resolving neural processing tasks, thanks to the collaborative and sharing nature. Although full access to source codes, model parameters, and training data lays the groundwork for transparency, we argue that such a full-a
Hackernews
SGLang CVE-2026-5760 (CVSS 9.8) Enables RCE via Malicious GGUF Model Files
blogs_hackernews·2026-04-20·CVSS 9.8
CVE-2026-5760 [CRITICAL] SGLang CVE-2026-5760 (CVSS 9.8) Enables RCE via Malicious GGUF Model Files
Home
Threat Intelligence
Vulnerabilities
Cyber Attacks
Webinars
Expert Insights
Awards
Webinars
Awards
Free eBooks
About THN
Jobs
Advertise with us
## SGLang CVE-2026-5760 (CVSS 9.8) Enables RCE via Malicious GGUF Model Files
A critical security vulnerability has been disclosed in SGLang that, if successfully exploited, could result in remote code execution on susceptible systems.
The vulnerability, tracked as CVE-2026-5760 , carries a CVSS score of 9.8 out of 10.0. It has been described as a case of command injection leading to the execution of arbitrary code.
SGLang is a high-performance, open-source serving framework for large language models and multimodal models. The official GitHub project has been forked over 5,500 times and starred 26,100 times.
According to the
Trendmicro
The Road to Agentic AI: Exposed Foundations
blogs_trendmicro·2024-12-04
The Road to Agentic AI: Exposed Foundations
Artificial Intelligence (AI)
# The Road to Agentic AI: Exposed Foundations
Our research into Retrieval Augmented Generation (RAG) systems uncovered at least 80 unprotected servers. We highlight this problem, which can lead to potential data loss and unauthorized access.
By: Morton Swimmer, Philippe Lin, Vincenzo Ciancaglini, Marco Balduzzi, Stephen Hilt
2024/12/04
Read time: ( words)
Save to Folio
Report highlights:
- Retrieval augmented generation (RAG) enables enterprises to build customized, efficient, and cost-effective applications based on private data. However, research reveals significant security risks, such as exposed vector stores and LLM-hosting platforms, which can lead to data leaks, unauthorized access, and potential system manipulation if not properly secured.
- Secu
https://github.com/abetlen/llama-cpp-python/commit/b454f40a9a1787b2b5659cd2cb00819d983185dfhttps://github.com/abetlen/llama-cpp-python/security/advisories/GHSA-56xg-wfcc-g829https://github.com/abetlen/llama-cpp-python/commit/b454f40a9a1787b2b5659cd2cb00819d983185dfhttps://github.com/abetlen/llama-cpp-python/security/advisories/GHSA-56xg-wfcc-g829
2024-05-14
Published