CVE-2023-46229: LangChain before 0.0.317 allows SSRF via document_loaders/recursive_url_loader.py because crawling can proceed from an external server to an internal server.

Priority: P1 · 80 · high · 8.8 (CVSS 3.1)

AV:N · AC:L · PR:N · UI:R · S:U · C:H · I:H · A:H

ITW · VulnCheck KEV

Exploited in the wild

EPSS: 44.71% (98.6th percentile)

Affected

3 ranges

Vendor	Product	Version range	Fixed in
langchain	langchain	< 0.0.317	0.0.317
langchain	langchain	>= 0 < 0.0.317	0.0.317
langchain	langchain	>= 0 < 9ecb7240a480720ec9d739b3877a52f76098a2b8	9ecb7240a480720ec9d739b3877a52f76098a2b8
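
The version ranges above reduce to a single comparison against the fixed release. A minimal sketch of that check, assuming plain dotted release strings (no pre-release or local suffixes); `parse_version` and `is_affected` are hypothetical helper names, not part of LangChain:

```python
import re
from typing import Tuple

# First patched release, taken from the "Fixed in" column above.
FIXED_VERSION = (0, 0, 317)


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a plain dotted release such as '0.0.316' into a tuple of ints."""
    cleaned = version.strip()
    if re.fullmatch(r"\d+(?:\.\d+)*", cleaned) is None:
        raise ValueError(f"unsupported version string: {version!r}")
    return tuple(int(part) for part in cleaned.split("."))


def is_affected(version: str) -> bool:
    """Return True if the given langchain version is below the fixed release."""
    return parse_version(version) < FIXED_VERSION
```

The installed version can be read with `importlib.metadata.version("langchain")` and passed to `is_affected`.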

Detection & IOCs (extracted from sources)

path: langchain/document_loaders/recursive_url_loader.py

path: langchain/libs/langchain/langchain/document_loaders/sitemap.py

path: langchain/document_loaders/web_base.py

→Detect SSRF attempts via LangChain SitemapLoader: monitor outbound HTTP requests issued through aiohttp.ClientSession.get that cross from external/public URLs into internal (RFC 1918) address space; the scrape_all method invokes _fetch without filtering or sanitizing the target URLs.
→Flag LangChain versions earlier than 0.0.317 in software inventory; all prior versions are vulnerable, and the fix landed in pull request langchain#11925, released in version 0.0.317.
→Alert on HTTP requests from a LangChain process to intranet or internal resources (e.g., instance metadata endpoints, internal APIs); these may indicate exploitation of the SitemapLoader SSRF to reach local services, run port scans, or retrieve instance metadata.
→Inspect sitemap XML documents supplied to LangChain SitemapLoader for URLs that point to internal/private IP ranges or localhost; a malicious actor can embed intranet resource URLs in a crafted sitemap to trigger SSRF.
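
The sitemap inspection described above can be sketched with the Python standard library. This is an illustrative check, not LangChain code: `is_internal_host` and `internal_urls_in_sitemap` are hypothetical names, and the sketch only flags literal IP addresses and localhost names. Hostnames that resolve to internal addresses through DNS are not caught here, and a hardened XML parser such as defusedxml is advisable for untrusted input.

```python
import ipaddress
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlsplit


def is_internal_host(host: str) -> bool:
    """True for localhost names and literal private/loopback/link-local IPs."""
    if host.lower() in {"localhost", "localhost.localdomain"}:
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal; DNS resolution is out of scope for this sketch.
        return False
    return addr.is_private or addr.is_loopback or addr.is_link_local


def internal_urls_in_sitemap(sitemap_xml: str) -> List[str]:
    """Return every <loc> URL in the sitemap whose host looks internal."""
    flagged = []
    root = ET.fromstring(sitemap_xml)
    for element in root.iter():
        # Match <loc> regardless of the XML namespace prefix.
        if element.tag.rsplit("}", 1)[-1] != "loc" or not element.text:
            continue
        url = element.text.strip()
        host = urlsplit(url).hostname or ""
        if is_internal_host(host):
            flagged.append(url)
    return flagged
```

A non-empty result for a sitemap fetched from an external site is a reasonable signal to block the load or raise an alert.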

·The patch for CVE-2023-46229 introduces a function called _extract_scheme_and_domain and an allowlist; defenders should verify that the allowlist is configured to restrict crawling scope, as a misconfigured or overly permissive allowlist may still expose internal resources.
·The SSRF vulnerability is triggered through the SitemapLoader's load method, which parses a user-supplied web_path as a sitemap XML and then fetches every URL in it without restriction; any deployment that accepts untrusted sitemap URLs is at risk on versions before 0.0.317.
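
To illustrate the kind of scheme-and-domain allowlist the first note refers to, here is a minimal sketch. The source does not show the body of the patch's _extract_scheme_and_domain, so `scheme_and_domain` and `filter_allowed` below are hypothetical stand-ins that assume an exact match on scheme and host[:port]:

```python
from typing import Iterable, List, Tuple
from urllib.parse import urlsplit


def scheme_and_domain(url: str) -> Tuple[str, str]:
    """Hypothetical helper: return (scheme, netloc) for a URL, lowercased."""
    parts = urlsplit(url)
    return parts.scheme.lower(), parts.netloc.lower()


def filter_allowed(urls: Iterable[str], allowlist: Iterable[str]) -> List[str]:
    """Keep only URLs whose scheme and netloc exactly match an allowlist entry."""
    allowed = {scheme_and_domain(entry) for entry in allowlist}
    return [url for url in urls if scheme_and_domain(url) in allowed]
```

Exact matching is deliberately strict: a different scheme, a lookalike domain suffix, or embedded userinfo all fail the comparison. An allowlist that lists internal hosts, or one that matches by substring, would reintroduce the exposure.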

CVSS provenance

nvd: v3.1, 8.8 HIGH, CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H

vulncheck: 8.8 HIGH

vendor_redhat: 8.8 HIGH
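
The 8.8 base score follows from the NVD vector under the CVSS v3.1 base-score equations for unchanged scope. A short sketch of that arithmetic, using the metric weights from the CVSS v3.1 specification; the function names are illustrative:

```python
import math


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a) -> float:
    """Base score for S:U vectors, given numeric metric weights."""
    iss = 1 - (1 - c) * (1 - i) * (1 - a)      # impact sub-score
    impact = 6.42 * iss
    exploitability = 8.22 * av * ac * pr * ui
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


# Weights for AV:N, AC:L, PR:N, UI:R, C:H, I:H, A:H
score = base_score_scope_unchanged(0.85, 0.77, 0.85, 0.62, 0.56, 0.56, 0.56)
print(score)  # 8.8
```

Impact is about 5.87 and exploitability about 2.84; their sum, about 8.71, rounds up to 8.8.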
