Home
DOMPurify is a fast, standards-aware XSS sanitizer for HTML, MathML and SVG. You give it a string of untrusted markup, and it gives you back a string that is safe to insert into a page: anything that could run script or otherwise attack the DOM has been removed. It is written by security people who spend their time finding and fixing browser parsing bugs, and it is used in production by a very large number of applications.
The core idea is simple and deliberate: DOMPurify is allow-list based and DOM-based. It does not try to spot and strip "bad" patterns with regular expressions. Instead it parses your input into a real, inert DOM using the browser's own parser, walks that tree, and keeps only the elements and attributes that are explicitly known to be safe. Everything else is dropped. Working on a parsed DOM rather than on raw text is what lets DOMPurify see the markup the way the browser eventually will, which is essential for defending against mutation-based XSS (mXSS).
At a high level, every call to DOMPurify.sanitize() moves through the same pipeline.
- Dirty input. You pass in an HTML string, or (in IN_PLACE mode) a live DOM node you already have.
- Parse into an isolated DOM. The string is handed to the browser's own HTML parser, which builds an inert document. Nothing in that document executes: scripts do not run, images and other resources do not load. This is the same parser the browser would use for real, so the tree DOMPurify inspects matches what the page would actually produce.
- Walk the tree and sanitize every node. DOMPurify iterates over every node and applies the allow-list, element by element and attribute by attribute. This is the heart of the library and is detailed below.
- Serialize the cleaned tree. The sanitized DOM is turned back into markup.
- Clean output. By default you get a safe HTML string (or a TrustedHTML object when Trusted Types are enabled). You can also ask for a DOM node or a document fragment instead.
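From the caller's side, the whole pipeline is a single call. The sketch below assumes a DOMPurify instance is already available (for example the global DOMPurify in a browser build). The helper name renderUntrusted is hypothetical, and the instance is passed in as a parameter only so that the snippet stays self-contained.

```javascript
// Hypothetical helper: sanitize untrusted markup, then insert it.
// `purify` is assumed to be a DOMPurify instance, e.g. the global
// `DOMPurify` in a browser.
function renderUntrusted(purify, element, dirty) {
  // Parsing, walking and serializing all happen inside this one call.
  // With no config, the result is a sanitized HTML string.
  const clean = purify.sanitize(dirty);
  element.innerHTML = clean;
  return clean;
}

// Browser usage (assumes DOMPurify is loaded and #comment exists):
//   renderUntrusted(DOMPurify, document.querySelector('#comment'), userInput);
```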
Two things feed into the walk rather than sitting in the linear flow. Config (your allow-lists, profiles such as USE_PROFILES, and flags) decides what counts as safe. Hooks let you observe or modify each element and attribute as it is checked, without forking the library.
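As an illustration of how config feeds the walk, here are two small config objects. The option names (USE_PROFILES, ALLOWED_TAGS, ALLOWED_ATTR) are DOMPurify options; the variable names and the particular tag list are assumptions chosen for the example, not recommended defaults.

```javascript
// Option A: start from a built-in profile (HTML only, no SVG or MathML).
const htmlOnly = { USE_PROFILES: { html: true } };

// Option B: replace the default allow-lists with a narrow custom one,
// here sized for something like a comment field (illustrative choice).
const commentConfig = {
  ALLOWED_TAGS: ['p', 'b', 'i', 'em', 'strong', 'a'],
  ALLOWED_ATTR: ['href'],
};

// Pick one approach per call: as far as the README describes it,
// USE_PROFILES overrides ALLOWED_TAGS, so the two should not be combined.
//
// Usage, assuming a DOMPurify instance named DOMPurify is in scope:
//   const clean = DOMPurify.sanitize(dirty, commentConfig);
```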
In IN_PLACE mode the steps are the same, except DOMPurify sanitizes the live node you handed it and returns that same node, cleaned, rather than producing a new string.
The walk is where the allow-list model does its work. For every node in the tree, DOMPurify asks a small, fixed set of questions.
- Is the element allowed? The tag has to be on the allow-list and sit in a valid namespace (HTML, SVG or MathML, with no confusion between them). If it is not allowed, the element is removed. By default its text content is preserved (KEEP_CONTENT), so dropping a <noscript> wrapper does not throw away the readable text inside it.
- Are its attributes allowed and safe? For a kept element, every attribute is checked: the name must be on the allow-list, and the value must pass the relevant checks, including URI validation for things like href and src, and a guard against DOM clobbering. Disallowed or unsafe attributes are stripped; the rest are kept.
- Recurse into nested content. DOMPurify then descends into places the parser hides content from a naive walk, in particular shadow roots and <template> content, and sanitizes those too.
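The questions above can be modelled on a toy tree of plain objects. This is a teaching sketch, not DOMPurify's code. The tiny allow-lists and the deliberately strict URI pattern are assumptions made for the example. It also ignores namespaces, DOM clobbering, shadow roots and template content, and it processes children recursively rather than in the order the real library uses.

```javascript
// Toy model only: NOT DOMPurify's implementation, and these lists are
// illustrative, not its real defaults.
const ALLOWED_TAGS = new Set(['div', 'p', 'a', 'b']);
const ALLOWED_ATTR = new Set(['href', 'title']);
const URI_ATTR = new Set(['href', 'src']);
// Strict toy check: http(s), mailto, or a URL starting with "/", "#" or ".".
const SAFE_URI = /^(?:https?:|mailto:|[/#.])/i;

// A node is either a string (text) or { tag, attrs, children }.
// Returns an array, so a removed element can be replaced by its content.
function sanitizeNode(node) {
  if (typeof node === 'string') return [node]; // text is kept as-is

  // Clean nested content first.
  const children = node.children.flatMap((child) => sanitizeNode(child));

  // 1. Is the element allowed? If not, drop it but keep its cleaned content.
  if (!ALLOWED_TAGS.has(node.tag)) return children;

  // 2. Keep only allow-listed attributes whose values pass the checks.
  const attrs = {};
  for (const [name, value] of Object.entries(node.attrs)) {
    if (!ALLOWED_ATTR.has(name)) continue;
    if (URI_ATTR.has(name) && !SAFE_URI.test(value)) continue;
    attrs[name] = value;
  }
  return [{ tag: node.tag, attrs, children }];
}

const dirtyTree = {
  tag: 'div',
  attrs: { onclick: 'steal()' },
  children: [
    { tag: 'marquee', attrs: {}, children: ['hello '] },
    { tag: 'a', attrs: { href: 'javascript:alert(1)', title: 'x' }, children: ['link'] },
  ],
};

// The onclick attribute, the marquee wrapper and the javascript: href are
// removed; the text "hello " and the title attribute survive.
const [cleanTree] = sanitizeNode(dirtyTree);
```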
Hooks (uponSanitizeElement, uponSanitizeAttribute, and the others) run at these decision points, so integrators can add their own rules or inspect what DOMPurify is doing.
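A hook is registered with addHook(entryPoint, callback). The sketch below uses the afterSanitizeAttributes entry point, one of "the others" mentioned above, to add a rel attribute to every link that survives sanitization. The helper name hardenLinks is hypothetical, and the DOMPurify instance is passed in as a parameter to keep the snippet self-contained.

```javascript
// Hypothetical helper: register a hook on a DOMPurify instance (`purify`).
function hardenLinks(purify) {
  // afterSanitizeAttributes fires once a node's attributes have been
  // checked, so anything set here is not re-validated by the allow-list.
  purify.addHook('afterSanitizeAttributes', (node) => {
    if (node.tagName === 'A' && node.hasAttribute('href')) {
      node.setAttribute('rel', 'noopener noreferrer');
    }
  });
}

// Usage, assuming a DOMPurify instance named DOMPurify is in scope:
//   hardenLinks(DOMPurify);
//   const clean = DOMPurify.sanitize(dirty);
//   DOMPurify.removeHook('afterSanitizeAttributes'); // undo when done
```

Hooks registered this way stay active for later sanitize() calls on the same instance, which is why the usage note removes the hook afterwards.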
Two design choices do most of the heavy lifting:
- Allow-list, not block-list. A block-list is only as good as the list of attacks you already know about; the first novel vector slips straight through. An allow-list fails safe: anything DOMPurify has not been told is safe is removed, so an unknown element or attribute is dropped rather than passed along.
- The real DOM, not text. Many historical sanitizer bypasses come from the gap between what a string looks like and what the browser actually parses it into. By building the real (inert) DOM and sanitizing that, DOMPurify closes most of that gap. The cases that remain, where the tree mutates on its way back into a live document, are mutation XSS, and defending against them is a continuous part of DOMPurify's work.
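The fail-safe property of an allow-list is easy to see on bare tag names. The comparison below is a toy: the two lists are made up for the example, and newvector stands in for any element that neither list anticipated.

```javascript
// Toy comparison on bare tag names; not DOMPurify code.
const BLOCKED = new Set(['script', 'iframe', 'object']);
const ALLOWED = new Set(['p', 'b', 'i', 'a']);

const blockListFilter = (tags) => tags.filter((t) => !BLOCKED.has(t));
const allowListFilter = (tags) => tags.filter((t) => ALLOWED.has(t));

// 'newvector' is a hypothetical element nobody planned for.
const seen = ['p', 'script', 'newvector'];

blockListFilter(seen); // → ['p', 'newvector']  the unknown tag slips through
allowListFilter(seen); // → ['p']               the unknown tag is dropped
```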
If you want the full picture of what DOMPurify does and does not promise, see the Security Goals and Threat Model. For the history of the attacks it defends against, including the parser-mutation, namespace, clobbering and template tricks, see Attack Classes and Bypass History.
- Output formats. A clean string by default; RETURN_TRUSTED_TYPE for a TrustedHTML object; RETURN_DOM or RETURN_DOM_FRAGMENT to skip serialization and get DOM back.
- Server-side. DOMPurify needs a DOM. In Node it runs on top of jsdom, so the same sanitization is available outside the browser.
- Trusted Types. DOMPurify integrates with the Trusted Types API, so it can be used as a policy that produces TrustedHTML.
- The native Sanitizer API. Browsers are starting to ship a built-in sanitizer (Element.setHTML()). It is not yet available everywhere, so DOMPurify remains the cross-browser and server-side answer, and the right fallback where the native API is missing.
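The points above can be sketched as two small helpers. Both names (sanitizeAs and setSanitizedHtml) are hypothetical, and the DOMPurify instance is passed in as a parameter to keep the snippet self-contained. The fallback helper assumes that feature-detecting Element.setHTML() is sufficient for your targets. The native sanitizer and DOMPurify are separate implementations, so their output should not be assumed to match exactly.

```javascript
// Hypothetical helper: pick an output format via DOMPurify's config flags.
// `purify` is assumed to be a DOMPurify instance.
function sanitizeAs(purify, dirty, format) {
  switch (format) {
    case 'node':
      return purify.sanitize(dirty, { RETURN_DOM: true });
    case 'fragment':
      return purify.sanitize(dirty, { RETURN_DOM_FRAGMENT: true });
    case 'trusted':
      return purify.sanitize(dirty, { RETURN_TRUSTED_TYPE: true });
    default:
      return purify.sanitize(dirty); // plain string
  }
}

// Hypothetical helper: prefer the native sanitizer where it exists,
// and fall back to DOMPurify everywhere else.
function setSanitizedHtml(purify, element, dirty) {
  if (typeof element.setHTML === 'function') {
    element.setHTML(dirty); // built-in sanitizer with its default config
  } else {
    element.innerHTML = purify.sanitize(dirty);
  }
}

// In Node, an instance is created from a jsdom window (assumes the
// `jsdom` and `dompurify` packages are installed):
//   const { JSDOM } = require('jsdom');
//   const createDOMPurify = require('dompurify');
//   const DOMPurify = createDOMPurify(new JSDOM('').window);
```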
- README for installation and the full configuration reference
- Security Goals and Threat Model
- Attack Classes and Bypass History