HTML Tokenizer Vulnerability Fixed in Go's `x/net/html`
The Go team recently fixed a security vulnerability in the `golang.org/x/net/html` package, and version v0.38.0 of `golang.org/x/net` was tagged to address it.
The vulnerability allowed the HTML tokenizer to emit incorrect tokens and cause the parser to produce an incorrect DOM. Specifically, the tokenizer incorrectly interpreted tags whose unquoted attribute value ends with a solidus character (`/`) as self-closing. As a result, content following such tags could be placed in the wrong scope during DOM construction.
- This issue could affect any tag when using the `Tokenizer` directly.
- It also affected tags inside foreign content contexts (like `<svg>` and `<math>`) when using `html.Parse`, `ParseFragment`, `ParseFragmentWithOptions`, or `ParseWithOptions`.
This is CVE-2025-22872 and is tracked as Go issue #73070.
Thanks to Sean Ng (https://ensy.zip) for reporting this issue.

A Simple Example
To understand the issue better, consider the following basic HTML:
<p a=/>hello</p>
Before the Fix:
This would be misinterpreted as:
<p a="/"></p>
hello
This means the `<p>` tag is considered self-closing, and “hello” is parsed as content outside the `<p>` element.
After the Fix:
It is now correctly parsed as:
<p a="/">hello</p>
“hello” is correctly placed inside the `<p>` tag.
Reproducing the Bug with `html.Tokenizer`
This issue can be observed directly using the low-level `html.Tokenizer`. Here’s an example that demonstrates how it misclassified `<p a=/>` as self-closing before the fix.
package main

import (
    "fmt"
    "strings"

    "golang.org/x/net/html"
)

func main() {
    const input = `<p a=/>hello</p>`
    tokenizer := html.NewTokenizer(strings.NewReader(input))
    for {
        tt := tokenizer.Next()
        if tt == html.ErrorToken {
            break
        }
        token := tokenizer.Token()
        switch tt {
        case html.StartTagToken:
            fmt.Printf("Start tag: <%s>\n", token.Data)
            for _, attr := range token.Attr {
                fmt.Printf(" Attr: %s = %q\n", attr.Key, attr.Val)
            }
        case html.SelfClosingTagToken:
            fmt.Printf("Self-closing tag: <%s />\n", token.Data)
            for _, attr := range token.Attr {
                fmt.Printf(" Attr: %s = %q\n", attr.Key, attr.Val)
            }
        case html.EndTagToken:
            fmt.Printf("End tag: </%s>\n", token.Data)
        case html.TextToken:
            fmt.Printf("Text: %q\n", token.Data)
        }
    }
}
Output Before the Fix:
Self-closing tag: <p />
 Attr: a = "/"
Text: "hello"
End tag: </p>
Output After the Fix:
Start tag: <p>
 Attr: a = "/"
Text: "hello"
End tag: </p>
Clarifying Tokenizer Behavior vs. DOM Parsing
When you use `html.Tokenizer`, you’re manually consuming a flat stream of tokens; there’s no DOM tree being built. If the tokenizer emits a `SelfClosingTagToken`, it simply means:
“This tag appears to be self-closing based on its syntax.”
But since you’re not constructing a DOM, the tokenizer will still emit the next text token regardless of that classification. It does not understand or enforce what content belongs inside which tag.
So even if a tag like `<p a=/>` is misinterpreted as self-closing, you will still see:
Self-closing tag: <p />
 Attr: a = "/"
Text: "hello"
End tag: </p>
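That said, any code of your own that builds structure on top of the token stream inherits the misclassification. The following is a minimal, hypothetical consumer sketch (not part of `x/net`) that tracks nesting depth itself: with the buggy tokenizer the text is attributed to depth 0, as if it sat outside the `<p>`, while with v0.38.0 it lands at depth 1.

package main

import (
    "fmt"
    "strings"

    "golang.org/x/net/html"
)

func main() {
    const input = `<p a=/>hello</p>`
    z := html.NewTokenizer(strings.NewReader(input))
    depth := 0
    for {
        tt := z.Next()
        if tt == html.ErrorToken {
            break
        }
        switch tt {
        case html.StartTagToken:
            depth++ // a normal start tag opens a new nesting level
        case html.SelfClosingTagToken:
            // opens and closes immediately, so depth is unchanged
        case html.EndTagToken:
            if depth > 0 {
                depth--
            }
        case html.TextToken:
            fmt.Printf("text %q seen at depth %d\n", z.Token().Data, depth)
        }
    }
}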
The real issue becomes impactful when using `html.Parse()`, because the parser uses the token types to build a DOM structure. If a tag is misclassified as self-closing, the parser will close it immediately, and any following content (like `hello`) will be incorrectly placed outside the tag. That’s what causes bugs like misplaced children or unexpected layouts in parsed HTML documents.
Let’s look at how that happens in the DOM parser next.
Reproducing the Bug with `html.Parse` in Foreign Content
The issue is also visible during full DOM parsing, especially when parsing foreign content (like SVG). Here’s an example:
package main

import (
    "fmt"
    "strings"

    "golang.org/x/net/html"
)

func main() {
    const input = `<svg><foo a=/>text</foo></svg>`
    doc, err := html.Parse(strings.NewReader(input))
    if err != nil {
        panic(err)
    }
    printDOM(doc, 0)
}

func printDOM(n *html.Node, depth int) {
    indent := strings.Repeat("  ", depth)
    switch n.Type {
    case html.ElementNode:
        fmt.Printf("%s<%s", indent, n.Data)
        for _, attr := range n.Attr {
            fmt.Printf(" %s=\"%s\"", attr.Key, attr.Val)
        }
        fmt.Println(">")
    case html.TextNode:
        fmt.Printf("%s%s\n", indent, n.Data)
    }
    for c := n.FirstChild; c != nil; c = c.NextSibling {
        printDOM(c, depth+1)
    }
    if n.Type == html.ElementNode {
        fmt.Printf("%s</%s>\n", indent, n.Data)
    }
}
Output Before the Fix (v0.37.0):
  <html>
    <head>
    </head>
    <body>
      <svg>
        <foo a="/">
        </foo>
        text
      </svg>
    </body>
  </html>
At first glance, the intention of this input is clear:
- The `<foo>` element should have an attribute `a="/"`.
- The `"text"` should appear inside the `<foo>` element, followed by a closing `</foo>`.
In this incorrect output, the parser mistakenly interprets `<foo a=/>` as a self-closing tag, equivalent to `<foo a="/"/>`. So when the parser encounters the text, it thinks:
“Oh, `<foo>` already closed, so this must be text outside of it!”
As a result:
- `<foo>` is closed immediately.
- `"text"` becomes a sibling to `<foo>` inside `<svg>`, rather than a child of `<foo>`.
This leads to the following structure:
<svg>
  <foo a="/"></foo>   -- self-closed too early
  text                -- wrongly placed outside
</svg>
To apply the fix and ensure your Go project uses the corrected HTML tokenizer behavior, update your `go.mod` file to use a version of `golang.org/x/net` that includes the fix, which is v0.38.0 or newer.
Steps to Fix It
- Update your dependency using the terminal:

  go get golang.org/x/net@v0.38.0

  This will update the `go.mod` and `go.sum` files to reflect the correct version.
- Tidy up your module:

  go mod tidy

  This ensures unused dependencies are removed and all required ones are added cleanly.
Your `go.mod` should now contain:

require golang.org/x/net v0.38.0

This simple update ensures your project uses the fixed tokenizer logic, preventing the misclassification of tags like `<foo a=/>` and ensuring your HTML is parsed correctly, especially in foreign content contexts like `<svg>` and `<math>`.
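To double-check which version your build actually resolves (another dependency may still influence the selected version of `golang.org/x/net`), you can ask the Go tool directly; it should report v0.38.0 or newer:

  go list -m golang.org/x/net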
Output After the Fix (v0.38.0):
  <html>
    <head>
    </head>
    <body>
      <svg>
        <foo a="/">
          text
        </foo>
      </svg>
    </body>
  </html>
The fix ensures correct nesting and DOM structure inside foreign contexts.
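The fix is not specific to SVG. As a quick additional check, here is a small sketch using a MathML input and `html.Render` (which serializes a parsed tree back to HTML); with v0.38.0 the rendered output should show the text nested inside `<mi>` rather than after it.

package main

import (
    "fmt"
    "os"
    "strings"

    "golang.org/x/net/html"
)

func main() {
    // The same unquoted-attribute-ending-in-"/" pattern, but inside <math>.
    const input = `<math><mi a=/>text</mi></math>`
    doc, err := html.Parse(strings.NewReader(input))
    if err != nil {
        panic(err)
    }
    // Render writes the DOM back out as HTML, so the nesting is visible directly.
    if err := html.Render(os.Stdout, doc); err != nil {
        panic(err)
    }
    fmt.Println()
}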
Technical Fix Summary
The fix was made in the tokenizer’s `readStartTag` function. Previously, the code checked whether a `/` occurred immediately before the tag’s closing `>` and assumed that meant the tag was self-closing. The new logic checks whether that `/` is actually part of the last attribute’s unquoted value. This ensures that tags like `<p a=/>` or `<foo a=/>` are no longer incorrectly treated as self-closing.
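As a quick sanity check of the corrected classification, the following sketch feeds a few tag variants to the tokenizer; with v0.38.0, only the tags whose `/` sits outside an unquoted attribute value should be reported as self-closing.

package main

import (
    "fmt"
    "strings"

    "golang.org/x/net/html"
)

func main() {
    inputs := []string{
        `<p a=/>`,    // "/" belongs to the unquoted value: expected NOT self-closing
        `<p a="x"/>`, // quoted value followed by "/>": expected self-closing
        `<br/>`,      // plain "/>": expected self-closing
    }
    for _, in := range inputs {
        z := html.NewTokenizer(strings.NewReader(in))
        tt := z.Next() // first (and only) tag token in each input
        fmt.Printf("%-12s self-closing: %v\n", in, tt == html.SelfClosingTagToken)
    }
}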
Who Is Affected?
- Anyone using `golang.org/x/net/html.Tokenizer` directly.
- Anyone parsing foreign content with `html.Parse()` or similar functions (`ParseFragment`, etc.).
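If you are unsure whether your own projects hit one of these call paths, `govulncheck` (the Go team’s vulnerability scanner, installed separately from `golang.org/x/vuln`) can check your module against the Go vulnerability database and should flag an affected `golang.org/x/net` version:

  go install golang.org/x/vuln/cmd/govulncheck@latest
  govulncheck ./...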
Final Thoughts
Bugs in HTML parsers are easy to overlook but can have a large impact on web rendering, sanitization, and security.
This fix brings Go’s HTML parser more in line with expected behavior and ensures developers can rely on accurate DOM generation even in edge cases.
Big thanks again to Sean Ng for reporting this issue and to the Go team for promptly resolving it.
Stay safe and keep your Go modules updated!