🔧 HTML Tokenizer Vulnerability Fixed in Go's `x/net/html`


Recently, the Go team fixed a security vulnerability in the golang.org/x/net/html package.

Version v0.38.0 of golang.org/x/net was tagged to address this issue.

This version fixes a vulnerability where the HTML tokenizer could emit incorrect tokens and cause the parser to produce an incorrect DOM.

Specifically, the tokenizer incorrectly interpreted tags with unquoted attribute values that end with a solidus character (/) as self-closing.
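Why is <p a=/> not self-closing? In the HTML specification's unquoted attribute value state, a / is just another value character. Only whitespace or > ends the value. Below is a minimal sketch of that rule, assuming a simplified scanner. readUnquotedValue is a hypothetical helper, not part of x/net/html, and it skips character references and parse-error handling.

```go
package main

import (
	"fmt"
	"strings"
)

// readUnquotedValue is a hypothetical, simplified scanner. Starting at
// index i (just after '='), it reads an unquoted attribute value the way
// the HTML spec's "unquoted attribute value" state does: only whitespace
// or '>' ends the value, so '/' is kept as part of it. Character
// references and parse-error cases are ignored for brevity.
func readUnquotedValue(s string, i int) (val string, next int) {
	start := i
	for ; i < len(s); i++ {
		switch s[i] {
		case ' ', '\t', '\n', '\f', '\r', '>':
			return s[start:i], i
		}
	}
	return s[start:], i
}

func main() {
	const tag = `<p a=/>`
	val, next := readUnquotedValue(tag, strings.Index(tag, "=")+1)
	// The '/' is consumed as the value, so the tag ends with a plain '>'
	// rather than "/>": it is an ordinary start tag.
	fmt.Printf("value=%q, tag ends with %q\n", val, tag[next:])
}
```

Running it prints `value="/", tag ends with ">"`: the slash belongs to the attribute, and nothing is left over to act as a self-closing marker.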

As a result, content following such tags could be placed in the wrong scope during DOM construction.

  • When using the Tokenizer directly, any tag could be affected.
  • When using html.Parse, ParseFragment, ParseFragmentWithOption, or ParseWithOptions, only tags inside foreign content contexts (such as <svg> and <math>) were affected.

This is CVE-2025-22872 and is tracked as Go issue #73070.

Thanks to Sean Ng (https://ensy.zip) for reporting this issue.


🔒 A Simple Example

To understand the issue better, consider the following basic HTML:

<p a=/>hello</p>

❌ Before the Fix:

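Before v0.38.0, the tokenizer reported <p a=/> as a self-closing tag, so a consumer that honored that flag would effectively see:

<p a="/"></p>
hello

In other words, the <p> tag is treated as self-closing, and “hello” ends up outside the <p> element. (html.Parse ignores the self-closing flag on ordinary HTML elements such as <p>, so in a parsed DOM this misplacement only appears inside foreign content like <svg> and <math>.)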

✅ After the Fix:

It is now correctly parsed as:

<p a="/">hello</p>

“hello” is correctly placed inside the <p> tag.


📊 Reproducing the Bug with html.Tokenizer

This issue can be observed directly using the low-level html.Tokenizer.

Here’s an example that shows how the tokenizer misclassified <p a=/> as self-closing before the fix:

package main

import (
	"fmt"
	"golang.org/x/net/html"
	"strings"
)

func main() {
	const input = `<p a=/>hello</p>`

	tokenizer := html.NewTokenizer(strings.NewReader(input))

	for {
		// Next advances to the next token. ErrorToken means the input is
		// exhausted (tokenizer.Err() == io.EOF) or a read error occurred.
		tt := tokenizer.Next()
		if tt == html.ErrorToken {
			break
		}

		// Token returns the current token's tag name or text, plus its attributes.
		token := tokenizer.Token()

		switch tt {
		case html.StartTagToken:
			fmt.Printf("Start tag: <%s>\n", token.Data)
			for _, attr := range token.Attr {
				fmt.Printf("  Attr: %s = %q\n", attr.Key, attr.Val)
			}
		case html.SelfClosingTagToken:
			// Before v0.38.0, <p a=/> was reported here.
			fmt.Printf("Self-closing tag: <%s />\n", token.Data)
			for _, attr := range token.Attr {
				fmt.Printf("  Attr: %s = %q\n", attr.Key, attr.Val)
			}
		case html.EndTagToken:
			fmt.Printf("End tag: </%s>\n", token.Data)
		case html.TextToken:
			fmt.Printf("Text: %q\n", token.Data)
		}
	}
}

❌ Output Before the Fix:

Self-closing tag: <p />
  Attr: a = "/"
Text: "hello"
End tag: </p>

βœ… Output After the Fix:

Start tag: <p>
  Attr: a = "/"
Text: "hello"
End tag: </p>

🔍 Clarifying Tokenizer Behavior vs. DOM Parsing

When you use html.Tokenizer, you’re manually consuming a flat stream of tokens.

There’s no DOM tree being built.

If the tokenizer emits a SelfClosingTagToken, it simply means:

“This tag appears to be self-closing based on its syntax.”

But since you’re not constructing a DOM, the tokenizer will still emit the next text token regardless of that classification.

It does not understand or enforce what content belongs inside which tag.

So even if a tag like <p a=/> is misinterpreted as self-closing, you will still see:

Self-closing tag: <p />
  Attr: a = "/"
Text: "hello"
End tag: </p>

✅ The real impact shows up with html.Parse(), because the parser uses the token types to build a DOM structure.

Inside foreign content (such as <svg> or <math>), if a tag is misclassified as self-closing, the parser closes it immediately, and any following content (such as text that should be its child) is incorrectly placed outside the tag.

That’s what causes bugs like misplaced children or unexpected layouts in parsed HTML documents.
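The effect can be modeled with a toy tree builder. In this sketch, tok, node, build, and dump are hypothetical names, not the x/net/html API. build honors the self-closing flag, as html.Parse does inside foreign content, and it ignores an end tag that has no matching open element.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical, simplified token and tree types (not x/net/html's).
type tokKind int

const (
	startTag tokKind = iota
	selfClosingTag
	endTag
	text
)

type tok struct {
	kind tokKind
	data string
}

type node struct {
	name     string // element name, or "#text"
	text     string
	children []*node
}

// build adds each token to the current open element. A self-closing tag
// becomes an empty element that is never pushed onto the open stack.
func build(toks []tok) *node {
	root := &node{name: "#root"}
	stack := []*node{root}
	for _, t := range toks {
		top := stack[len(stack)-1]
		switch t.kind {
		case startTag, selfClosingTag:
			n := &node{name: t.data}
			top.children = append(top.children, n)
			if t.kind == startTag {
				stack = append(stack, n)
			}
		case endTag:
			// Pop back to the matching open element; ignore stray end tags.
			for i := len(stack) - 1; i > 0; i-- {
				if stack[i].name == t.data {
					stack = stack[:i]
					break
				}
			}
		case text:
			top.children = append(top.children, &node{name: "#text", text: t.data})
		}
	}
	return root
}

// dump renders the tree on one line, e.g. svg(foo(),"text").
func dump(n *node) string {
	if n.name == "#text" {
		return fmt.Sprintf("%q", n.text)
	}
	parts := make([]string, len(n.children))
	for i, c := range n.children {
		parts[i] = dump(c)
	}
	return n.name + "(" + strings.Join(parts, ",") + ")"
}

func main() {
	// Token streams for <svg><foo a=/>text</foo></svg> (attributes omitted).
	before := []tok{{startTag, "svg"}, {selfClosingTag, "foo"}, {text, "text"}, {endTag, "foo"}, {endTag, "svg"}}
	after := []tok{{startTag, "svg"}, {startTag, "foo"}, {text, "text"}, {endTag, "foo"}, {endTag, "svg"}}
	fmt.Println(dump(build(before)))
	fmt.Println(dump(build(after)))
}
```

With the stream the old tokenizer produced, the text becomes a sibling of foo (`#root(svg(foo(),"text"))`); with the fixed stream it becomes foo's child (`#root(svg(foo("text")))`).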

Let’s look at how that happens in the DOM parser next.


📊 Reproducing the Bug with html.Parse in Foreign Context

The issue also shows up during full DOM parsing, but only inside foreign content (like SVG). Here’s an example:

package main

import (
	"fmt"
	"golang.org/x/net/html"
	"strings"
)

func main() {
	const input = `<svg><foo a=/>text</foo></svg>`

	doc, err := html.Parse(strings.NewReader(input))
	if err != nil {
		panic(err)
	}

	printDOM(doc, 0)
}

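// printDOM walks the parsed tree depth-first, printing element open/close
// tags and text nodes indented by depth. The document node itself prints
// nothing, so <html> appears at depth 1.
func printDOM(n *html.Node, depth int) {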
	indent := strings.Repeat("  ", depth)
	switch n.Type {
	case html.ElementNode:
		fmt.Printf("%s<%s", indent, n.Data)
		for _, attr := range n.Attr {
			fmt.Printf(" %s=\"%s\"", attr.Key, attr.Val)
		}
		fmt.Println(">")
	case html.TextNode:
		fmt.Printf("%s%s\n", indent, n.Data)
	}

	for c := n.FirstChild; c != nil; c = c.NextSibling {
		printDOM(c, depth+1)
	}

	if n.Type == html.ElementNode {
		fmt.Printf("%s</%s>\n", indent, n.Data)
	}
}

❌ Output Before the Fix (v0.37.0):

  <html>
    <head>
    </head>
    <body>
      <svg>
        <foo a="/">
        </foo>
        text
      </svg>
    </body>
  </html>

At first glance, the intention of this input is clear:

  • The <foo> element should have an attribute a="/".
  • The "text" should appear inside the <foo> element, followed by a closing </foo>.

In this incorrect output, the parser mistakenly interprets <foo a=/> as a self-closing tag, equivalent to <foo a="/"/>.

So when the parser encounters the text, it thinks:

“Oh, <foo> is already closed, so this must be text outside of it!”

As a result:

  • <foo> is closed immediately.
  • "text" becomes a sibling to <foo> inside <svg>, rather than a child of <foo>.

This leads to the following structure:

<svg>
  <foo a="/"></foo>   -- self-closed too early
  text                -- wrongly placed outside
</svg>

✅ To apply the fix and make sure your Go project uses the corrected tokenizer behavior, update your go.mod to require golang.org/x/net v0.38.0 or newer.


🔧 Steps to Fix It

  1. Update your dependency using the terminal:
go get golang.org/x/net@v0.38.0

This will update the go.mod and go.sum files to reflect the correct version.

  2. Tidy up your module:
go mod tidy

This ensures unused dependencies are removed and all required ones are added cleanly.


📦 Your go.mod should now contain:

require golang.org/x/net v0.38.0

This simple update ensures your project uses the fixed tokenizer logic, preventing the misclassification of tags like <foo a=/> and ensuring your HTML is parsed correctly, especially in foreign content contexts like <svg> and <math>.
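Besides reading go.mod (or running `go list -m golang.org/x/net`), you can check which version a compiled binary actually linked. This sketch uses the standard runtime/debug.ReadBuildInfo API. atLeast is a hypothetical, deliberately simple version comparison that ignores pre-release suffixes; in real code, golang.org/x/mod/semver is the safer choice.

```go
package main

import (
	"fmt"
	"runtime/debug"
	"strconv"
	"strings"
)

// atLeast is a hypothetical helper: it compares only the numeric
// major.minor.patch parts of two "vX.Y.Z" versions and ignores any
// pre-release or build suffix.
func atLeast(version, min string) bool {
	parse := func(v string) [3]int {
		var out [3]int
		v = strings.TrimPrefix(v, "v")
		// Drop suffixes such as "-0.20250401..." or "+incompatible".
		if i := strings.IndexAny(v, "-+"); i >= 0 {
			v = v[:i]
		}
		for i, part := range strings.SplitN(v, ".", 3) {
			out[i], _ = strconv.Atoi(part) // non-numeric parts count as 0
		}
		return out
	}
	a, b := parse(version), parse(min)
	for i := range a {
		if a[i] != b[i] {
			return a[i] > b[i]
		}
	}
	return true
}

func main() {
	info, ok := debug.ReadBuildInfo()
	if !ok {
		fmt.Println("no build info (binary not built with module support)")
		return
	}
	for _, dep := range info.Deps {
		if dep.Path != "golang.org/x/net" {
			continue
		}
		v := dep.Version
		if dep.Replace != nil {
			v = dep.Replace.Version
		}
		if v == "" {
			// A replace directive pointing at a local directory has no version.
			fmt.Println("golang.org/x/net is replaced by a local directory; check it manually")
			return
		}
		fmt.Printf("golang.org/x/net %s, at least v0.38.0: %v\n", v, atLeast(v, "v0.38.0"))
		return
	}
	fmt.Println("golang.org/x/net is not linked into this binary")
}
```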

✅ Output After the Fix (v0.38.0):

  <html>
    <head>
    </head>
    <body>
      <svg>
        <foo a="/">
          text
        </foo>
      </svg>
    </body>
  </html>

The fix ensures correct nesting and DOM structure inside foreign contexts.


🔧 Technical Fix Summary

The fix was made in the tokenizer's readStartTag function. Previously, the code treated a tag as self-closing whenever the character immediately before the closing > was a /.

The new logic checks whether that / is actually part of the last attribute’s unquoted value.

This ensures that tags like <p a=/> or <foo a=/> are no longer incorrectly treated as self-closing.
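The before/after condition can be sketched as follows. This is a simplified model of the idea described above, not the actual readStartTag source: attrSpan, oldSelfClosing, and newSelfClosing are hypothetical names, and the attribute value spans are supplied by hand instead of by a real tokenizer.

```go
package main

import "fmt"

// attrSpan marks where an attribute value sits in the raw tag text
// (end is exclusive). Hypothetical simplification, not x/net/html code.
type attrSpan struct{ start, end int }

// oldSelfClosing models the pre-fix check: the tag is self-closing if the
// byte just before the final '>' is '/'.
func oldSelfClosing(raw string) bool {
	return len(raw) >= 2 && raw[len(raw)-2] == '/'
}

// newSelfClosing models the post-fix check: the '/' only counts if it is
// not the last byte of the last attribute's value.
func newSelfClosing(raw string, attrs []attrSpan) bool {
	if !oldSelfClosing(raw) {
		return false
	}
	if len(attrs) == 0 {
		return true
	}
	last := attrs[len(attrs)-1]
	return last.end-1 != len(raw)-2
}

func main() {
	cases := []struct {
		raw   string
		attrs []attrSpan
	}{
		{`<br/>`, nil},
		{`<p a=/>`, []attrSpan{{5, 6}}},      // value "/" is raw[5:6]
		{`<p a=b/>`, []attrSpan{{5, 7}}},     // value "b/" is raw[5:7]
		{`<foo a="x"/>`, []attrSpan{{8, 9}}}, // quoted value x, then a real "/"
	}
	for _, c := range cases {
		fmt.Printf("%-14s old=%-5v new=%v\n", c.raw, oldSelfClosing(c.raw), newSelfClosing(c.raw, c.attrs))
	}
}
```

Both checks agree on genuine self-closing tags like <br/> and <foo a="x"/>; they differ only when the trailing / belongs to an unquoted value.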


📆 Who Is Affected?

  • Anyone using golang.org/x/net/html.Tokenizer directly.
  • Anyone parsing foreign content with html.Parse() or similar functions (ParseFragment, etc.).
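If you want a rough way to spot stored HTML that an older version may have tokenized differently, a scan like the sketch below can help. ambiguousSlash is a hypothetical heuristic, not part of any library: it matches an = followed by an unquoted value that runs straight into />, such as a=/> or href=x/>. It can report false positives (for example in text or comments), so treat hits as candidates for review rather than proof.

```go
package main

import (
	"fmt"
	"regexp"
)

// ambiguousSlash matches '=' + an unquoted value (no whitespace, quotes,
// '=', '<', '>', or backtick) immediately followed by "/>". Pre-v0.38.0
// tokenizers treated such tags as self-closing.
var ambiguousSlash = regexp.MustCompile("=[^\\s\"'=<>`]*/>")

func main() {
	inputs := []string{
		`<p a=/>hello</p>`,
		`<svg><foo a=/>text</foo></svg>`,
		`<br/>`,
		`<img src="x.png"/>`,
		`<p a="/">hello</p>`,
	}
	for _, in := range inputs {
		fmt.Printf("%-32s suspicious=%v\n", in, ambiguousSlash.MatchString(in))
	}
}
```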

📄 Final Thoughts

Bugs in HTML parsers are easy to overlook but can have a large impact on web rendering, sanitization, and security.

This fix brings Go’s HTML parser more in line with expected behavior and ensures developers can rely on accurate DOM generation even in edge cases.

Big thanks again to Sean Ng for reporting this issue and to the Go team for promptly resolving it.

Stay safe and keep your Go modules updated!


I am Arunkumar Gudelli, one among a million software engineers of India.

https://golangtutorial.dev/authors/arungudelli/