source/_posts/Sanitize Client-Side: Why Server-Side HTML Sanitization is Doomed to Fail.md
4 additions & 3 deletions
@@ -1,5 +1,5 @@
 ---
-title: "Sanitize Client-Side:Why Server-Side HTML Sanitization is Doomed to Fail"
+title: "Sanitize Client-Side:Why Server-Side HTML Sanitization is Doomed to Fail"
 date: 2024-11-05
 tags:
 - "html"
@@ -19,7 +19,8 @@ Background
 
 ### The Problem
 
-When a web application receives user-controlled input, such as comments or form submissions, it's essential to ensure that this input is safe before displaying it to users. This prevents malicious code, like JavaScript, from being injected into the page and executed, leading to potential XSS vulnerabilities. One straightforward approach is to escape every character that has special meaning in HTML such as lower/greater than signs (`<`, `>`), but more often than not web applications would actually like to support HTML input from the user to a certain extent, such as allowing titles, images, and bullet points.\
+When a web application receives user-controlled input, such as comments or form submissions, it's essential to ensure that this input is safe before displaying it to users. This prevents malicious code, like JavaScript, from being injected into the page and executed, leading to potential XSS vulnerabilities. One straightforward approach is to escape every character that has special meaning in HTML such as lower/greater than signs (`<`, `>`), but more often than not web applications would actually like to support HTML input from the user to a certain extent, such as allowing titles, images, and bullet points.
+
 To strike a balance between security and functionality, web applications often need to implement techniques that allow for certain HTML elements and attributes while still protecting against harmful content.
@@ -73,7 +74,7 @@ Can you spot the core issue? If you do, you are probably an HTML expert at this
 Despite the variety of bypasses, they all share a common root cause. Let's try another guess, but this time with a hint: These bypasses affected not one sanitizer but **most ones written in PHP**.\
 Can you guess the root cause now?
 
-Taking a step back and considering the common steps of [how sanitizers work](#how-do-html-sanitizers-work), we noticed that the common denominator for vulnerable sanitizers is the parsing algorithm. In our case, most sanitizers we looked at written in PHP were using the built-in HTML parser. Given PHP's primary use in web development, it offers an out-of-the-box HTML parser. Due to its convenience, it's understandable why sanitizer developers opt-out to use it. However, if this parser's behavior differs from the victim's browser, it creates a discrepancy that attackers can exploit
+Taking a step back and considering the common steps of [how sanitizers work](#How-do-HTML-Sanitizers-work), we noticed that the common denominator for vulnerable sanitizers is the parsing algorithm. In our case, most sanitizers we looked at written in PHP were using the built-in HTML parser. Given PHP's primary use in web development, it offers an out-of-the-box HTML parser. Due to its convenience, it's understandable why sanitizer developers opt-out to use it. However, if this parser's behavior differs from the victim's browser, it creates a discrepancy that attackers can exploit
 
 So in what way was the PHP parser different from the browser that caused these bypasses? If we split the payloads by HTML features: