Cross Site Scripting (XSS) tutorial
A guide to cross-site scripting attacks. How do they work? How can you prevent them?
What is XSS, aka Cross Site Scripting?
XSS is the term we use to define a particular kind of attack where a website (your website, if you don’t pay attention) might be used as a vector to attack its users, because of insecure handling of user input.
By exploiting this vulnerability, an attacker can get their own JavaScript code to run in your users’ browsers and use it to steal those users’ information.
Based on how the XSS vulnerability is exploited, we have 3 major kinds of XSS attacks:
- persistent XSS
- reflected XSS
- DOM-based XSS
Why is XSS dangerous?
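This is very dangerous. The injected script runs in your users’ browsers as if it were part of your site, so it can read cookies that are not marked HttpOnly, send requests on the user’s behalf, change what the page shows, or redirect the user to a phishing page.
Due to your negligence in fixing an XSS vulnerability, your site can be used as an attack vector and your users’ information is at risk.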
Is XSS a frontend or backend problem?
It’s both. It’s a website architectural problem that involves both the frontend and the backend.
An XSS attack example
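XSS is typically enabled when you allow users to enter information, which you store (in your backend) and then present back to other users.
Take a blog where visitors can comment on a post. If you blindly accept any content, a malicious user might submit a comment containing a <script></script> tag, for example one that calls alert(). Every visitor who opens the post then runs that script.
I used a simple alert() call to make an example, but a user could enter any kind of script. At this point, the site is compromised.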
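To make the flow concrete, here is a minimal sketch of the backend side. It assumes a hypothetical in-memory comments array in place of a real database, plus hypothetical saveComment() and renderComments() helpers. The comment is stored exactly as typed and later turned into HTML by string concatenation, which is what lets the script reach other visitors’ browsers.

```javascript
// Hypothetical in-memory "database" standing in for your backend storage.
const comments = [];

// Stores the comment exactly as submitted, with no validation or encoding.
function saveComment(text) {
  comments.push(text);
}

// VULNERABLE: builds HTML by string concatenation, so any markup inside a
// stored comment is sent to every visitor as real HTML.
function renderComments() {
  return comments.map((text) => `<li>${text}</li>`).join('\n');
}

saveComment('Great post!');
saveComment("<script>alert('hi')</script>");

// The second <li> contains a live <script> element once a browser parses it.
console.log(renderComments());
```

Because the payload lives in storage, it keeps firing for every visitor until someone removes it.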
What is persistent XSS?
Persistent XSS is one of the 3 kinds of XSS we find in the wild. It’s the one I described above in the blog post example.
In this case, the malicious code is stored in your database or in some other source that you host.
What is reflected XSS?
Reflected XSS is a way to exploit a vulnerability in your site on-the-fly by providing the end user a link that has a script inside it.
In this way, the attacker provides a link whose example GET variable contains a script.
If your site uses the example GET variable to perform something and display it on the page, and you don’t check and sanitize its value, that script will be executed by the user’s browser.
A typical example is a search form. It might live at the /search URL and accept the search term through a GET variable (say, term).
You might display a “You searched for <term>” string when someone searches. If you didn’t sanitize the value, you now have a problem.
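A minimal sketch of the vulnerable search page, assuming the search term arrives in a GET variable named term (a hypothetical name) and a hypothetical renderSearchPage() helper that receives the request path. It uses the standard URL class to read the query string, which percent-decodes the value, so the attacker’s markup comes back out intact.

```javascript
// VULNERABLE sketch of a /search page: the term comes straight from the
// query string and is placed into the HTML unchanged.
function renderSearchPage(requestUrl) {
  // A base is needed because request URLs are usually relative paths.
  const url = new URL(requestUrl, 'http://localhost');
  // searchParams.get() decodes %3C/%3E back into < and >.
  const term = url.searchParams.get('term') ?? '';
  return `<p>You searched for ${term}</p>`;
}

// A link an attacker might send in a phishing email (illustrative only).
const html = renderSearchPage('/search?term=<script>alert(1)</script>');
console.log(html);
```

The returned HTML contains a working <script> element, so whoever follows the link runs it under your site’s origin.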
Spam/phishing emails are a common medium for this XSS attack. Of course, the bigger and more important the site, the more frequently hackers will try to hack it.
What is DOM-based XSS?
With persistent XSS, the attacking code must be sent to the server, where it can be (and hopefully it is) sanitized. With reflected XSS, the same is true.
DOM-based XSS is a kind of XSS where the malicious code is never sent to the server. This commonly happens through the fragment part of a URL, or through code that references document.location.href. Some examples you find online don’t really work anymore because modern browsers automatically escape JS in the address bar for us. They only work if you unescape the value, which is kind of scary (don’t do it!).
For example, suppose your page reads a test variable passed in the fragment part of the URL, as in #test=something.
That value is never sent to the server. It’s only local, so the server-side checks you use against persistent and reflected XSS never see it. But say your script accesses that value using:
const pos = document.URL.indexOf("test=") + 5;
const value = document.URL.substring(pos, document.URL.length);
and you write it directly into the DOM, for example with document.write() or by assigning it to an element’s innerHTML.
All is fine, until someone calls the URL with a <script> tag as the value of test.
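Now, thanks to the automatic escaping that happens when you reference document.URL, nothing should happen in this specific case. The < and > characters come through percent-encoded (as %3C and %3E), so the script text is simply printed to the page. The value is escaped, so it’s not interpreted as HTML.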
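You can observe this escaping outside the browser too. Node.js implements the WHATWG URL Standard parser, which percent-encodes < and > in the fragment; this sketch assumes a recent Node.js version and uses a made-up attack URL. It also shows why decoding the value yourself is the risky step.

```javascript
// The URL an attacker might share, with a script in the fragment.
const attackUrl = 'https://example.com/page#test=<script>alert(1)</script>';

// Parsing and re-serializing percent-encodes < and > in the fragment,
// which is the escaping you see when reading document.URL in modern browsers.
const serialized = new URL(attackUrl).href;
console.log(serialized); // the < and > now appear as %3C and %3E

// Decoding undoes that protection: the literal <script> tag is back.
const decoded = decodeURIComponent(new URL(attackUrl).hash);
console.log(decoded);
```

As long as the percent-encoded form is what reaches the page, the browser shows it as text; once decoded, it is markup again.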
On older browsers, this was a much bigger problem, since they didn’t auto-escape JS put into the address bar.
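The indexOf() snippet above has two quirks: when test= is missing, indexOf() returns -1, so the substring starts at index 4 and returns part of the URL; and it also matches keys that merely end in test=, such as latest=. A sturdier sketch uses the standard URL and URLSearchParams classes (readFragmentParam() is a hypothetical helper name). Note that get() percent-decodes, so the result can contain raw < and > and should be written with textContent, never innerHTML.

```javascript
// Reads one key from the fragment (#...) of a URL string.
// Returns null when the key is missing instead of a wrong slice of the URL,
// and only matches the exact key, not keys like "latest".
function readFragmentParam(href, key) {
  const fragment = new URL(href).hash.slice(1); // drop the leading "#"
  return new URLSearchParams(fragment).get(key);
}

// get() decodes %3C and %3E, so treat the result as untrusted text.
const value = readFragmentParam('https://example.com/#test=%3Cb%3Ehi%3C/b%3E', 'test');
console.log(value);
```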
Are static sites vulnerable to XSS?
Yes! Any kind of site, actually. Being static does not mean that no information is loaded from other sources. For example, you might roll your own form or comments, even without a database.
Or you might have a search feature that accepts input from an HTTP GET or POST variable. You are not immune just because you don’t have a database.
How can we prevent XSS?
There are 3 techniques we can use:
- encoding
- validation
- CSP (Content Security Policy)
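Encoding escapes the data so that the browser displays it instead of interpreting it as code. For example, <script> will be encoded to &lt;script&gt;.
As a general rule, encoding should always be done.
Server-side frameworks commonly provide helper functions that do this for you.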
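If your framework doesn’t provide one, a minimal HTML-escaping helper looks roughly like this (escapeHtml() is a hypothetical name, not a built-in). It covers text placed between tags and inside quoted attribute values; URLs, JavaScript, and CSS contexts need their own encoding.

```javascript
// Escapes the characters that have special meaning in HTML.
// "&" is replaced first so the entities added afterwards are not double-escaped.
function escapeHtml(input) {
  return String(input)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

console.log(escapeHtml('<script>alert(1)</script>'));
// &lt;script&gt;alert(1)&lt;/script&gt;
```

Applied to the comment or search term before it goes into the HTML, this turns the payload into harmless visible text.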
If you need to add content to an HTML element, the best way is to assign the user-generated input to that element’s textContent property. The browser will do all the escaping for you:
document.querySelector('#myElement').textContent = theUserGeneratedInput
If you need to create a new text node for the content, use document.createTextNode():
const el = document.createTextNode(theUserGeneratedInput)
If you need to add content to an HTML attribute, use the setAttribute() method of the element.
If you need to add content to the URL, use the encodeURIComponent() function:
window.location.href = window.location.href + '?test=' + window.encodeURIComponent(theUserGeneratedInput)
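Here is what encodeURIComponent() does to a hostile value, run outside the browser so the output is easy to inspect. The sample input is made up. The second half swaps in an alternative not used in the snippet above, the standard URL class and its searchParams, which encodes the value as well and avoids adding a second ? when the URL already has a query string.

```javascript
const input = '<script>&admin=true';

// encodeURIComponent() escapes <, >, & and =, so the input stays a
// single parameter value instead of adding markup or extra parameters.
console.log('?test=' + encodeURIComponent(input));
// ?test=%3Cscript%3E%26admin%3Dtrue

// URL + searchParams encodes too, and handles an existing query string.
const url = new URL('https://example.com/page?lang=en');
url.searchParams.set('test', input);
console.log(url.href);
// https://example.com/page?lang=en&test=%3Cscript%3E%26admin%3Dtrue
```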
Validation is usually done when you cannot use escaping to filter the input. A common example is a CMS that lets the user define the content of the page in HTML. You can’t escape that.
You either use a blacklisting or whitelisting strategy for validation. The difference is that with blacklisting you decide which tags you want to disallow. With whitelisting you decide which tags you want to allow. Whitelisting is safer because blacklisting is prone to errors, complex and also not future-proof.
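To illustrate why whitelisting is safer, here is a deliberately simplified sketch that validates submitted HTML against a hypothetical tag list in both styles. The tag sets, the regex, and the function names are all assumptions for illustration; regex checks like this miss many edge cases, so real input should go through a maintained HTML sanitizer such as DOMPurify.

```javascript
// Hypothetical whitelist: only these tags, and no attributes at all.
const ALLOWED_TAGS = new Set(['p', 'b', 'i', 'em', 'strong', 'ul', 'ol', 'li', 'br']);
// Hypothetical blacklist, for comparison.
const BLOCKED_TAGS = new Set(['script']);

// Matches opening, closing and self-closing tags: name in group 1, rest in group 2.
const TAG_PATTERN = /<\/?([a-zA-Z][a-zA-Z0-9]*)([^>]*)>/g;

function passesWhitelist(html) {
  for (const [, name, rest] of html.matchAll(TAG_PATTERN)) {
    if (!ALLOWED_TAGS.has(name.toLowerCase())) return false;
    // Reject any attribute: event handlers such as onclick live there.
    if (rest.replace(/\/\s*$/, '').trim() !== '') return false;
  }
  return true;
}

function passesBlacklist(html) {
  for (const [, name] of html.matchAll(TAG_PATTERN)) {
    if (BLOCKED_TAGS.has(name.toLowerCase())) return false;
  }
  return true;
}

const payload = '<img src=x onerror=alert(1)>';
console.log(passesBlacklist(payload)); // the blacklist lets it through
console.log(passesWhitelist(payload)); // the whitelist rejects it
```

The blacklist only knows about script, so a payload that runs code through an img event handler slips past it, while the whitelist rejects anything it was not told to expect.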
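CSP (Content Security Policy) lets you tell the browser which sources it may load scripts and other resources from. It is enabled by the web server, which adds the Content-Security-Policy HTTP header when serving the page.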
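A minimal sketch of sending the header from a Node.js request handler. The policy string is an example, not a recommendation for every site. Because script-src does not list 'unsafe-inline', the browser refuses inline <script> blocks and inline event handlers, which is where injected code usually ends up. The handler name and page body are hypothetical.

```javascript
// Example policy: load scripts and other resources only from this site's
// own origin, and never load plugins. Without 'unsafe-inline', inline
// <script> blocks and inline event handlers are blocked by the browser.
const CONTENT_SECURITY_POLICY = "default-src 'self'; script-src 'self'; object-src 'none'";

// Request handler with the (req, res) shape used by Node's http.createServer().
function handler(req, res) {
  res.setHeader('Content-Security-Policy', CONTENT_SECURITY_POLICY);
  res.setHeader('Content-Type', 'text/html; charset=utf-8');
  res.end('<!doctype html><title>Hello</title><p>Hello!</p>');
}
```

Passing handler to http.createServer() from Node’s built-in http module and calling listen() serves every page with the header. CSP is a second line of defense: keep encoding and validating input as well.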