Web security and measures to protect your site

Andreu Botella
Site Admin
Posts: 11
Joined: Sun Nov 10, 2019 12:36 am

Web security and measures to protect your site

Post by Andreu Botella » Sat Nov 23, 2019 10:29 am

(There is also a Spanish version of this topic.)

Web security is a topic that every web developer should know, and that shouldn't be overlooked. To that end, I'm working on a series on web security at the Friday meetings of the mod_harbour team, and I'll be posting the corresponding write-ups.

These are the topics we're covering (more may be added in the future):
  • XSS (cross-site scripting). Covered 22 November 2019.
  • SQL / DBF injection.
  • CORS (cross-origin resource sharing)
  • CORB/CORP (cross-origin read blocking/policy)
  • ...
Basic web security concepts

In the web platform, there are four entities that interact with each other: the user, the browser, the server and the application (that is, the components that generate the code the server is sending – in our case mod_harbour plus any database engine). Here's how each can be vulnerable:
  • If the website has a login of some sort or is linked to services that do, the user expects your site not to do things that they didn't agree to, such as making their password public, posting fake messages on the site claiming to be from that user, or performing fake transactions on behalf of the user. While users should take measures to avoid visiting malicious sites, a site can become malicious to the user without meaning to, if it has vulnerabilities in its application layers.
  • The browser can have vulnerabilities, just like any other software, but they're typically resolved quickly after they're discovered – and often the bug is only made public at the same time as the patch fixing it is released. Website owners and developers don't need to worry about these vulnerabilities.
  • The server can be hit by denial-of-service attacks, and in general by attacks from lower layers of the network stack. Unless you're using a custom server or unusual custom settings, a firewall is usually enough to block these attacks. Servers can also have vulnerabilities, though, which is why you should always keep both the server machine and the server program updated.
  • Then there's the application. We've already seen how application vulnerabilities can lead to user-level security issues, but it goes further than that. Applications usually consist of one part that runs code (the Harbour compiler and interpreter, in this case), plus some form of storage, which may provide the code to run (that is, databases as well as the server machine's filesystem). The part that runs code is subject to the same vulnerabilities as any other system, except that killing that part could leave every user without service (which isn't the case for mod_harbour). The storage, however, could be read, changed or deleted when it's not supposed to be, if the code has vulnerabilities.
Some of these possible attacks, particularly those that involve cross-origin interaction, are automatically blocked by the browser. But browsers can't reliably distinguish suspicious from valid code, which means that we'll explicitly have to allow those interactions if we want them to happen.

Cristobal
Posts: 49
Joined: Fri Nov 15, 2019 6:58 pm

Re: Web security and measures to protect your site

Post by Cristobal » Sat Nov 23, 2019 7:04 pm

Very good, thanks

Andreu Botella
Site Admin
Posts: 11
Joined: Sun Nov 10, 2019 12:36 am

Re: Web security and measures to protect your site

Post by Andreu Botella » Sun Nov 24, 2019 1:57 am

XSS (cross-site scripting)
(covered 22 November 2019)

If you have a form on your site, save the form contents in the database as is, and then print them out as is somewhere else in the site, your site might be vulnerable to cross-site scripting.

Let's say that you're coding the next Amazon, and you show reviews on the product page, where you also have the "buy now" button. Here's a naive implementation:

Code: Select all

function Main()
    local post
    if AP_Method() == "POST"
        post := AP_PostPairs()
        // Add new reviews to the database.
        if hb_HHasKey(post, "review")
            // Append a blank record first; FieldPut() alone would overwrite the current one.
            reviews->(dbAppend())
            reviews->(FieldPut(1, hb_HGet(post, "review")))
        endif
    endif
    
    // Show stuff here...
    
    showReviews()
    
    // Show more stuff here...
return nil

function showReviews()
    local comments := '<div class="reviewSection">' + hb_eol()
    
    reviews->(dbGoTop())
    while !reviews->(Eof())
        // Naive: the stored review goes into the page unescaped.
        TEXT INTO comments
            <div class="review">{{reviews->(FieldGet(1))}}</div>
        ENDTEXT
        // Move to the next record, otherwise the loop never ends.
        reviews->(dbSkip())
    end
    
    TEXT INTO comments
        <form action="..." method="post">
            <p>Post your review!</p>
            <textarea name="review"></textarea>
            <input type="submit" value="Share">
        </form>
        </div>
    ENDTEXT
    
    ?? comments
return nil
You might expect whatever the user entered to be displayed the way they entered it. However, what if they entered some HTML?

Code: Select all

Let's make some text <b>bold</b>!
The browser can't tell the difference between genuine HTML code emitted by your application and code submitted by an attacker and posted without escaping, so when it finds the <b> tag it will treat it as an HTML tag, not as text. But you might think that showing different styles is not really a matter of web security. True... except that you can also add scripts.

Code: Select all

<script>
    // Find the "buy" button and click it.
    document.getElementById("buyButton").click()
</script>
Now as soon as the browser loads the HTML corresponding to that review, the script will run, causing every user who views the product page to buy it.

By the way, any input from the user or from their browser that's being displayed should be escaped. That is usually only form contents, but if HTTP headers, for example, are being displayed, they should be escaped as well.

Note that frameworks should handle XSS, and any slips in their handling of XSS are serious bugs. But even when those aspects are handled by some other tool, it's important that you understand the basics.

Escaping plain text

If the user-provided content is meant to be plain text (that is, without any formatting), the easiest way to solve this issue is to escape everything. Just like all C-like languages allow you to add a double quote inside a (double-quoted) string by escaping it with a backslash (\"), HTML has "entity references" which can be used to have special symbols displayed and not parsed as part of the HTML syntax.
  • < becomes &lt; (for "less than").
  • > becomes &gt; (for "greater than"). This escape is included for completeness but isn't needed, see below.
  • " becomes &quot; (for "quotation mark").
  • ' becomes &apos; (for "apostrophe").
  • & becomes &amp; (for "ampersand"). You should escape ampersands before any other character, so as not to double-escape < as &amp;lt;.
You can see the effects of escaping here.

You can use all of those escapes in one big escape function, like this:

Code: Select all

function HtmlEscape(string)
	string := StrTran(string, "&", "&amp;")
	string := StrTran(string, "<", "&lt;")
	string := StrTran(string, '"', "&quot;")
	string := StrTran(string, "'", "&apos;")
return string
But if you care a lot about saving a few bytes, you might want to distinguish whether the text is being output as part of an HTML attribute (<img src="image.png" alt="{{HtmlEscape(text)}}">) or not. If output inside an attribute, the only important escapes are for the relevant quote character and &. If output elsewhere, the only important escapes are for < and &. The escape for > often appears in the lists of important escapes, but the truth is HTML parsers don't do anything special with it when found outside of an HTML tag, so it's okay not to escape it.
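
As a rough sketch of that distinction, here are two context-specific variants. The names HtmlEscapeText and HtmlEscapeAttr are made up for this example, and the attribute version assumes the value is always wrapped in double quotes:

```harbour
// Hypothetical helpers, not part of mod_harbour. They only rely on StrTran().

// For text placed between tags: only & and < need escaping.
function HtmlEscapeText(cText)
    cText := StrTran(cText, "&", "&amp;")
    cText := StrTran(cText, "<", "&lt;")
return cText

// For text placed inside a double-quoted attribute value:
// only & and the double quote need escaping.
function HtmlEscapeAttr(cText)
    cText := StrTran(cText, "&", "&amp;")
    cText := StrTran(cText, '"', "&quot;")
return cText
```

For instance, HtmlEscapeText('a < b & "c"') gives a &lt; b &amp; "c", while HtmlEscapeAttr('say "hi" & <go>') gives say &quot;hi&quot; &amp; <go>. Unless the byte savings really matter, the single HtmlEscape function is harder to misuse.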

Watch out!: HTML allows you to have unquoted attributes, such as <img src=image.png>, but those should never be used with any content from user input, the database or even local variables (just in case). This is because the replacements listed above won't fully escape such attributes, and it's far easier to surround the value in quotes than to derive a working set of escapes.
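
Here's a small sketch of why that matters. It reuses the HtmlEscape function from above; the input string is an invented attacker value that contains none of the escaped characters:

```harbour
// HtmlEscape() is the function defined earlier in this post.
function Main()
    // Attacker-controlled text: no &, <, " or ' in it, so escaping changes nothing.
    local cInput := "x onload=alert(1)"

    // Unquoted: the space ends the alt value, so onload becomes a real attribute.
    ? '<img src=image.png alt=' + HtmlEscape(cInput) + '>'

    // Quoted: the whole input stays inside the alt value.
    ? '<img src="image.png" alt="' + HtmlEscape(cInput) + '">'
return nil
```

The first line comes out as <img src=image.png alt=x onload=alert(1)>, which runs the script when the image loads. The second comes out as <img src="image.png" alt="x onload=alert(1)">, where the input is just alternative text.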

Corrections and clarifications: In the 22 November call, Charly pointed out that the escaping could be done before saving the text into the database and you wouldn't have to worry about it when displaying it afterwards. Although I didn't realize it at the time, this is not a best practice, because at some point in the future you might need to use that input when rendering in a format other than HTML, such as JSON. It's best to escape it, as HTML or as JSON, when rendering the data in such formats.
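
To illustrate, a minimal sketch of escaping at render time: the review is stored raw, and each output format applies its own escaping. ReviewAsHtml and ReviewAsJson are hypothetical names for this example:

```harbour
// HtmlEscape() is the function defined earlier in this post.
// Both functions receive the review exactly as it was stored.

// HTML output: escape for HTML.
function ReviewAsHtml(cReview)
return '<div class="review">' + HtmlEscape(cReview) + '</div>'

// JSON output: hb_jsonEncode() does the JSON string escaping.
function ReviewAsJson(cReview)
return hb_jsonEncode({ "review" => cReview })
```

With a stored review such as Tom & "Jerry" <3, the HTML version turns & into &amp;, the quotes into &quot; and < into &lt;, while the JSON version leaves & and < alone and only backslash-escapes the quotes. Had the text been HTML-escaped before saving, the JSON consumer would receive the entities instead of the original characters.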

Allowing rich text

So far we've seen how to escape all HTML, so as to output the text as it was entered into the form. However, you might want to allow HTML in order to control formatting, while blocking any script embedded into it. The naive approach would be to detect any <script> tags, but there are many other ways in the HTML platform to run JavaScript. Here are a few:
  • <a href="javascript:alert('Hi!')">. Runs when clicked.
  • <b onclick="alert('Hi!')">. The HTML attributes that start with "on" run when the corresponding event fires on that element. The click event, as you might expect, fires when the element is clicked. There are many other events such as onmouseover (when the mouse is moved over the element)...
  • <img src="image.png" onload="alert('Hi!')">. The load event, present in <img> and other similar elements, will fire as soon as the image loads, no input from the user needed.
  • Etc...
To learn more about the many ways you can sneak scripts into a page, take a look at OWASP's XSS Filter Evasion Cheat Sheet.

Rather than allow any input HTML except for a few blacklisted patterns, it's better to have a white list of HTML elements and attributes that are allowed, and escape any other element and remove any other attribute.
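
A very small sketch of the white list idea, under a simplifying assumption: only a handful of tags are allowed, and only without attributes. Everything is escaped first, and then exactly those tags are restored. AllowBasicTags is a made-up name, and a tag that carries any attribute simply stays escaped rather than having its attributes stripped:

```harbour
// HtmlEscape() is the function defined earlier in this post.
function AllowBasicTags(cText)
    local aAllowed := { "b", "i", "em", "strong" }
    local cTag

    // Step 1: escape everything, so nothing is treated as HTML.
    cText := HtmlEscape(cText)

    // Step 2: restore only the exact, attribute-less allowed tags.
    // HtmlEscape() leaves ">" alone, so an escaped "<b>" reads "&lt;b>".
    for each cTag in aAllowed
        cText := StrTran(cText, "&lt;" + cTag + ">", "<" + cTag + ">")
        cText := StrTran(cText, "&lt;/" + cTag + ">", "</" + cTag + ">")
    next
return cText
```

With this, Some <b>bold</b> text keeps its formatting, while <b onclick="alert('Hi!')"> and <script> stay escaped and show up as text. It doesn't check that tags are balanced, so an unclosed <b> can still affect the formatting of the rest of the page. A real rich-text feature would probably want a proper HTML sanitizer instead.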

Embedding data from the DB into scripts

Corrections and clarifications: I didn't talk about this issue in the 22 November call because I didn't realize it could be an issue, since I'm used to working with design patterns where outputting server-generated script code is strongly discouraged.

You might want to alert the user of, say, a private message they got since the last time they visited. You could do this:

Code: Select all

if privateMessage <> nil
    TEXT INTO output
        <script>
            alert("You have a new private message!\n\n{{privateMessage}}");
        </script>
    ENDTEXT
endif
The JavaScript alert function takes plaintext, so there's no need to escape HTML. The issue here is rather escaping the JS string. Since JSON strings are also valid JS strings, you might think that hb_jsonEncode(privateMessage) is enough.

The problem is that the browser's HTML parser doesn't understand JavaScript, and it only hands off the script to the JS parser once it's found the end </script> tag. Which means that privateMessage could be </script> <script>document.getElementById('buyButton').click();//. The first <script> element would fail because the script ends before the string is closed, but when a script fails browsers simply skip it and move on to the next one. Any double quotes in the text of the second script would be escaped, but here it's using single quotes. The trailing // matters: browsers parse a whole script before running any of it, so without it the leftover "); would be a syntax error and click() would never run. With it, the leftover is just a comment.

Here's how you should escape:

Code: Select all

TEXT INTO output
    <!-- Plain text in scripts. We escape every "</" as "<\/" (a valid JS string escape), not just
         "</script>", because the closing tag is case-insensitive and can have spaces before the
         closing bracket. -->
    <script>
        alert({{StrTran(hb_jsonEncode(text), '</', '<\/')}});
    </script>
    
    <!-- If the script string needs to be parsed by HTML, first escape the HTML and then the string.
         The HTML escapes already handle the "</script". -->
    <script>
        someElement.innerHTML = {{hb_jsonEncode(HtmlEscape(text))}};
    </script>
ENDTEXT
Fallbacks

While you should try and escape everything, you might miss something. In those cases, something like CSP (content security policy) is a life-saver, since you can limit the list of things scripts can do on your site to only those you intend to use. That way, even if a script manages to slip through, browsers will block those usages and optionally send a report warning you.
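
As a sketch only: a policy is sent as a Content-Security-Policy response header. This assumes mod_harbour's AP_HeadersOutSet() is available to set response headers and that it's called before any output is written. The policy itself and the /csp-report endpoint are just examples:

```harbour
function Main()
    // Example policy: load everything from our own origin only, and report violations.
    // "/csp-report" is a hypothetical endpoint that would have to exist on the site.
    local cPolicy := "default-src 'self'; script-src 'self'; report-uri /csp-report"

    // Assumption: AP_HeadersOutSet() adds the header to the response.
    AP_HeadersOutSet("Content-Security-Policy", cPolicy)

    // ... render the page as usual ...
return nil
```

Note that script-src 'self' also blocks your own inline <script> blocks and on... attributes, so a policy this strict only works if the site's scripts live in separate files.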

When not to escape

Are there any situations where you shouldn't escape? Well, there aren't many, but one such example is w3schools.com's interactive HTML editor, which outputs anything you type into it as is so you can test HTML directly from the browser. But the reason it's safe for them to do that is that no actions are available from that page that depend on your user account.
Last edited by Andreu Botella on Thu Nov 28, 2019 9:25 am, edited 1 time in total.

Cristobal
Posts: 49
Joined: Fri Nov 15, 2019 6:58 pm

Re: Web security and measures to protect your site

Post by Cristobal » Sun Nov 24, 2019 6:51 pm

Andreu, very very good, thanks
