You know that strange thing that when you have to explicitly handle output filtering, things will inevitably go wrong? Well I do.
You see, I've been using
Smarty as my templating buddy for some time now but once in a while I get bitten in the ass by me not handling output filtering and thus having HTML injection 'holes' in my applications.
The problem starts with the fact that I have to 'work' to get filtering included. What I actually want is filtering to be 'on by default', so that I can shut it off when needed. That way the default is 'safe' (there are always other way to get compromised of course) except in those cases that I explicitly turn the filtering off.
I want Smarty to do my filtering! "But Smarty already does this!" I hear you say, and I'd have to agree. Except Smarty does this by providing the "escape" modifier. But I don't want to write '|escape:"html"' after every bloody variable I use in my templates. I want Smarty to filter the little buggers by default.
I know what you're thinking. "No no no, you've got it all wrong. You should handle filtering at the logic layer and not at the presentation layer". Well.. then you'd be wrong. Since HTML (and Javascript) injection is only present in the presentation layer I want my presentation layer to handle the filtering. I can't really be bothered at the logic layer, since the problem does not exists there. I do, however, need to filter the input at the logic part of my code, but thats another story.
So what needed to be done was, reverse the way you handle escaping in Smarty. This means variables need to be escaped by default and can be 'unescaped' using the newly crafted 'unescape' modifier.
Ok, so how exacly does one handle this. Well, Smarty has a hand full of usefull API hooks with which you can 'plug' into the compiling process. I chose to use a 'postfilter' to process the template vars and encode them before they end up in the template.
You could also use a pre-filter since they both do their stuff
after the all the variables are assigned and before the output is actually shown. I chose postfilter because this is the last possible point in the compilation process before the variables are forged into the template and pushed out to whatever device is doing the request.
So how does this look, code-wise:
// create the smarty object
$smarty = new Smarty();
// register the template and compile dir
$smarty->template_dir = 'templates';
$smarty->compile_dir = 'templates_c';
/**
This is a modified htmlspecialchars function, created to
encode values in arrays as well. We want all strings to be
encoded.
/
function htmlspecialchars_recursive($var, $quote_style) {
foreach($var as $key => $val) {
$ret[$key] = htmlspecialchars_recursive($val, $quote_style);
}
} else {
$ret = $var;
}
return $ret;
}
/**
This is the actual postfilter function. It loops through the tpl_vars
and uses the custom htmlspecialchars_recursive function to excode
the vars.
/
function inoculate($tpl_output, &$smarty) {
if(!
empty($smarty->_tpl_vars
)) { foreach($smarty->_tpl_vars as $key => $value) {
$smarty->_tpl_vars[$key] = htmlspecialchars_recursive($value, ENT_QUOTES);
}
}
return $tpl_output;
}
/**
The actual unescape modifier. Used to revert HTML encoding when needed.
NOTE: the htmlspecialchars_decode is available for PHP 5.1 and higher.
So this won't work on any PHP4 systems.
/
function unescape($string) {
return htmlspecialchars_decode($string, ENT_QUOTES);
}
// register the modifier
$smarty->register_modifier('unescape', 'unescape');
// register the postfilter
$smarty->register_postfilter('inoculate');
Ok so now that the filters are in place, we need to test the stuff.
The php file:
/**
.. snip .. (all of the code above)
/
// assign some test vars.
$smarty->assign('str1', '<h1>Hello!</h1>');
$smarty->assign('str2', '<script type="text/javascript">alert(document.cookie);</script>');
$smarty->
assign('list1',
array( "<h2>One</h2>",
"<script>alert('two');</script>",
"<script>document.write('three');</script>",
"fo\"'''\"ur", "five"
));
$smarty->
assign('list2',
array( 'c' => '<script>alert("burried");</script>'
)
)
));
// force smarty to compile on every hit. This is for testing purposes only,
// turn this off on production systems. It will slow you down. A lot.
$smarty->force_compile = true;
// Display the testing template.
$smarty->display('output_filter.tpl');
The template file:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Output filter test</title>
</head>
<body>
{$str1}
<br/>
{$str2}
<br/>
{$str1|unescape}
<ul>
{foreach from=$list1 item=item}
<li>{$item}</li>
{/foreach}
</ul>
<br/>
{$list2.a.b.c}
<br/>
{$list2.a.b.c|unescape}
</body>
</html>
And voila, the output is nicely escaped and safe by default. Oh the joy of carelessness. I can't really show you a demo since I'm too lazy to set one up, so you need to test this for yourself.
Using this has some obvious advantages and some disadvantages. Unicode support, for instance, is an issue. Though htmlspecialchars can handle charset conversion quite nicely, it has to be provided. So you
have to know the exact charset you are sending out for this to function like it should. It defaults to ISO-8859-1 so the not-so-internationalized applications can use this whithout worries.
Another thing which might be annoying is that all the HTML you use in variables will inadvertally be escaped. One place where we've had this problem is when dealing with language files. In some places we used HTML inside the language strings, these got escaped as well. Whoops!
Also performance could be slow when handling large sets of data. So you would need to do some benchmarking before putting stuff like this in production.
UPDATE: After reading
some discussion on this very problem I've noticed another issue with my implementation.
Doing this:
{if $var eq 'foo'}
// stuff
{/if}
Is doing a comparison on an encoded value and could possibly go wrong. Then again, this will probably not happen as long as you're not comparing to any 'weird' variables which contain quotes or other unholy characters. WHO DOES THAT ANYWAY.
Also my implementation does not function in sync with the current 'escape' functionality using options like html, htmlall, url etc for describing the type of escaping to be used.
A possible solution for the use of escaped vars in expressions would be changing the post-filter a bit. Instead of looping through all the template vars we could preg_replace the echo code in the actual compiled template. Personally I think that would be quite a gross hack. Not even thinking about the implications when using other modifiers. I like my first implementation better even though it requires an extra function and even though its not complete.
The stuff I read on the smarty forum gives some me hope that Smarty is going to have output escaping functionality of its own. Which is good and saves me the time hacking 'nice-yet-not-quite-right' solutions to these problems.