MySQL-Proxy learning to block SQL-Injection

August 15th, 2008 | by Stefan Esser |

I previously reported about my joy with MySQL-Proxy and a simple SQL-Injection detection based on a simple heuristic.

Today I present the more interesting approach that I promised to publish after my webinar yesterday. This approach is based on the idea that SQL queries issued by an application always have a certain structure. This structure can be learned and remembered by MySQL-Proxy. Any SQL query that has a different structure can then be considered an attack.

Training Mode

The first Lua script learn_sql_queries.lua uses MySQL-Proxy’s read_query hook to catch COM_INIT_DB and COM_QUERY packets. COM_INIT_DB packets are issued when the database is changed and COM_QUERY packets contain normal queries.

When a change of database is detected a CREATE TABLE is injected into the communication to create a table called ‘allowed_queries’ in the newly selected database. This table consist of only on column called ‘query’. Within this column normalized queries are collected.

When a normal query is received it is first tokenized by MySQL-Proxy’s tokenizer. The tokens are then used to recreate a normalized version of the query where all data values are replaced by the ‘?’ placeholder. Additionally IN ( ?, ?, ?, …) statements are compressed to IN ( ? ) to allow arbitrary length IN value lists without having to learn all possibilities. The normalized query is then learned by inserting it into the table.

When all queries have been learned (maybe during development) the blocking mode can be started.

Blocking Mode

The second Lua script block_unknown_queries.lua also uses MySQL-Proxy’s read_query hook to catch COM_INIT_DB and COM_QUERY packets.

When a change of database is detected a SELECT statement is injected into the communication that loads the table ‘allowed_queries’ into a Lua-Table. The queries become the indices so that they can be found fast.

When a normal query is received it is first tokenized and normalized. The normalized Query is then searched in the Lua-Table which is just a key lookup. If the query is found in the table it is one of the known query structures that are allowed. The query is then executed as normally.

If the query is not found in the table it is either a query that was not learned by mistake or it is an SQL-Injection attack. The query is not executed and a database error “Possible SQL Injection” is returned.

Both proof of concept examples are released as GPL. Therefore feel free to modify them for your needs. You might prefer to just log SQL-Injection attempts instead of blocking them.

  1. 16 Responses to “MySQL-Proxy learning to block SQL-Injection”

  2. By kuza55 on Aug 15, 2008 | Reply

    I’m not sure if you’ve seen it but this seems a lot like GreenSQL: http://www.greensql.net/

    GreenSQL acts more like an IPS than what you have here, but I think it does have the option to block all queries that aren’t of the format it has in the database, though I’m not sure if it has a learning mode, or if you simply have to set it to log, and then manually approve all the SQL query formats before switching it to block. In any case, I thought you might be interested.

  3. By Stefan Esser on Aug 15, 2008 | Reply

    Hi kuza55,

    I am quite sure that there are several different implementations of SQL query learning. This was just me playing around with the powers of MySQL-Proxy.

    I am still waiting for the MySQL-Proxy authors to release their query tokenizer BSD licensed. Then I will add the same feature to Suhosin.

  4. By Dave on Aug 15, 2008 | Reply

    Aren’t you just reinventing prepared statements here?

  5. By Stefan Esser on Aug 15, 2008 | Reply

    Dave this approach has nothing todo with prepared statements.

    Prepared statements are a programming technique used to prepare repeatedly occuring SQL queries for faster execution. It is a side effect that your application gets more secure when you only use prepared statements. It is however an urban legend that using prepared statements protects you completely from SQL-Injection.

    There are plenty of applications using prepared statements that are still vulnerable to SQL-Injection.

    The approach used here on the other hand is not a programming technique but a way to detect unknown query types (which are most probably the result of SQL-Injection). This approach is a simple way to make arbitrary applications more secure without touching a single line of code.

    BTW: I never claimed to have invented this approach. Learning based IDS/IPS are nothing new. This blogpost is about adding this featureset to MySQL-Proxy.

  6. By kuza55 on Aug 15, 2008 | Reply

    Is there a particular reason you’re using MySQL-Proxy’s tokenizer as opposed to another projects? (I’m just curious here)

    Also, you say there are still plenty of apps vulnerable to sql injection when they use prepared statements; would you be able to elaborate on that? I have seen some which are vulnerable because they use sql-injection vulnerable sql functions (I can’t quite remember what they’re called, but essentially the SQL functions just concatenated the parameters they got an executed the query as a string), but they’ve been pretty rare in my experience.

    P.S. I wasn’t claiming that you were claiming anything, I just thought you might be interested (seriously, no antagonistic words have been written on this blog by me…yet…:p)

  7. By Stefan Esser on Aug 15, 2008 | Reply

    kuza55, you misunderstood me ;)

    Well I want to use MySQL-Proxy’s tokenizer because it is a MySQL project and therefore the possibility that MySQL-Proxy’s tokenizer and the MySQL Server’s tokenizer are related. The more related they are the less differences there are in tokenizing and understanding of queries.

    I have seen code like this in the past (lots of times):

    $stmt = ‘SELECT * FROM u WHERE id=? ORDER BY username ‘.$_GET['dir'];

    This is caused by the fact that placeholders in SQL queries only exist for values. However things like ORDER BY, fieldnames and also IN () statements are often “dynamically” added to the prepared statement from user data.

    And of course the moment you do this you have the same problem all over again…

  8. By kuza55 on Aug 15, 2008 | Reply

    Ah, cool, thanks for the info (good to know it’s the same stuff as from MySQL; makes trusting it to be accurate much easier)

    And thanks for clarifying the SQL Injection stuff; I thought I was missing out on something huge there for a while. Though I must say that surprises me given how simple prepared statements are to use (in PHP anyway)…

  9. By Stefan Esser on Aug 15, 2008 | Reply

    Nah… It is not necessary the same tokenizer. But I trust people of MySQL to get a copy of their tokenizer right more than non MySQL people.

    However that is the old problem: you need to have the same tokenizer/parser everywhere because otherwise you will have problems.

    Same would be needed for all the other databases :(

  10. By Wouter on Aug 18, 2008 | Reply

    Quote “… prepared statements. It is however an urban legend that using prepared statements protects you completely from SQL-Injection.”

    Could you please enumerate on this?

    I was always believed prepared statements eliminate SQL injection attacks completely, thus removing the need for further checking of SQL statements.

  11. By Stefan Esser on Aug 18, 2008 | Reply

    @Wouter: I already answere kuza55 the same question

    The problem is that some constructs used in web applications cannot be handled by prepared statements.

    For example: order by statements, IN parameter lists

    When a web application needs these things developers start to suddenly create their prepared statements dynamically. For example they append the search direction (from user input) to the end of the prepared statement.

    $stmt = “SELECT * FROM u WHERE registered>? and blocked=? ORDER BY username ” + DIRECTION_FROM_USERINPUT

    Of course this is stupid and bad style, but I have seen code like this too often. Many developers also believe that SQL injection after ORDER BY is not possible…

    The problem here is that developers often do not use prepared statements, because they fear sql injection, but because someone told them that they should use prepared statements and that this would be more secure.

    So these developers don’t know what SQL injection is and create the same problems all over again.

  12. By Wouter on Aug 18, 2008 | Reply

    Quote “@Wouter: I already answere kuza55 the same question”

    My fault, I should have read better first :-)

    Quote “… example they append the search direction (from user input) to the end of the prepared statement.”

    So if I understand correctly them problem really isn’t the use of prepared statements as a measure against SQL injection, but more the limitations of its implementation.

    In the framework we use it isn’t possible to use non-verified user data (from i.e $_GET, $_POST), unverified data always returns NULL. This in combination with prepared statements should make injection attacks a whole lot more difficult.

    Thanks for clearing this up!

  13. By Stefan Esser on Aug 18, 2008 | Reply

    @Wouter:

    You are right. My comment targets the limitations that are unfortunately seldomly documented.

    The problem with prepared statements and security is that all of those security guides tell to embrace prepared statements. It is a popular believe that SQL-Injection suddenly is impossible just because your developers have the power of prepared statement.

    I have seen people claiming: When you audit us, you can skip the whole SQL-Injection thing. We use prepared statements. Therefore there cannot be an SQL-Injection. And of course those guys created a prepared statement with a user supplied ORDER BY statement.

    Therefore I strongly believe developers must understand the concept of SQL-Injection, the concept of escaping and the concept of prepared statements and what their limitations are. Just using prepared statements without knowing why doesn’t solve the problem. It just decreases the attack surface.

  14. By sibirya on Sep 10, 2008 | Reply

    thanka a lot

  15. By shesek on Sep 21, 2008 | Reply

    I actually tried (and started) to write it myself yesterday.
    while googling for a (better) way to tokenize the queries, I came across this post. It seems like you did a good job, and I’m pretty sure that I can never build a tokenizer as good as they did. Excellent idea, and great implention.
    thanks!

  16. By shesek on Sep 21, 2008 | Reply

    Forgot to ask you a few things - do you have any idea about handling dynamically created queries, where the structure of the query changes on-the-fly, or.. lets say, an column for ORDER BY can be changed?

  1. 1 Trackback(s)

  2. Aug 16, 2008: PHP.ph » MySQL-Proxy Learning to Block SQL-Injection

Post a Comment