By transparently sitting between client and server on each request, MySQL Proxy offers many possibilities for query manipulation.
Many are explored in the cookbook, which even includes a histogram recipe. Still, I wanted to learn more about the proxy while working on a script that would let me get some stats on the queries executed against a server (or group of servers).
First things first: get a brief glimpse of the Lua programming language, since that's what the proxy's scripts are written in. Alternatively, you can jump straight into the sample scripts, work out the syntax you don't understand by drawing parallels with other scripting languages you know, and make the best of it. That's what I've been doing so far :)
Well, now on to it.
Here's my super simple proxy script. It consists of a global variable and a line that spits out all the vars I want, separated by '||||'. I chose that separator since it's unlikely to appear in a real query, and hence it won't cause me to lose much data when doing load data infile later. That's my scientific approach, and considering these aren't medical records, it's good enough for me.
I can get away with using a global var because the proxy fires up a new instance of the Lua script for every new client connection. At least, that's what I've been able to find out so far, and my empirical data confirms it. If source code inspection later disproves this finding, I'll have to find a better (probably more complex) way to achieve the same goal.
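If you want to check this for yourself, a tiny experiment (just a sketch of the idea, not part of the tracing script) is to keep a counter in a global and watch it start from zero for every new connection:

-- sketch: if the proxy really instantiates the script per connection,
-- each client connection gets its own copy of this counter
counter = 0

function read_query(packet)
    if packet:byte() == proxy.COM_QUERY then
        counter = counter + 1
        print("queries on this connection so far: " .. counter)
    end
    -- returning nothing lets the packet through untouched
end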
query = "" function read_query( packet ) if packet:byte() == proxy.COM_QUERY then query = packet:sub(2) proxy.queries:append(1, packet ) return proxy.PROXY_SEND_QUERY end end function read_query_result(inj) print(os.date('%Y-%m-%d %H:%M:%S') .. "||||" .. query .. "||||" .. (in.query_time / 1000) .. "||||" .. (in.response_time / 1000)) end
That simple script saves the query into the global variable from the read_query hook function, and prints the results in the read_query_result function. Notice how these hooks open up many more possibilities if you're a skillful hacker, and an evil one too :) (i.e., man-in-the-middle kinds of things; there are a few query modification examples in the cookbook)
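Just to make that concrete, here's a sketch of my own (not one of the cookbook's recipes) of a rewriting hook; it assumes, as above, that byte 1 of the packet is the command type and the rest is the query text:

-- sketch: silently replace one query with another before it
-- reaches the server (the evil-hacker scenario mentioned above)
function read_query(packet)
    if packet:byte() == proxy.COM_QUERY and packet:sub(2) == "select 1" then
        -- rebuild the packet with our own query instead
        proxy.queries:append(1, string.char(proxy.COM_QUERY) .. "select 'gotcha'")
        return proxy.PROXY_SEND_QUERY
    end
end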
Ok, so with this part covered, we need to run the proxy, and then run some queries against it.
Both things are easy:
$ mysql-proxy --proxy-lua-script trace.lua &> trace.log
and something like
$ mysql -u someuser -psomepassword -h proxyhost -P 4040
Notice that 4040 is the proxy's default port, but you could change it to 3306 and move mysqld to another port.
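If I remember the options correctly, that swap would look something like this (a sketch, assuming mysqld now listens on 3307; double-check the flag names against your proxy version's --help):

$ mysql-proxy --proxy-address=:3306 --proxy-backend-addresses=127.0.0.1:3307 --proxy-lua-script trace.lua &> trace.log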
This generates a trace.log file that looks like this:
2009-04-06 07:13:03||||select count(*) from City||||0.381||||0.404
2009-04-06 07:13:06||||desc City||||2.08||||2.18
2009-04-06 07:13:20||||select * from City where Population between 200 and 2000||||169.122||||194.083
2009-04-06 07:13:22||||select * from City where Population between 200 and 2000||||0.408||||9.16
2009-04-06 07:13:23||||select * from City where Population between 200 and 2000||||0.664||||8.455
We need to load that into mysql.
I created a script just for that purpose:
#!/bin/bash
cp trace.log /tmp
cd=$(date "+%Y%m%d%H%M%S")
echo "You'll be asked for MySQL root user's password"
mysql -u root -p <<EOSCR
set @@sql_mode=ANSI;
create database if not exists "__perf";
use __perf;
create table if not exists "analysis_results_${cd}" (
    id int unsigned not null auto_increment,
    ts datetime,
    query char(200),
    query_time float,
    response_time float,
    primary key ( id ),
    key ( ts ),
    key ( query(100) ),
    fulltext ( query ),
    key ( query_time ),
    key ( response_time )
) Engine=MyISAM ROW_FORMAT=Fixed;
load data infile '/tmp/trace.log' into table "analysis_results_${cd}"
    fields terminated by '||||' (ts,query,query_time,response_time);
EOSCR
resultMySQL=$?
rm -f /tmp/trace.log
[ $resultMySQL -eq 0 ] && echo "Data imported OK" >&2 \
    || echo "Error while importing data, please refer to the output of MySQL" >&2
So, once you've run quite a few queries against the proxy, you could do something like this:
$ ./loadTrace.sh
You'll be asked for MySQL root user's password
Enter password:
Data imported OK
And then run some queries against the analysis tables.
Here are some ideas:
Top 10 queries that took the longest to process:
mysql> select * from analysis_results_20090406073836 order by query_time desc limit 10;
Top 10 queries that took the longest to return to the client:
mysql> select * from analysis_results_20090406073836 order by response_time desc limit 10;
Top 10 queries, ordered by their text and then by the time they took to get back to the client. From the timestamps you can infer whether the query cache was in use, and whether it actually helped. You may be surprised: for large result sets, while the cache does take load off the server, the client doesn't perceive such a big improvement, because the result set still takes a long time to travel back, so he/she will still complain. Sometimes a lot of effort is put into optimizing server performance while the way the server is accessed is totally neglected!
mysql> select * from analysis_results_20090406073836 order by query, response_time desc limit 10;
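One more, of my own devising (a sketch; swap in the table name from your own run): aggregate by query text to see how often each statement ran and what its times averaged out to:

mysql> select query, count(*) as times_run,
    ->        avg(query_time) as avg_query_time,
    ->        avg(response_time) as avg_response_time
    -> from analysis_results_20090406073836
    -> group by query order by avg_response_time desc limit 10;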
Well, there you go. Have fun and find out what kind of usage your application is giving to your server.