Tag Archives: MySQL-Proxy

Updated mysql-proxy benchmarking script (for proxy 0.7)

My previous post contained a lua script for MySQL proxy that would generate benchmarking information. However, just days (or maybe hours?) after I published it, release 0.7 of mysql-proxy was published, making my script obsolete.

I’ve fixed this (it needed just a minor tweak), so here‘s a tarball with the following:

  • trace.lua, which is the lua script to use with mysql-proxy
  • loadTrace.sh, which is a shell script that will load the output of mysql-proxy into the analysis tables. This creates the __perf database and the proper analysis table if needed. THIS LOOKS FOR trace.log IN THE CURRENT DIRECTORY. yeah yeah, it could take the name as an argument, but this is not intended as a package, it’s just an example for you to follow.

Remember to save the output of mysql-proxy in order to load it later with loadTrace.sh (i.e., run something like mysql-proxy –proxy-lua-script=/path/to/trace.lua > trace.log and then loadTrace.sh

Using MySQL Proxy to benchmark query performance

By transparently sitting between client and server on each request, MySQL Proxy offers many possibilities for query manipulation.

Many are explored in the cookbook, and they even include a histogram recipe. Still, I wanted to learn more about the proxy while working on a script that would let me get some stats on the queries executed against a server (or group of servers).

First things first, get a brief glimpse of the lua programming language since that’s what the proxy’s scripts are written in. Alternatively, you can jump straight into the sample scripts, extrapolate what you don’t understand of the syntax by making paralelizations against other known scripting languages and make the best of it. That’s what I’ve been doing so far :)

We’ll, now on to it.

Here’s my super simple proxy script. It consists of a global variable and a line that spits out all the vars I want, separated by ‘||||’. I choose that separator since it’s unlikely to happen in a real query and hence it won’t cause me to loose much data while doing load data infile later. That’s my scientific approach, and considering this aren’t medical records, it’s good enough for me.

I can get away with using a global var because the proxy fires up a new instance of the lua script for every new client connection. At least that’s what I’ve been able to find out so far, and my empirical data has confirmed this. If source code inspection later rejects this finding, I’ll have to find a better (probably more complex) way to achieve the same goal.

query = ""

function read_query( packet )
	if packet:byte() == proxy.COM_QUERY then
		query = packet:sub(2)
		proxy.queries:append(1, packet )
		return proxy.PROXY_SEND_QUERY
	end
end

function read_query_result(inj)
        print(os.date('%Y-%m-%d %H:%M:%S') .. "||||" ..  query .. "||||" .. (in.query_time / 1000) .. "||||" .. (in.response_time / 1000))
end

That simple script saves the query into the global variable, from the read_query hook function, and prints the results in the read_query_results function. Notice how this hooks provide for much more possibilities if you’re a skillful hacker and an evil one too :) (i.e., man in the middle type of things, there are a few query modification examples in the cookbook)

Ok, so with this part covered, we need to run the proxy, and then run some queries against it.

Both things are easy:

$ mysql-proxy --proxy-lua-script trace.lua &> trace.log

and something like

$ mysql -someuser -psomepassword -proxyhost -P4040

Notice that 4040 is the default port for the proxy, but you could change it into 3306, and move mysql into another port.

This generates a trace.log file that looks like this:

2009-04-06 07:13:03||||select count(*) from City||||0.381||||0.404
2009-04-06 07:13:06||||desc City||||2.08||||2.18
2009-04-06 07:13:20||||select * from City where Population between 200 and 2000||||169.122||||194.083
2009-04-06 07:13:22||||select * from City where Population between 200 and 2000||||0.408||||9.16
2009-04-06 07:13:23||||select * from City where Population between 200 and 2000||||0.664||||8.455

We need to load that into mysql.

I created a script just for that purpose:

#!/bin/bash

tcp trace.log /tmp

cd=$(date "+%Y%m%d%H%M%S")

echo "You'll be asked for MySQL root user's password"

mysql -root -p <<EOSCR
set @@sql_mode=ANSI;
create database if not exists "__perf";
use __perf;

create table if not exists "analysis_results_${cd}"(
        id int unsigned not null auto_increment,
        ts datetime,
        query char(200),
        query_time float,
        response_time float,
        primary key ( id ),
        key ( ts ),
        key ( query(100) ),
        fulltext ( query ),
        key ( query_time ),
        key ( response_time )
) Engine=MyISAM ROW_FORMAT=Fixed;

load data infile '/tmp/trace.log' into table "analysis_results_${cd}" fields terminated by '||||' (ts,query,query_time,response_time);

EOSCR

resultMySQL=$?

rm -f /tmp/trace.log

[ $resultMySQL -eq 0 ] && echo "Data imported OK">&2 || echo "Error while importing data, please refer to the output of MySQL">&2

So, once you’ve run quite a few queries against the proxy, you could do something like this:

$ ./loadTrace.sh
You'll be asked for MySQL root user's password
Enter password:
Data imported OK

And then run some queries against the analysis tables.

Here are some ideas:

Top 10 queries that took more time to process:

mysql> select * from analysis_results_20090406073836 order by query_time desc limit 10;

Top 10 queries that took more time to return to the client:

mysql> select * from analysis_results_20090406073836 order by response_time desc limit 10;

Top 10 queries, ordered by their text, and then the time it took them to get back. You can infer, by the query issue time, if the query cache was in use, and then, if it was useful. You’ll be surprised that for large datasets, while the load is taken off the server by using the cache, the client doesn’t perceive such a big improvement because it still takes a lot of time for the resultset to go back. Therefore he/she will still complain. Sometimes a lot of effort is put into optimizing server performance, and the way this server is accessed is totally neglected!

mysql> select * from analysis_results_20090406073836 order by query, response_time desc limit 10;

Well, there you go. Have fun and find out what kind of usage your application is giving to your server.