Ongoing MySQL myths

There’s an interesting post over at Olery‘s blog about a successful migration story from using MySQL/MongoDB to PostgreSQL as the persistence layer for applications that, however, lists a couple of cons of using MySQL that I personally think are no longer valid complaints (or at least not as big as they used to be). I did not find a way to reply to the post so, hopefully for the benefit of people who are new to MySQL and may hear the same recycled myths, I am doing so here.

Now I am aware that PostgreSQL has feature MySQL lacks (bitmap indexes are a personal favourite), but still, it is frustrating that instead of using actual arguments consisting of actual use cases in which one database is better than another, these old stories get recycled. I think it’s a disservice to both MySQL and PostgreSQL.

The first paragraph that caught my attention is:

MySQL was the first candidate as we were already using it for some small chunks of critical data. MySQL however is not without its problems. For example, when defining a field as int(11) you can just happily insert textual data and MySQL will try to convert it.

The sql_mode variable is available since at least 4.0 (GA in 2004 if I’m not mistaken, which is over a decade ago) and while yes, the default value is permissive, and any evil programmer will be able to change the value for the variable (Roland made very good suggestions about this some time ago), in my experience it has provided people with enough protection against this kind of data integrity problems.

And here’s the second one:

Another problem with MySQL is that any table modification (e.g. adding a column) will result in the table being locked for both reading and writing. This means that any operation using such a table will have to wait until the modification has completed.

Innodb supports some online DDL operations starting with MySQL 5.6 (GA in February 2013, over two years ago) and of course, there’s always pt-online-schema-change, which can usually help in the cases were online DDL is still not supported, or when the impact on replica lag makes that approach unfeasible.

To wrap up, my message is that it’s a big world out there, with different technologies that all have cons and pros. You need to make your due diligence and investigate the biggest of each when evaluating one technology, or comparing a few of them, to make sure you pick the best tool for the job (and for your context, because tools are not context-free. A kitchen knife is not a very good screwdriver, but a few times in my life I found myself in the situation of using it, and with success!), but most importantly, please base your decision on actually experience-verified facts, and not on things you’ve heard others say, or that perhaps you experienced a long time ago.

 

Benchmarking Joomla

This post recently caught my attention on Planet MySQL.

If you haven’t read it yet, I suggest that you go and do so, and also read the comments.

I think Matthew’s request for the queries so that others can run comparative benchmarks is very interesting, and while I don’t have access to the queries used to produce those results, I am making some queries of my own available for others to test. You can get your own queries if you want, just enable slow query logging with long_query_time set to 0 on the database for a Joomla instalation (being unexperienced in Joomla I used this project, which is great) and then run these steps on the resulting slow query log file:


csplit -z joomlatools-slow.log '/^#/' '{*}'
for f in x*; do grep -i select $f >/dev/null || rm -f $f; done
for f in x*; do grep -v '^SET' $f > $$ && mv -f $$ $f; done
for f in x*; do grep -v '^#' $f > $$ && mv -f $$ $f; done
for f in x*; do tr '\n' ' ' <$f>$$ && echo >> $$ && mv -f $$ $f; done
cat x* > queries.txt

That should give you a file similar to this one.
The EXPLAIN statements are not mine, everything you see on that file was generated by me visiting a sample Joomla site.

I do not have the time to run a big benchmark now, but a mysqlslap using the exact same settings as those from the original post, and my list of queries, seems simple enough.

The vagrant box for Joomla has MySQL 5.5 so I used that for my tests. Here are the results:

MySQL 5.5 with the default configuration (default as installed by the debian package):

root@joomlatools:~# mysqlslap --concurrency=100 --iterations=10 --query=queries.txt --create-schema=myuser_joomla -umyuser_joom -p7654e684c4d14ab544261568101cc9c4
Benchmark
Average number of seconds to run all queries: 1.911 seconds
Minimum number of seconds to run all queries: 1.786 seconds
Maximum number of seconds to run all queries: 2.202 seconds
Number of clients running queries: 100
Average number of queries per client: 265

MariaDB 10.1, with the config as installed by the MariaDB package:

root@joomlatools:~# mysqlslap --concurrency=100 --iterations=10 --query=queries.txt --create-schema=myuser_joomla -umyuser_joom -p7654e684c4d14ab544261568101cc9c4
Benchmark
Average number of seconds to run all queries: 1.743 seconds
Minimum number of seconds to run all queries: 1.694 seconds
Maximum number of seconds to run all queries: 1.847 seconds
Number of clients running queries: 100
Average number of queries per client: 265

MySQL 5.5 with MariaDB’s configuration (minus incompatible options)

root@joomlatools:~# mysqlslap --concurrency=100 --iterations=10 --query=queries.txt --create-schema=myuser_joomla -umyuser_joom -p7654e684c4d14ab544261568101cc9c4
Benchmark
Average number of seconds to run all queries: 1.803 seconds
Minimum number of seconds to run all queries: 1.548 seconds
Maximum number of seconds to run all queries: 2.398 seconds
Number of clients running queries: 100
Average number of queries per client: 265

Interesting. MariaDB still performs better than 5.5, but performance improved for this old version anyway.
I took a quick look at the configuration file and among the changes, use of the query cache caught my attention. I think it’s strange to have it enabled by default since it can cause so much contention problems at heavy concurrency workloads, but I’m not familiar with the MariaDB improvements and perhaps the query cache implementation on it is better at scaling.

Just to check, I tried MariaDB 10.1 with the query cache disabled:

mariadb 10.1 without the query cache:
root@joomlatools:~# mysqlslap --concurrency=100 --iterations=10 --query=queries.txt --create-schema=myuser_joomla -umyuser_joom -p7654e684c4d14ab544261568101cc9c4
Benchmark
Average number of seconds to run all queries: 4.113 seconds
Minimum number of seconds to run all queries: 3.735 seconds
Maximum number of seconds to run all queries: 5.446 seconds
Number of clients running queries: 100
Average number of queries per client: 265

Interesting. It seems without the query cache, MariaDB 10.1 performs worse than MySQL 5.5. Could be a concurrency thing though, as I know more recent MySQL versions (and I assume the same is true for MariaDB) have worked on optimising for high concurrency use cases, at the expense of low concurrency or single threaded workloads. And while the mysqlslap invocation runs 100 threads, don’t be fooled. I ran this on a VM on a notebook with 2 cores with hyperthreading. It’s great as a workstation but it wouldn’t cut it as a server :)

I’d like to see what experience others have. You know exactly the VM and workload I used for this tests, so let’s see what numbers you get.