how to prevent explain from executing subqueries

Here’s a quick tip for using explain:

You may know this already, but mysql will actually execute some subqueries when you invoke explain.  Here’s an example:

mysql> explain select id from projects where id = (select max(id) from projects where name like 'en%');
+----+-------------+----------+-------+---------------+---------+---------+-------+-------+-------------+
| id | select_type | table    | type  | possible_keys | key     | key_len | ref   | rows  | Extra       |
+----+-------------+----------+-------+---------------+---------+---------+-------+-------+-------------+
|  1 | PRIMARY     | projects | const | PRIMARY       | PRIMARY | 4       | const |     1 | Using index |
|  2 | SUBQUERY    | projects | ALL   | NULL          | NULL    | NULL    | NULL  | 67922 | Using where |
+----+-------------+----------+-------+---------------+---------+---------+-------+-------+-------------+
2 rows in set (0.11 sec)

Take a look at the execution time (I choose an intentionally poorly executing query for my little dataset).
Here’s explain when it’s not executing:

mysql> explain select max(id) from projects where name like 'en%';
+----+-------------+----------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table    | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+----+-------------+----------+------+---------------+------+---------+------+-------+-------------+
|  1 | SIMPLE      | projects | ALL  | NULL          | NULL | NULL    | NULL | 69513 | Using where |
+----+-------------+----------+------+---------------+------+---------+------+-------+-------------+
1 row in set (0.00 sec)

If you want to work around this to prevent trouble on a production server (albeit, not getting the output from explain), you can do this:

mysql> set session max_join_size=1;
Query OK, 0 rows affected (0.00 sec)

mysql> explain select id from projects where id = (select max(id) from projects where name like 'en%');
ERROR 1104 (42000): The SELECT would examine more than MAX_JOIN_SIZE rows; check your WHERE and use SET SQL_BIG_SELECTS=1 or SET SQL_MAX_JOIN_SIZE=# if the SELECT is okay
Share and Enjoy: These icons link to social bookmarking sites where readers can share and discover new web pages.
  • MisterWong
  • Y!GG
  • Webnews
  • Digg
  • del.icio.us
  • StumbleUpon
  • Reddit
  • email
  • Facebook
  • LinkedIn
  • Technorati

2 thoughts on “how to prevent explain from executing subqueries

  1. Hm, what if there is a user-defined function (UDF) which may have side effects?

    explain select id from projects where id = (select my_udf(max(id)) from projects where name like ‘en%’);

    Would this EXPLAIN then actually call my_udf()? Seems a bit scary (but maybe one should expect that?)

  2. Kristian:

    I’ve poked around the source before to try and figure things out, but code for handling explain seems to be a bit distributed on the code base, so it takes more than just a few minutes reading to understand it.

    So that leaves me with my second best guess: strace.

    Take a look at this:

    (I’m using lookup from udf_example.so)

    mysql> select lookup(url) from urls;
    +—————+
    | lookup(url) |
    +—————+
    | 69.80.205.238 |
    | 69.89.31.108 |
    | 69.80.205.238 |
    +—————+
    3 rows in set (0.09 sec)

    And here’s the strace relevant for the lookup:

    [pid 8205] 0.000120 write(3, “100127 16:40:23\t 9 Query select lookup(url) from urls\n”…, 65) = 65

    [pid 8205] 0.000878 open(“/etc/resolv.conf”, O_RDONLY) = 37
    [pid 8205] 0.000179 fstat(37, {st_mode=S_IFREG|0644, st_size=240, …}) = 0

    Now with explain:

    mysql> explain select lookup(url) from urls;
    +—-+————-+——-+——+—————+——+———+——+——+——-+
    | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
    +—-+————-+——-+——+—————+——+———+——+——+——-+
    | 1 | SIMPLE | urls | ALL | NULL | NULL | NULL | NULL | 3 | |
    +—-+————-+——-+——+—————+——+———+——+——+——-+
    1 row in set (0.00 sec)

    [pid 8205] 0.000137 write(3, “100127 16:40:35\t 9 Query explain select lookup(url) from urls\n”…, 73) = 73
    [pid 8205] 0.000185 sched_setscheduler(8205, SCHED_OTHER, { 6 }) = -1 EINVAL (Invalid argument)
    [pid 8205] 0.000280 gettimeofday({1264617635, 884763}, NULL) = 0
    [pid 8205] 0.000091 gettimeofday({1264617635, 884852}, NULL) = 0
    [pid 8205] 0.000266 write(29, “\x01\x00\x00\x01\x0a\x18\x00\x00\x02\x03\x64\x65\x66\x00\x00\x00\x02\x69\x64\x00\x0c\x3f\x00\x03\x00\x00\x00\x08\xa1\x00
    \x00\x00\x00\x21\x00\x00\x03\x03\x64\x65\x66\x00\x00\x00\x0b\x73\x65\x6c\x65\x63\x74\x5f\x74\x79\x70\x65\x00\x0c\x08\x00\x13\x00\x00\x00\xfd\x01\x00\x1f\x00\
    x00\x1b\x00\x00\x04\x03\x64\x65\x66\x00\x00\x00\x05\x74\x61\x62\x6c\x65\x00\x0c\x08\x00\x40\x00\x00\x00\xfd\x00\x00\x1f\x00\x00\x1a\x00\x00\x05\x03\x64\x65\x
    66\x00\x00\x00\x04\x74\x79\x70\x65\x00\x0c\x08\x00\x0a\x00\x00\x00\xfd\x00\x00\x1f\x00\x00\x23\x00\x00\x06\x03\x64\x65\x66\x00\x00\x00\x0d\x70\x6f\x73\x73\x6
    9\x62\x6c\x65\x5f\x6b\x65\x79\x73\x00\x0c\x08\x00\x00\x10\x00\x00\xfd\x00\x00\x1f\x00\x00\x19\x00\x00\x07\x03\x64\x65\x66\x00\x00\x00\x03\x6b\x65\x79\x00\x0c
    \x08\x00\x40\x00\x00\x00\xfd\x00\x00\x1f\x00\x00\x1d\x00\x00\x08\x03\x64\x65\x66\x00\x00\x00\x07\x6b\x65\x79\x5f\x6c\x65\x6e\x00\x0c\x08\x00\x00\x10\x00\x00\
    xfd\x00\x00\x1f\x00\x00\x19\x00\x00\x09\x03\x64\x65\x66\x00\x00\x00\x03\x72\x65\x66\x00\x0c\x08\x00\x00\x04\x00\x00\xfd\x00\x00\x1f\x00\x00\x1a\x00\x00\x0a\x
    03\x64\x65\x66\x00\x00\x00\x04\x72\x6f\x77\x73\x00\x0c\x3f\x00\x0a\x00\x00\x00\x08\xa0\x00\x00\x00\x00\x1b\x00\x00\x0b\x03\x64\x65\x66\x00\x00\x00\x05\x45\x7
    8\x74\x72\x61\x00\x0c\x08\x00\xff\x00\x00\x00\xfd\x01\x00\x1f\x00\x00\x05\x00\x00\x0c\xfe\x00\x00\x02\x00\x19\x00\x00\x0d\x01\x31\x06\x53\x49\x4d\x50\x4c\x45
    \x04\x75\x72\x6c\x73\x03\x41\x4c\x4c\xfb\xfb\xfb\xfb\x01\x33\x00\x05\x00\x00\x0e\xfe\x00\x00\x02\x00″…, 369) = 369
    [pid 8205] 0.000457 gettimeofday({1264617635, 885575}, NULL) = 0
    [pid 8205] 0.000092 sched_setscheduler(8205, SCHED_OTHER, { 8 }) = -1 EINVAL (Invalid argument)
    [pid 8205] 0.000103 gettimeofday({1264617635, 885770}, NULL) = 0
    [pid 8205] 0.000084 gettimeofday({1264617635, 885853}, NULL) = 0
    [pid 8205] 0.000086 gettimeofday({1264617635, 885939}, NULL) = 0
    [pid 8205] 0.000116 gettimeofday({1264617635, 886058}, NULL) = 0
    [pid 8205] 0.000130 write(4, “# Time: 100127 16:40:35\n# User@Host: root[root] @ localhost []\n# Thread_id: 9 Schema: test\n# Query_time: 0.001985 Lo
    ck_time: 0.000690 Rows_sent: 1 Rows_examined: 0 Rows_affected: 0 Rows_read: 1\nexplain select lookup(url) from urls;\n”…, 238) = 238

    To discard a possible dns caching effect, I ran the original query twice on the same connection, and got the same strace output, so it’s always going to /etc/resolv.conf

    Still, without looking at the source code, this is just anecdotical evidence from one example, I’ll try to look more into this and perhaps make a new post on the subject :)

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>