<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Fernando Ipar &#187; performance</title>
	<atom:link href="http://fernandoipar.com/tag/performance/feed/" rel="self" type="application/rss+xml" />
	<link>http://fernandoipar.com</link>
	<description>I love mankind! It&#039;s people I can&#039;t stand!</description>
	<lastBuildDate>Tue, 06 Dec 2011 01:01:21 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=abc</generator>
		<item>
		<title>Indexing text columns in MySQL</title>
		<link>http://fernandoipar.com/2009/08/12/indexing-text-columns-in-mysql/</link>
		<comments>http://fernandoipar.com/2009/08/12/indexing-text-columns-in-mysql/#comments</comments>
		<pubDate>Wed, 12 Aug 2009 22:07:28 +0000</pubDate>
		<dc:creator>fernando</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[indexes]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://fernandoipar.com/?p=178</guid>
		<description><![CDATA[This time, I&#8217;m talking about indexes for string-typed columns. In particular, I&#8217;ll show a procedure I find useful when looking for good index length values for these columns. I&#8217;ll use a sample table called people. [...]


]]></description>
			<content:encoded><![CDATA[<p><P>This time, I&#8217;m talking about indexes for string-typed columns. In particular, I&#8217;ll show a procedure I find useful when looking for good index length values for these columns.<br />
</P><br />
<P>I&#8217;ll use a sample table called people.<br />
</P><br />
<P>Here&#8217;s what it looks like:<br />
</P><br />
<PRE>mysql&gt; desc people;
+------------+------------------+------+-----+---------+----------------+
| Field      | Type             | Null | Key | Default | Extra          |
+------------+------------------+------+-----+---------+----------------+
| id         | int(11) unsigned | NO   | PRI | NULL    | auto_increment |
| title      | varchar(250)     | NO   |     | NULL    |                |
| city       | varchar(250)     | NO   |     | NULL    |                |
| occupation | varchar(250)     | NO   |     | NULL    |                |
+------------+------------------+------+-----+---------+----------------+
4 rows in set (0.00 sec)

mysql&gt; select count(*) from people;
+----------+
| count(*) |
+----------+
|   150000 |
+----------+
1 row in set (0.00 sec)

mysql&gt; </pre><P><br />
We&#8217;ll start by using procedure analyse to get some useful information<br />
about our data. Unless you know some fields are good candidates for<br />
use with the ENUM datatype, invoke procedure analyse with arguments<br />
(0,0) to prevent MySQL from suggesting huge ENUMs for string<br />
columns.<br />
</P><br />
<PRE>mysql&gt; select * from people procedure analyse(0,0)\G
*************************** 1. row ***************************
             Field_name: test.people.id
              Min_value: 1
              Max_value: 150000
             Min_length: 1
             Max_length: 6
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 75000.5000
                    Std: 87258.1632
      Optimal_fieldtype: MEDIUMINT(6) UNSIGNED NOT NULL
*************************** 2. row ***************************
             Field_name: test.people.title
              Min_value: aback exclaims stopgap's chapel's tanked claps snowshoe cigarette correlates extras laster cluc
              Max_value: Zulus colossally dictate cleft's enchanter del
             Min_length: 40
             Max_length: 150
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 95.0869
                    Std: NULL
      Optimal_fieldtype: TINYTEXT NOT NULL
*************************** 3. row ***************************
             Field_name: test.people.city
              Min_value: aback ascertaining unw
              Max_value: Zulus imprisonments veiner a
             Min_length: 5
             Max_length: 30
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 17.4861
                    Std: NULL
      Optimal_fieldtype: TINYTEXT NOT NULL
*************************** 4. row ***************************
             Field_name: test.people.occupation
              Min_value:
              Max_value:
             Min_length: 0
             Max_length: 0
       Empties_or_zeros: 150000
                  Nulls: 0
Avg_value_or_avg_length: 0.0000
                    Std: NULL
      Optimal_fieldtype: CHAR(0) NOT NULL
4 rows in set (0.19 sec)

mysql&gt; </pre><P><br />
The id column is numeric, and is already indexed. We can&#8217;t<br />
see this here (though I have shown an example of how you can extend<br />
the output of procedure analyse to suit your needs, in<br />
which I did include an &#8216;Indexed&#8217; column), but we can<br />
confirm it, and gain more information about the table, with the following<br />
statement:<br />
</P><br />
<PRE>mysql&gt; show index from people;
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table  | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| people |          0 | PRIMARY  |            1 | id          | A         |      150000 |     NULL | NULL   |      | BTREE      |         |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
1 row in set (0.00 sec)

mysql&gt;
</pre><P><br />
In order to create a good index for title, we can use the following query. I used 95 as a starting point, since it&#8217;s reported by procedure analyse as the average data length of this column.<br />
</P></p>
<pre>
mysql> select count(distinct(substr(title,1,95))) / count(distinct(title)) * 100 from people;
+--------------------------------------------------------------------+
| count(distinct(substr(title,1,95))) / count(distinct(title)) * 100 |
+--------------------------------------------------------------------+
|                                                           100.0000 |
+--------------------------------------------------------------------+
1 row in set (1.42 sec)

mysql> 
</pre>
<p><P><br />
As you can see, the first 95 characters are enough to tell apart 100% of the distinct titles, so an index on that prefix would be as selective as one on the whole column. Still, it&#8217;s a big number. Using this query, we<br />
can begin to play a little bit with the index size, until we reach a good compromise between enough distinct values and an index that&#8217;s small enough to be processed fast and<br />
maybe even loaded into memory.<br />
</P></p>
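<p>
To put rough numbers on &#8220;small enough&#8221;, here is a back-of-the-envelope sketch in Python. It assumes one byte per character and ignores per-entry overhead such as row pointers and key packing, so treat the results as an upper bound on the raw key bytes rather than the real index size. The helper name estimate_key_bytes is hypothetical.
</p>

```python
ROWS = 150000  # row count reported by select count(*) above


def estimate_key_bytes(rows, prefix_length, bytes_per_char=1):
    """Upper bound on raw key bytes: one prefix of prefix_length chars per row.

    Assumption: single-byte characters and no per-entry overhead.
    """
    return rows * prefix_length * bytes_per_char


for prefix_length in (95, 20, 15):
    size = estimate_key_bytes(ROWS, prefix_length)
    print(prefix_length, size, round(size / 1024 / 1024, 2), "MiB")
```

<p>
Under these assumptions, a 15-character prefix needs roughly a sixth of the key bytes that a 95-character one does.
</p>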
<pre>
mysql> select count(distinct(substr(title,1,20))) / count(distinct(title)) * 100 from people;
+--------------------------------------------------------------------+
| count(distinct(substr(title,1,20))) / count(distinct(title)) * 100 |
+--------------------------------------------------------------------+
|                                                            99.9507 |
+--------------------------------------------------------------------+
1 row in set (1.15 sec)

mysql>
</pre>
<p><P><br />
As it turns out, my data set doesn&#8217;t require too many characters in order to be differentiated. Actually, for a table this size, I&#8217;m kind of cheating here, since I used a data generator to populate this table, and it<br />
generated a lot of random text. Real-world data would probably require a longer prefix to get such good differentiation. Anyway, let&#8217;s push it a little bit more.<br />
</P></p>
<pre>
mysql> select count(distinct(substr(title,1,15))) / count(distinct(title)) * 100 from people;
+--------------------------------------------------------------------+
| count(distinct(substr(title,1,15))) / count(distinct(title)) * 100 |
+--------------------------------------------------------------------+
|                                                            97.0787 |
+--------------------------------------------------------------------+
1 row in set (1.18 sec)

mysql> select count(distinct(substr(title,1,14))) / count(distinct(title)) * 100 from people;
+--------------------------------------------------------------------+
| count(distinct(substr(title,1,14))) / count(distinct(title)) * 100 |
+--------------------------------------------------------------------+
|                                                            94.4247 |
+--------------------------------------------------------------------+
1 row in set (1.13 sec)
</pre>
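<p>
The manual probing above can also be scripted. Here is a minimal Python sketch of the same ratio, assuming the column values have already been fetched into a list. The names prefix_selectivity and shortest_prefix are hypothetical helpers, not MySQL functions, and the sample titles are shortened from the listings in this post. Note that Python compares strings case-sensitively, while MySQL&#8217;s result depends on the column&#8217;s collation, so the numbers may differ slightly from the server&#8217;s.
</p>

```python
def prefix_selectivity(values, length):
    """Percentage of distinct values that stay distinct when cut to `length` chars.

    Mirrors: count(distinct(substr(col,1,length))) / count(distinct(col)) * 100
    """
    distinct_values = set(values)
    if not distinct_values:
        return 0.0
    distinct_prefixes = {value[:length] for value in distinct_values}
    return 100.0 * len(distinct_prefixes) / len(distinct_values)


def shortest_prefix(values, target_pct, max_length):
    """Smallest prefix length whose selectivity reaches target_pct, or None."""
    for length in range(1, max_length + 1):
        if prefix_selectivity(values, length) >= target_pct:
            return length
    return None


# Hypothetical sample: a few titles shortened from the listings in this post.
titles = [
    "taxi catsup ornament transformer",
    "taxi choir parameter's busted",
    "taxicab mercilessly excesses",
    "taxicab voicer controllers",
    "taxis sprawls unblushing rude",
]

for length in (4, 7, 9):
    print(length, prefix_selectivity(titles, length))
print(shortest_prefix(titles, 100, 20))
```

<p>
On a real table you would feed it the column values (or a sample of them) and pick the target percentage yourself; the sketch only automates the trial and error.
</p>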
<p><P><br />
Here&#8217;s the turning point for me: a jump from 94% to 97% of index coverage with just 1 more character. So I&#8217;m sticking with 15. Let&#8217;s test this with some queries.<br />
</P></p>
<pre>
mysql> select title from people limit 40;
+----------------------------------------------------------------------------------------------------------------------------------------------------+
| title                                                                                                                                              |
+----------------------------------------------------------------------------------------------------------------------------------------------------+
| puffs war's bruises buckles attainably Warnock's discoverer degeneration plots admirably assimilates germane burlesquely ri                        |
| arbitrariness MacDraw's carbonates suckers budget chronicler cur drabs untested Aryans imperial                                                    |
| commender dozes distills blackbird's mend meta                                                                                                     |
| gallons haying occupation's sculpt fittingness scores onwards recessed masculineness denominator's regulated boyfriend's                           |
| authored metaphor derivatively matchmakers ratification railing advantageousness flossing twin's barbarously infinite retreat alloying tenting t   |
| Africans determinateness enquired quivers replaces nowhere applicability negative alarms lacquerer shivered arachnid ulcer sil                     |
| filter offerings unboundedness clearness enthusiast commandants blunted betide rusticated blacks helmet's squabbles tasked Beethoven contro        |
| thirties oftener tunnel anguish attainable formulat                                                                                                |
| grotesquely fallacious inessential fain sanctioned too amplifi                                                                                     |
| consort rapes deeply marker patterns compacted plumbe                                                                                              |
| nasally combings searcher's pathname's bolts retrospective aroused squintingly boyish singers recompiles Austral                                   |
| purpled draggingly nobody's luckier spinning goddess oscilloscopes aimer                                                                           |
| aphasia reconverts shams entangle placer metaphysical visited turret nai                                                                           |
| violation bituminous unweighed darkness cackles consonant foully fisted loci relishes burn m                                                       |
| unsuffixed overdose humbles corpses fashions slashingly quietude delighte                                                                          |
| sheller hypocrisy falser productions shied cube breed childishness requested pads redoub                                                           |
| broils aorta refund sinker cankering reawakens portrayed resolving bard's stand ejects inhabitant's tittering genders proposition                  |
| cyclone's glorification unrestricted delicately inhibitive waterway wardrobes excommunicated laugher                                               |
| poppies heroine's gunner swollen reticle vertebrate's shrank unreliabilities infractions pretentious angstroms relations highness feasibil         |
| ampoule clustering intermediaries honer ree                                                                                                        |
| creature's transferals tidal unsigned stitching ought coerces visa girdling porn janitors parer song's croaked ta                                  |
| hammer amalgamating stunting feasibility hopefulness oilier spraying frets pinks                                                                   |
| comelier tomorrow's cowboy chalked lewdness cordial supering rut's neurally blindingly mute drowsiest gives in                                     |
| slides aqueduct glazers abolition dangers sultry raid prominence hedges walks toppled defenders autocrat                                           |
| theoretic thumps scum's photos bootlegged enveloper sallying populations disruptions inaugurate conclu                                             |
| annotated bibliographies lichen user's bluebird's subproofs unendurably recollection's crumple                                                     |
| sergeant outlets pinion reducer wiling impinge apes insaneness dose automatics lighthouse's cursory sleepily web's interruptions superin           |
| tautened skylarks toad's seminar's archangel's sarcasm shipwrecks indeed incliner tying waterf                                                     |
| chaotic censuses intimacy custodian's extendedly womb's safeguarding desire                                                                        |
| abusiveness skippered inspirer enunciation taper memory's clearly guardianship inputed m                                                           |
| firing anaphora subsegments turbulence affectedness refractory unsprayed chapter's volumes undramati                                               |
| chef reception's glens budged budge arson assistance disagreeableness fodder garnering boated skater heroine's pamphle                             |
| prophetic spilling asper petter's constable's classic ices teethes mails office's sordidness cylindered chaffing bivouac skeptics shuttering quash |
| hoppers iciest sharer dietitian dictionaries frac                                                                                                  |
| racketeer Ellen amounts origin's abstractions render vanish pantries retrieve Maxtor unprojected antithes                                          |
| hinter arrangers dialogue imputing droppers shelver boyish demonstrator braving submitting operated carbonate protruding creasing prospecto        |
| electrify garment linked discernible transceiver's ungrounded telegrapher uncoated                                                                 |
| badly unaffected vex taming affiliation strings detracts grandpas girded cafeteria approving ideology froze underlinings assass                    |
| MHz articulate draws transpires rubbling swarthier reeler bagged snug assisted consonant's settl                                                   |
| taxi's wearying espies Anglican's intangibly fluent jugs liveth pride ex                                                                           |
+----------------------------------------------------------------------------------------------------------------------------------------------------+
40 rows in set (0.00 sec)

mysql>
</pre>
<p><P><br />
That should give you an idea of what type of needles we&#8217;re looking for :)<br />
</P></p>
<pre>
mysql> select * from people where title like 'taxi%';
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+------------+
| id     | title                                                                                                                                               | city                           | occupation |
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+------------+
|     40 | taxi's wearying espies Anglican's intangibly fluent jugs liveth pride ex                                                                            | rotation tri                   |            |
|   5736 | taxi's allegorically accounting manipulatory cautiousness computational promoter wool reproo                                                        | islan                          |            |
|   6967 | taxis sprawls unblushing rude put absorbs reproducibilities crumblier kid                                                                           | DeMorgan overhe                |            |
|   8388 | taxi catsup ornament transformer widener syndicates dismount pop t                                                                                  | exhibition's manages hedgehog' |            |
|  10418 | taxingly eligibility whichever meditation corrosion unluckiness intoxicat                                                                           | gagged politeness looser       |            |
|  13091 | taxi endowment watchfulness battalions stay trickle tangle blowfish maid's transmissions questionnaire vomit saner strokers constituent crab's      | populations window's zoo armie |            |
|  16723 | taxi storing couldest bouts allegoric cluttered steeples fives hitchhike thrashes retirement de                                                     | envisaged maintain             |            |
|  22168 | taxicab voicer controllers removing cellular houses router nourishing edict shrines strikes testicle's destine whale russeted certi                 | masturbates pu                 |            |
|  24244 | taxing sleeve consultant's nonprogrammable twine delayer ingot respecter subex                                                                      | prematurely significant        |            |
|  35595 | taxicabs aback spinal checkers germs overdraft's coon critter's patrician fled coalition massaging paced condemning impen                           | oppre                          |            |
|  37105 | taxis articulatory indulgence bystanders skin burgess starlight calendaring aunt's bilging benightedness smallest softened xiv immerser fresher unn | crudely papally r              |            |
|  40871 | taxicabs muzzling precocious resentment fellers pitiers beasts marines baselines diagrammatically clowning connecters stampedin                     | influencer                     |            |
|  44298 | taxi rages unintelligibleness anastomosis orthogonally incompatibilities keypads hoarse province stamping perceived sh                              | unforgiving quiet              |            |
|  46395 | taxi plunders novelty's downstairs newborn symbiotic climax highlights lounger keypads only schools possibilities                                   | flowing forgeries slende       |            |
|  66078 | taxicab mercilessly excesses ships merchandising patch strobe                                                                                       | armfuls firmament hum coop     |            |
|  71095 | taxing dispense regrettably resuspended kilobits downwardly domestically laps rainiest recapitulates despiser trophies chums a                      | enumerate indoctrinat          |            |
|  77668 | taxicabs approachable disqualifying charcoaled script's o                                                                                           | kited publish disburse anarch  |            |
|  84162 | taxicab's captivity dean eyeball uninspiring pawn's complication outcast's stared sneak s                                                           | impracticable dungeon crop     |            |
|  87930 | taxi swiftly repacks unsupported slice mornings squares gland solar brainier harrying wag                                                           | cowslip halter plastics        |            |
|  91282 | taxiing undetected cast commands clasping germina                                                                                                   | waxes her                      |            |
| 104029 | taxied roofs besetting leadership electrocuted input metaphor bubbler vowing sponges assess                                                         | worthing understated bark      |            |
| 105818 | taxis libretti defensively shoes antagonistically heavier endeared accidental gauging intercourse revolte                                           | runne                          |            |
| 106163 | taxis bonfire's bench stereo preventer boringness blot's quieter acronyms transplant gained implores ba                                             | sighting leased sp             |            |
| 122471 | taxicab's they've berries invader touching bumblingly courtier's boosting undisguised destroy amanuensis bangles digestiveness poppy's hulls        | purity professional unski      |            |
| 123931 | taxicab's headgear Popek ratifying tenured Pascal's subduedly quitting earned planter forgave implicated noo                                        | bibliographies fraill          |            |
| 127383 | taxi choir parameter's busted inspiration's fixated blinking complicator outwit plotters gobbles burningly leafed corruptively                      | radioed size telegr            |            |
| 134211 | taxi's reconstructible indirect agglutination awaken eked unoccupied pillager subcomputation interviewing treader commending i                      | muddiness broom's              |            |
| 145784 | taxicabs sanctuary armful battening terrifying impactors guns exchequer reigns laughter desolater s                                                 | buttonhole's isomorphism       |            |
| 146371 | taxied carnivals giver misconceptions countenancer introduced anchovy exile pipelines weaned unabridged Britishly abyss's extenuating moodiness th  | penetrator upsho               |            |
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+------------+
29 rows in set (0.11 sec)

mysql> explain select * from people where title like 'taxi%';
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | people | ALL  | NULL          | NULL | NULL    | NULL | 150000 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+

mysql> select * from people where title = 'hoppers iciest sharer dietitian dictionaries frac';
+----+---------------------------------------------------+--------------------+------------+
| id | title                                             | city               | occupation |
+----+---------------------------------------------------+--------------------+------------+
| 34 | hoppers iciest sharer dietitian dictionaries frac | coroneted revolve  |            |
+----+---------------------------------------------------+--------------------+------------+
1 row in set (0.12 sec)

mysql> explain select * from people where title = 'hoppers iciest sharer dietitian dictionaries frac';
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | people | ALL  | NULL          | NULL | NULL    | NULL | 150000 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
1 row in set (0.00 sec)
</pre>
<p>
OK, now let&#8217;s create the index. I&#8217;m loading it into a cache here, which is not necessary (even less so given my table size of 150000 rows), but it helps. In order to do this, all the indexes in your table must have the same block size.
</p>
<pre>
mysql> create index idx_people_title on people(title(15));
Query OK, 150000 rows affected (1.60 sec)
Records: 150000  Duplicates: 0  Warnings: 0

mysql> reset query cache;
Query OK, 0 rows affected (0.00 sec)

mysql> load index into cache people;
+-------------+--------------+----------+----------+
| Table       | Op           | Msg_type | Msg_text |
+-------------+--------------+----------+----------+
| test.people | preload_keys | status   | OK       |
+-------------+--------------+----------+----------+
1 row in set (0.00 sec)
</pre>
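<p>
One thing to keep in mind about an index on title(15): it stores only the first 15 characters of each title, so the server still has to read the candidate rows and compare the full value. The following Python sketch illustrates that idea with a plain dictionary. It is a toy model, not how MySQL&#8217;s B-tree works internally; build_prefix_index and lookup are hypothetical names, and the rows are made up for the example.
</p>

```python
from collections import defaultdict

PREFIX_LENGTH = 15  # same length as in title(15)


def build_prefix_index(rows, length=PREFIX_LENGTH):
    """Map each title prefix to the ids of the rows that share it."""
    index = defaultdict(list)
    for row_id, title in rows.items():
        index[title[:length]].append(row_id)
    return index


def lookup(rows, index, wanted, length=PREFIX_LENGTH):
    """Equality lookup: narrow by prefix, then check the full title."""
    candidates = index.get(wanted[:length], [])
    return [row_id for row_id in candidates if rows[row_id] == wanted]


# Hypothetical rows (id -> title); two of them share their first 15 characters.
rows = {
    1: "taxicab's captivity dean eyeball",
    2: "taxicab's captivation of crowds",
    3: "hoppers iciest sharer dietitian dictionaries frac",
}
index = build_prefix_index(rows)

print(index["taxicab's capti"])        # both rows that share the prefix
print(lookup(rows, index, rows[1]))    # only the row whose full title matches
```

<p>
The less selective the prefix, the more candidate rows each lookup has to read and discard, which is the cost being traded against index size.
</p>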
<p>
Let&#8217;s retest the queries:
</p>
<pre>

mysql> select * from people where title like 'taxi%';
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+------------+
| id     | title                                                                                                                                               | city                           | occupation |
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+------------+
|   8388 | taxi catsup ornament transformer widener syndicates dismount pop t                                                                                  | exhibition's manages hedgehog' |            |
| 127383 | taxi choir parameter's busted inspiration's fixated blinking complicator outwit plotters gobbles burningly leafed corruptively                      | radioed size telegr            |            |
|  13091 | taxi endowment watchfulness battalions stay trickle tangle blowfish maid's transmissions questionnaire vomit saner strokers constituent crab's      | populations window's zoo armie |            |
|  46395 | taxi plunders novelty's downstairs newborn symbiotic climax highlights lounger keypads only schools possibilities                                   | flowing forgeries slende       |            |
|  44298 | taxi rages unintelligibleness anastomosis orthogonally incompatibilities keypads hoarse province stamping perceived sh                              | unforgiving quiet              |            |
|  16723 | taxi storing couldest bouts allegoric cluttered steeples fives hitchhike thrashes retirement de                                                     | envisaged maintain             |            |
|  87930 | taxi swiftly repacks unsupported slice mornings squares gland solar brainier harrying wag                                                           | cowslip halter plastics        |            |
|   5736 | taxi's allegorically accounting manipulatory cautiousness computational promoter wool reproo                                                        | islan                          |            |
| 134211 | taxi's reconstructible indirect agglutination awaken eked unoccupied pillager subcomputation interviewing treader commending i                      | muddiness broom's              |            |
|     40 | taxi's wearying espies Anglican's intangibly fluent jugs liveth pride ex                                                                            | rotation tri                   |            |
|  66078 | taxicab mercilessly excesses ships merchandising patch strobe                                                                                       | armfuls firmament hum coop     |            |
|  22168 | taxicab voicer controllers removing cellular houses router nourishing edict shrines strikes testicle's destine whale russeted certi                 | masturbates pu                 |            |
|  84162 | taxicab's captivity dean eyeball uninspiring pawn's complication outcast's stared sneak s                                                           | impracticable dungeon crop     |            |
| 123931 | taxicab's headgear Popek ratifying tenured Pascal's subduedly quitting earned planter forgave implicated noo                                        | bibliographies fraill          |            |
| 122471 | taxicab's they've berries invader touching bumblingly courtier's boosting undisguised destroy amanuensis bangles digestiveness poppy's hulls        | purity professional unski      |            |
|  35595 | taxicabs aback spinal checkers germs overdraft's coon critter's patrician fled coalition massaging paced condemning impen                           | oppre                          |            |
|  77668 | taxicabs approachable disqualifying charcoaled script's o                                                                                           | kited publish disburse anarch  |            |
|  40871 | taxicabs muzzling precocious resentment fellers pitiers beasts marines baselines diagrammatically clowning connecters stampedin                     | influencer                     |            |
| 145784 | taxicabs sanctuary armful battening terrifying impactors guns exchequer reigns laughter desolater s                                                 | buttonhole's isomorphism       |            |
| 146371 | taxied carnivals giver misconceptions countenancer introduced anchovy exile pipelines weaned unabridged Britishly abyss's extenuating moodiness th  | penetrator upsho               |            |
| 104029 | taxied roofs besetting leadership electrocuted input metaphor bubbler vowing sponges assess                                                         | worthing understated bark      |            |
|  91282 | taxiing undetected cast commands clasping germina                                                                                                   | waxes her                      |            |
|  71095 | taxing dispense regrettably resuspended kilobits downwardly domestically laps rainiest recapitulates despiser trophies chums a                      | enumerate indoctrinat          |            |
|  24244 | taxing sleeve consultant's nonprogrammable twine delayer ingot respecter subex                                                                      | prematurely significant        |            |
|  10418 | taxingly eligibility whichever meditation corrosion unluckiness intoxicat                                                                           | gagged politeness looser       |            |
|  37105 | taxis articulatory indulgence bystanders skin burgess starlight calendaring aunt's bilging benightedness smallest softened xiv immerser fresher unn | crudely papally r              |            |
| 106163 | taxis bonfire's bench stereo preventer boringness blot's quieter acronyms transplant gained implores ba                                             | sighting leased sp             |            |
| 105818 | taxis libretti defensively shoes antagonistically heavier endeared accidental gauging intercourse revolte                                           | runne                          |            |
|   6967 | taxis sprawls unblushing rude put absorbs reproducibilities crumblier kid                                                                           | DeMorgan overhe                |            |
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+------------+
29 rows in set (0.00 sec)

mysql> explain select * from people where title like 'taxi%';
+----+-------------+--------+-------+------------------+------------------+---------+------+------+-------------+
| id | select_type | table  | type  | possible_keys    | key              | key_len | ref  | rows | Extra       |
+----+-------------+--------+-------+------------------+------------------+---------+------+------+-------------+
|  1 | SIMPLE      | people | range | idx_people_title | idx_people_title | 17      | NULL |   56 | Using where |
+----+-------------+--------+-------+------------------+------------------+---------+------+------+-------------+
1 row in set (0.00 sec)
</pre>
<p>
Notice how we only go through 56 rows now.
</p>
<pre>

mysql> select * from people where title = 'hoppers iciest sharer dietitian dictionaries frac';
+----+---------------------------------------------------+--------------------+------------+
| id | title                                             | city               | occupation |
+----+---------------------------------------------------+--------------------+------------+
| 34 | hoppers iciest sharer dietitian dictionaries frac | coroneted revolve  |            |
+----+---------------------------------------------------+--------------------+------------+
1 row in set (0.00 sec)

mysql> explain select * from people where title = 'hoppers iciest sharer dietitian dictionaries frac';
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
| id | select_type | table  | type | possible_keys    | key              | key_len | ref   | rows | Extra       |
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
|  1 | SIMPLE      | people | ref  | idx_people_title | idx_people_title | 17      | const |    1 | Using where |
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
1 row in set (0.01 sec)
</pre>
<p>
Just 1 row.<br />
Ok, let&#8217;s test the quality of the index to find unique rows.
</p>
<pre>

mysql> explain select * from people where title = 'arbitrariness MacDraw\'s carbonates suckers budget chronicler cur drabs untested Aryans imperial';
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
| id | select_type | table  | type | possible_keys    | key              | key_len | ref   | rows | Extra       |
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
|  1 | SIMPLE      | people | ref  | idx_people_title | idx_people_title | 17      | const |    1 | Using where |
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
1 row in set (0.00 sec)

mysql> explain select * from people where title = 'dunes delightfulness manurers jousts axer aristocrat\'s driver greediness bloke pays preconditions enclosure consideration plaster';
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
| id | select_type | table  | type | possible_keys    | key              | key_len | ref   | rows | Extra       |
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
|  1 | SIMPLE      | people | ref  | idx_people_title | idx_people_title | 17      | const |    1 | Using where |
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
1 row in set (0.00 sec)

mysql> explain select * from people where title = 'satire\'s most quacked campaigning wrists disengaging insignia woodlander knuckles despaired portending incredulous predication Sally\'s amica';
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
| id | select_type | table  | type | possible_keys    | key              | key_len | ref   | rows | Extra       |
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
|  1 | SIMPLE      | people | ref  | idx_people_title | idx_people_title | 17      | const |    2 | Using where |
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
1 row in set (0.00 sec)

mysql> select *,length(title) from people where title = 'satire\'s most quacked campaigning wrists disengaging insignia woodlander knuckles despaired portending incredulous predication Sally\'s amica';
+-----+----------------------------------------------------------------------------------------------------------------------------------------------+------------------+------------+---------------+
| id  | title                                                                                                                                        | city             | occupation | length(title) |
+-----+----------------------------------------------------------------------------------------------------------------------------------------------+------------------+------------+---------------+
| 344 | satire's most quacked campaigning wrists disengaging insignia woodlander knuckles despaired portending incredulous predication Sally's amica | buffaloes refill |            |           140 |
+-----+----------------------------------------------------------------------------------------------------------------------------------------------+------------------+------------+---------------+
1 row in set (0.00 sec)
</pre>
<p>
Granted, 150,000 rows isn&#8217;t much. Still, with an average row data length of 94, I had to pick a 140-character title before the lookup had to go through 2 rows instead of 1 to find the right one. That&#8217;s reasonable, considering I estimated a 97% index coverage.
</p>
<p>
In conclusion, while my dataset certainly isn&#8217;t large enough to do many interesting things, it should prove the point that a good index length goes a long way toward improving the performance of your MySQL-based system. The query I presented here can be useful when looking for a decent index length in terms of unique row coverage.</p>
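<p>As a rough sketch of what such a coverage check can look like (the prefix lengths 10, 15 and 20 are arbitrary assumptions, and this is not necessarily the exact query used earlier in this post), you can compare the number of distinct title prefixes against the total row count:</p>
<pre>
-- Fraction of rows that a prefix of each length can tell apart.
-- A value close to 1 means almost every row has its own prefix.
select count(*) as total_rows,
       count(distinct left(title, 10)) / count(*) as coverage_10,
       count(distinct left(title, 15)) / count(*) as coverage_15,
       count(distinct left(title, 20)) / count(*) as coverage_20
from people;
</pre>
<p>The shortest length whose coverage is good enough for you is a reasonable candidate for the index prefix.</p>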


]]></content:encoded>
			<wfw:commentRss>http://fernandoipar.com/2009/08/12/indexing-text-columns-in-mysql/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Using MySQL Proxy to benchmark query performance</title>
		<link>http://fernandoipar.com/2009/04/06/using-mysql-proxy-to-benchmark-query-performance/</link>
		<comments>http://fernandoipar.com/2009/04/06/using-mysql-proxy-to-benchmark-query-performance/#comments</comments>
		<pubDate>Mon, 06 Apr 2009 11:20:28 +0000</pubDate>
		<dc:creator>fernando</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Benchmarking]]></category>
		<category><![CDATA[MySQL-Proxy]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://fernandoipar.com/?p=139</guid>
		<description><![CDATA[By transparently sitting between client and server on each request, MySQL Proxy offers many possibilities for query manipulation. Many are explored in the cookbook, and they even include a histogram recipe. Still, I wanted to learn more about the proxy while working on a script that would let me get some stats on the queries [...]


]]></description>
			<content:encoded><![CDATA[<p>By transparently sitting between client and server on each request, MySQL Proxy offers many possibilities for query manipulation.</p>
<p>Many are explored in the <a title="MySQL Proxy Cookbook" href="http://forge.mysql.com/wiki/MySQL_Proxy_Cookbook">cookbook</a>, which even includes a histogram recipe. Still, I wanted to learn more about the proxy while working on a script that would let me get some stats on the queries executed against a server (or group of servers).</p>
<p>First things first: get a brief glimpse of the <a title="The lua programming language" href="http://www.lua.org">Lua programming language</a>, since that&#8217;s what the proxy&#8217;s scripts are written in. Alternatively, you can jump straight into the sample scripts, work out the syntax you don&#8217;t understand by drawing parallels with other scripting languages you know, and make the best of it. That&#8217;s what I&#8217;ve been doing so far <img src='http://fernandoipar.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Well, now on to it.</p>
<p>Here&#8217;s my super simple proxy script. It consists of a global variable and a line that prints all the values I want, separated by &#8216;||||&#8217;. I chose that separator since it&#8217;s unlikely to appear in a real query, so it won&#8217;t cause me to lose much data when doing load data infile later. That&#8217;s my scientific approach, and considering these aren&#8217;t medical records, it&#8217;s good enough for me.</p>
<p>I can get away with using a global var because the proxy fires up a new instance of the Lua script for every new client connection. At least that&#8217;s what I&#8217;ve been able to find out so far, and my empirical data has confirmed this. If source code inspection later rejects this finding, I&#8217;ll have to find a better (probably more complex) way to achieve the same goal.</p>
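<pre>-- Text of the last query seen on this connection
-- (the proxy appears to run one script instance per connection)
query = ""

function read_query( packet )
	if packet:byte() == proxy.COM_QUERY then
		-- Strip the leading command byte and keep the SQL text
		query = packet:sub(2)
		-- Queue the query so that read_query_result() gets called for it
		proxy.queries:append(1, packet )
		return proxy.PROXY_SEND_QUERY
	end
end

function read_query_result(inj)
	-- Timings are divided by 1000 before printing
	print(os.date('%Y-%m-%d %H:%M:%S') .. "||||" ..  query .. "||||" .. (inj.query_time / 1000) .. "||||" .. (inj.response_time / 1000))
end</pre>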
<p>That simple script saves the query into the global variable from the read_query hook function, and prints the results in the read_query_result hook function. Notice that these hooks open up many more possibilities if you&#8217;re a skillful hacker, and an evil one too <img src='http://fernandoipar.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  (i.e., man-in-the-middle type of things; there are a few query modification examples in the cookbook).</p>
<p>Ok, so with this part covered, we need to run the proxy, and then run some queries against it.</p>
<p>Both things are easy:</p>
<pre>$ mysql-proxy --proxy-lua-script trace.lua &amp;&gt; trace.log</pre>
<p>and something like</p>
<pre>$ mysql -usomeuser -psomepassword -hproxyhost -P4040</pre>
<p>Notice that 4040 is the default port for the proxy, but you could change it to 3306 and move mysql to another port.</p>
<p>This generates a trace.log file that looks like this:</p>
<pre>2009-04-06 07:13:03||||select count(*) from City||||0.381||||0.404
2009-04-06 07:13:06||||desc City||||2.08||||2.18
2009-04-06 07:13:20||||select * from City where Population between 200 and 2000||||169.122||||194.083
2009-04-06 07:13:22||||select * from City where Population between 200 and 2000||||0.408||||9.16
2009-04-06 07:13:23||||select * from City where Population between 200 and 2000||||0.664||||8.455</pre>
<p>We need to load that into mysql.</p>
<p>I created a script just for that purpose:</p>
<pre>#!/bin/bash

cp trace.log /tmp

cd=$(date "+%Y%m%d%H%M%S")

echo "You'll be asked for MySQL root user's password"

mysql -uroot -p &lt;&lt;EOSCR
set @@sql_mode=ANSI;
create database if not exists "__perf";
use __perf;

create table if not exists "analysis_results_${cd}"(
        id int unsigned not null auto_increment,
        ts datetime,
        query char(200),
        query_time float,
        response_time float,
        primary key ( id ),
        key ( ts ),
        key ( query(100) ),
        fulltext ( query ),
        key ( query_time ),
        key ( response_time )
) Engine=MyISAM ROW_FORMAT=Fixed;

load data infile '/tmp/trace.log' into table "analysis_results_${cd}" fields terminated by '||||' (ts,query,query_time,response_time);

EOSCR

resultMySQL=$?

rm -f /tmp/trace.log

[ $resultMySQL -eq 0 ] &amp;&amp; echo "Data imported OK"&gt;&amp;2 || echo "Error while importing data, please refer to the output of MySQL"&gt;&amp;2</pre>
<p>So, once you&#8217;ve run quite a few queries against the proxy, you could do something like this:</p>
<pre>$ ./loadTrace.sh
You'll be asked for MySQL root user's password
Enter password:
Data imported OK</pre>
<p>And then run some queries against the analysis tables.</p>
<p>Here are some ideas:</p>
<p>Top 10 queries that took the longest to process:</p>
<pre>mysql&gt; select * from analysis_results_20090406073836 order by query_time desc limit 10;</pre>
<p>Top 10 queries that took the longest to return to the client:</p>
<pre>mysql&gt; select * from analysis_results_20090406073836 order by response_time desc limit 10;</pre>
<p>Top 10 queries, ordered by their text and then by the time it took them to get back to the client. From the time at which each query was issued, you can infer whether the query cache was in use and, if so, whether it was useful. You may be surprised that for large datasets, while the cache takes load off the server, the client doesn&#8217;t perceive such a big improvement, because it still takes a long time for the resultset to travel back. Therefore he/she will still complain. Sometimes a lot of effort is put into optimizing server performance, and the way the server is accessed is totally neglected!</p>
<pre>mysql&gt; select * from analysis_results_20090406073836 order by query, response_time desc limit 10;</pre>
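<p>Along the same lines, here is a sketch that aggregates by query text, assuming the table and column names created by the load script above (the column aliases are mine):</p>
<pre>-- Per-query run count and average timings, slowest responses first
select query,
       count(*) as runs,
       avg(query_time) as avg_query_time,
       avg(response_time) as avg_response_time
from analysis_results_20090406073836
group by query
order by avg_response_time desc
limit 10;</pre>
<p>A large gap between the two averages suggests the time is going into getting results back to the client rather than into executing the query.</p>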
<p>Well, there you go. Have fun and find out how your application is actually using your server.</p>


]]></content:encoded>
			<wfw:commentRss>http://fernandoipar.com/2009/04/06/using-mysql-proxy-to-benchmark-query-performance/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Using the ENUM data type to increase performance</title>
		<link>http://fernandoipar.com/2009/03/09/using-the-enum-data-type-to-increase-performance/</link>
		<comments>http://fernandoipar.com/2009/03/09/using-the-enum-data-type-to-increase-performance/#comments</comments>
		<pubDate>Mon, 09 Mar 2009 13:48:44 +0000</pubDate>
		<dc:creator>fernando</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://fernandoipar.com/?p=119</guid>
		<description><![CDATA[While going through the DATA TYPES section of the Certification Study Guide, I was reminded of the ENUM data type, which I rarely use. 

I usually create individual tables for enumerations, so that new values can be added with just an insert, or deprecated values can be marked as such. 

However, today I got to thinking about the performance issues involved in all that joining, and how an ENUM column could improve a select. 


]]></description>
			<content:encoded><![CDATA[<p>While going through the DATA TYPES section of the Certification Study Guide, I was reminded of the ENUM data type, which I rarely use. </p>
<p>I usually create individual tables for enumerations, so that new values can be added with just an insert, or deprecated values can be marked as such. </p>
<p>However, today I got to thinking about the performance issues involved in all that joining, and how an ENUM column could improve a select. </p>
<p>Here&#8217;s what I came up with: </p>
<pre>
mysql> create table project_types (
    -> id int unsigned not null auto_increment,
    -> name char(30) not null,
    -> primary key (id),
    -> index(name)
    -> ) Engine InnoDB;
Query OK, 0 rows affected (0.00 sec)
</pre>
<p>create the projects table</p>
<pre>
mysql> create table projects (
    -> id int unsigned not null auto_increment,
    -> name char(30) not null,
    -> project_type int unsigned not null,
    -> primary key (id),
    -> index(name),
    -> constraint `proyects_project_type` foreign key (`project_type`) references project_types(`id`)
    -> ) Engine InnoDB;
Query OK, 0 rows affected (0.01 sec)
</pre>
<p>insert some project types; this is what the table looks like afterwards</p>
<pre>
mysql> select * from project_types order by id;
+----+-------------+
| id | name        |
+----+-------------+
|  1 | Development |
|  2 | Consultancy |
|  3 | Research    |
|  4 | Support     |
+----+-------------+
4 rows in set (0.00 sec)
</pre>
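<p>The insert statement itself isn&#8217;t shown; assuming an empty table whose auto_increment counter starts at 1, something like this would produce the rows above:</p>
<pre>
-- ids 1 to 4 are assigned in the order the values are listed
insert into project_types (name)
values ('Development'), ('Consultancy'), ('Research'), ('Support');
</pre>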
<p>insert several projects. </p>
<pre>
i=0; while [ $i -lt 100000000 ]; do echo "'$RANDOM',$((($RANDOM % 4)+1))">> projects.csv; i=$((i+1)); done
</pre>
<p>I&#8217;m not patient, so I cancelled the process somewhere in the middle, then cat&#8217;ed the file onto itself a few times to get more records, and then loaded the resulting file into mysql with a simple: </p>
<pre>
mysql> load data infile '/tmp/fullprojects.csv' into table projects fields terminated by ',' (name,project_type);
</pre>
<p>I stopped it after a few hours <img src='http://fernandoipar.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /><br />
I ended up with 1,774,292 rows, which I hope will be enough for testing. </p>
<pre>

mysql> set profiling = 1;
Query OK, 0 rows affected (0.00 sec)

mysql> select projects.name as Project, project_types.name as Type from projects,project_types where project_types.id = projects.project_type limit 10000 into outfile '/tmp/projects.txt';
Query OK, 10000 rows affected (2.26 sec)

mysql> show profile for query 1;
+--------------------------------+----------+
| Status                         | Duration |
+--------------------------------+----------+
| (initialization)               | 0.000005 |
| checking query cache for query | 0.000135 |
| Opening tables                 | 0.000214 |
| System lock                    | 0.000016 |
| Table lock                     | 0.000024 |
| init                           | 0.000255 |
| optimizing                     | 0.000027 |
| statistics                     | 0.000057 |
| preparing                      | 0.000028 |
| executing                      | 0.000009 |
| Sending data                   | 2.263521 |
| end                            | 0.000027 |
| query end                      | 0.000014 |
| freeing items                  | 0.000025 |
| closing tables                 | 0.000021 |
| logging slow query             | 0.000126 |
+--------------------------------+----------+
16 rows in set (0.00 sec)

mysql> select projects.name as Project, project_types.name as Type from projects,project_types where project_types.id = projects.project_type limit 100000 into outfile '/tmp/projects.txt';
Query OK, 100000 rows affected (10.23 sec)

mysql> show profile for query 2;
+--------------------------------+-----------+
| Status                         | Duration  |
+--------------------------------+-----------+
| (initialization)               | 0.000012  |
| checking query cache for query | 0.000128  |
| Opening tables                 | 0.000031  |
| System lock                    | 0.00001   |
| Table lock                     | 0.000021  |
| init                           | 0.000241  |
| optimizing                     | 0.000026  |
| statistics                     | 0.000056  |
| preparing                      | 0.000029  |
| executing                      | 0.000008  |
| Sending data                   | 10.224004 |
| end                            | 0.000025  |
| query end                      | 0.00001   |
| freeing items                  | 0.000023  |
| closing tables                 | 0.00002   |
| logging slow query             | 0.000087  |
+--------------------------------+-----------+
16 rows in set (0.00 sec)

mysql> select projects.name as Project, project_types.name as Type from projects,project_types where project_types.id = projects.project_type limit 1000000 into outfile '/tmp/projects.txt';
Query OK, 1000000 rows affected (6 min 7.36 sec)

mysql> show profile for query 3;
+--------------------------------+-----------+
| Status                         | Duration  |
+--------------------------------+-----------+
| (initialization)               | 0.000011  |
| checking query cache for query | 0.000138  |
| Opening tables                 | 0.000032  |
| System lock                    | 0.000132  |
| Table lock                     | 0.000024  |
| init                           | 0.000375  |
| optimizing                     | 0.000027  |
| statistics                     | 0.000079  |
| preparing                      | 0.000171  |
| executing                      | 0.000009  |
| Sending data                   | 379.69978 |
| end                            | 0.000013  |
| query end                      | 0.000006  |
| freeing items                  | 0.000013  |
| closing tables                 | 0.000009  |
| logging slow query             | 0.0000800 |
+--------------------------------+-----------+
16 rows in set (0.01 sec)
</pre>
<p>Now, a version of the projects table using the ENUM data type: </p>
<pre>

mysql> select p.name,pt.name from projects p, project_types pt where pt.id = p.project_type into outfile '/tmp/projects.csv' fields terminated by ',';
Query OK, 1774292 rows affected (12 min 3.68 sec)

mysql> create table projects_unnormalized (name char(30) not null, project_type enum ('Consultancy','Development','Research','Support'));
Query OK, 0 rows affected (0.01 sec)

mysql> load data infile '/tmp/projects.csv' into table projects_unnormalized fields terminated by ',';
Query OK, 1774292 rows affected (2.56 sec)
Records: 1774292  Deleted: 0  Skipped: 0  Warnings: 0

mysql> alter table projects_unnormalized add id int unsigned not null auto_increment primary key;
Query OK, 1774292 rows affected (9.53 sec)
Records: 1774292  Duplicates: 0  Warnings: 0

mysql> alter table projects_unnormalized add index(name), add index(project_type);
Query OK, 1774292 rows affected (30.96 sec)
Records: 1774292  Duplicates: 0  Warnings: 0

mysql> select projects_unnormalized.name as Project, projects_unnormalized.project_type as Type from projects_unnormalized limit 10000 into outfile '/tmp/projects.txt';
Query OK, 10000 rows affected (0.01 sec)

mysql> show profile for query 75;
+--------------------------------+----------+
| Status                         | Duration |
+--------------------------------+----------+
| (initialization)               | 0.000013 |
| checking query cache for query | 0.000122 |
| Opening tables                 | 0.000023 |
| System lock                    | 0.000012 |
| Table lock                     | 0.000021 |
| init                           | 0.00023  |
| optimizing                     | 0.000012 |
| statistics                     | 0.000026 |
| preparing                      | 0.000021 |
| executing                      | 0.000009 |
| Sending data                   | 0.016887 |
| end                            | 0.000024 |
| query end                      | 0.00001  |
| freeing items                  | 0.000018 |
| closing tables                 | 0.000017 |
| logging slow query             | 0.000081 |
+--------------------------------+----------+
16 rows in set (0.00 sec)

mysql> select projects_unnormalized.name as Project, projects_unnormalized.project_type as Type from projects_unnormalized limit 100000 into outfile '/tmp/projects.txt';
Query OK, 100000 rows affected (0.17 sec)

mysql> show profile for query 77;
+--------------------------------+-----------+
| Status                         | Duration  |
+--------------------------------+-----------+
| (initialization)               | 0.000015  |
| checking query cache for query | 0.00013   |
| Opening tables                 | 0.000027  |
| System lock                    | 0.000011  |
| Table lock                     | 0.00002   |
| init                           | 0.000236  |
| optimizing                     | 0.000012  |
| statistics                     | 0.000026  |
| preparing                      | 0.000021  |
| executing                      | 0.000008  |
| Sending data                   | 0.177526  |
| end                            | 0.000022  |
| query end                      | 0.00001   |
| freeing items                  | 0.000022  |
| closing tables                 | 0.000021  |
| logging slow query             | 0.0000900 |
+--------------------------------+-----------+
16 rows in set (0.00 sec)

mysql> select projects_unnormalized.name as Project, projects_unnormalized.project_type as Type from projects_unnormalized limit 1000000 into outfile '/tmp/projects.txt';
Query OK, 1000000 rows affected (0.85 sec)

mysql> show profile for query 79;
+--------------------------------+-----------+
| Status                         | Duration  |
+--------------------------------+-----------+
| (initialization)               | 0.000005  |
| checking query cache for query | 0.000128  |
| Opening tables                 | 0.000031  |
| System lock                    | 0.000011  |
| Table lock                     | 0.00002   |
| init                           | 0.000244  |
| optimizing                     | 0.000012  |
| statistics                     | 0.0000950 |
| preparing                      | 0.000024  |
| executing                      | 0.000007  |
| Sending data                   | 2.605529  |
| end                            | 0.000023  |
| query end                      | 0.000008  |
| freeing items                  | 0.00002   |
| closing tables                 | 0.00002   |
| logging slow query             | 0.000098  |
+--------------------------------+-----------+
16 rows in set (0.00 sec)
</pre>
<p>And now, EXPLAIN for both schemas: </p>
<pre>

mysql> explain select projects.name as Project, project_types.name as Type from projects,project_types where project_types.id = projects.project_type;
+----+-------------+---------------+-------+-----------------------+-----------------------+---------+-----------------------+--------+-------------+
| id | select_type | table         | type  | possible_keys         | key                   | key_len | ref                   | rows   | Extra       |
+----+-------------+---------------+-------+-----------------------+-----------------------+---------+-----------------------+--------+-------------+
|  1 | SIMPLE      | project_types | index | PRIMARY               | name                  | 30      | NULL                  |      4 | Using index |
|  1 | SIMPLE      | projects      | ref   | proyects_project_type | proyects_project_type | 4       | test.project_types.id | 512503 |             |
+----+-------------+---------------+-------+-----------------------+-----------------------+---------+-----------------------+--------+-------------+
2 rows in set (0.00 sec)

mysql> explain select projects_unnormalized.name as Project, projects_unnormalized.project_type as Type from projects_unnormalized;
+----+-------------+-----------------------+------+---------------+------+---------+------+---------+-------+
| id | select_type | table                 | type | possible_keys | key  | key_len | ref  | rows    | Extra |
+----+-------------+-----------------------+------+---------------+------+---------+------+---------+-------+
|  1 | SIMPLE      | projects_unnormalized | ALL  | NULL          | NULL | NULL    | NULL | 1774292 |       |
+----+-------------+-----------------------+------+---------------+------+---------+------+---------+-------+
1 row in set (0.00 sec)
</pre>
<p>Well, using ENUM might seem a little less elegant from a design point of view (what happens if I use Types in more than one table? What if I want to add a type? I have to alter the table!). Still, if I&#8217;m going to be handling large quantities of data, the performance benefits might be worthwhile. Keep in mind that my tests used small amounts of data: I don&#8217;t have a server, just a very humble notebook. </p>
<p>Sometimes, it&#8217;s OK to bend the rules a little bit if you need a performance boost. Just like GOTOs are sometimes OK (MySQL has some in its source code, btw). It&#8217;s all a matter of need, expertise, and experience. </p>
<p>The Tao of Programming puts it better: </p>
<blockquote><p>
There once was a master programmer who wrote unstructured programs. A novice programmer, seeking to imitate him, also began to write unstructured programs. When the novice asked the master to evaluate his progress, the master criticized him for writing unstructured programs, saying, &#8220;What is appropriate for the master is not appropriate for the novice. You must understand the Tao before transcending structure.&#8221;
</p></blockquote>
<p>It should be clear from the output of EXPLAIN that going through two tables, doing a join of ref type with that many tuples, is probably not good (at least that&#8217;s what the manual says: &#8220;If the key that is used matches only a few rows, this is a good join type&#8221;), while the table using the ENUM datatype can return all rows at the cost of a single table scan. </p>
<p>The main reason, as far as I understand it, is that the ENUM datatype maps the byte stored in the table to the mnemonic name you give it through an internal structure known as typelib, so the name is resolved in memory instead of through a join. </p>
<p>To see this, you can refer to lines 7771 &#8211; 7782 of libmysqld/field.cc in the MySQL CE 5.0 source code, where val_str is defined for Field_enum. Here&#8217;s the snippet: </p>
<pre>
String *Field_enum::val_str(String *val_buffer __attribute__((unused)),
                            String *val_ptr)
{
  uint tmp=(uint) Field_enum::val_int();
  if (!tmp || tmp > typelib->count)
    val_ptr->set("", 0, field_charset);
  else
    val_ptr->set((const char*) typelib->type_names[tmp-1],
                 typelib->type_lengths[tmp-1],
                 field_charset);
  return val_ptr;
}
</pre>
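<p>To make the lookup concrete, here is a minimal standalone sketch of the same idea. It is not MySQL code: EnumTypelib and enum_to_string are names made up for illustration. It only assumes what the snippet above shows, namely that the stored value is a 1-based index into a list of names, and that 0 or an out-of-range value yields an empty string. </p>

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for MySQL's typelib: just the ordered list of
// ENUM labels. The real structure also keeps counts and lengths.
struct EnumTypelib {
    std::vector<std::string> type_names;
};

// Same logic as Field_enum::val_str above: the stored value is a
// 1-based index into the label list. 0 or anything past the end of the
// list maps to the empty string.
std::string enum_to_string(const EnumTypelib& typelib, unsigned stored) {
    if (stored == 0 || stored > typelib.type_names.size())
        return "";
    return typelib.type_names[stored - 1];
}
```

<p>With a hypothetical label list such as internal, external, research, enum_to_string(t, 1) returns internal, while enum_to_string(t, 0) and enum_to_string(t, 4) both return an empty string. The whole mapping is an array lookup in memory, which is why no second table has to be touched. </p>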
<p>Well, this is a very simple example, but it&#8217;s all I needed today <img src='http://fernandoipar.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>


]]></content:encoded>
			<wfw:commentRss>http://fernandoipar.com/2009/03/09/using-the-enum-data-type-to-increase-performance/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

