<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Fernando Ipar &#187; optimization</title>
	<atom:link href="http://fernandoipar.com/tag/optimization/feed/" rel="self" type="application/rss+xml" />
	<link>http://fernandoipar.com</link>
	<description>I love mankind! Its people I can&#039;t stand!</description>
	<lastBuildDate>Wed, 07 Jul 2010 04:10:33 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=abc</generator>
		<item>
		<title>Indexing text columns in MySQL</title>
		<link>http://fernandoipar.com/2009/08/12/indexing-text-columns-in-mysql/</link>
		<comments>http://fernandoipar.com/2009/08/12/indexing-text-columns-in-mysql/#comments</comments>
		<pubDate>Wed, 12 Aug 2009 22:07:28 +0000</pubDate>
		<dc:creator>fernando</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[indexes]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://fernandoipar.com/?p=178</guid>
		<description><![CDATA[This time, I&#8217;m talking about indexes for string typed columns. In particular, I&#8217;ll show a procedure I find useful while looking for good index length values for these columns. I&#8217;ll use a sample table called people. Here&#8217;s what it looks like: mysql&#62; desc people; +&#8212;&#8212;&#8212;&#8212;+&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;+&#8212;&#8212;+&#8212;&#8211;+&#8212;&#8212;&#8212;+&#8212;&#8212;&#8212;&#8212;&#8212;-+ &#124; Field &#124; Type &#124; Null &#124; Key &#124; Default [...]


Related posts:<ol><li><a href='http://fernandoipar.com/2009/04/06/using-mysql-proxy-to-benchmark-query-performance/' rel='bookmark' title='Permanent Link: Using MySQL Proxy to benchmark query performance'>Using MySQL Proxy to benchmark query performance</a> <small>By transparently sitting between client and server on each request,...</small></li><li><a href='http://fernandoipar.com/2009/04/18/making-use-of-procedure-analyse/' rel='bookmark' title='Permanent Link: Making use of procedure analyse()'>Making use of procedure analyse()</a> <small>SELECT Field0[,Field1,Field2,...] FROM TABLE PROCEDURE ANALYSE() is a nice tool...</small></li><li><a href='http://fernandoipar.com/2010/01/26/how-to-prevent-explain-from-executing-subqueries/' rel='bookmark' title='Permanent Link: how to prevent explain from executing subqueries'>how to prevent explain from executing subqueries</a> <small>Here&#8217;s a quick tip for using explain: You may know...</small></li></ol>

Related posts brought to you by <a href='http://mitcho.com/code/yarpp/'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<p><P>This time, I&#8217;m talking about indexes for string typed columns. In particular, I&#8217;ll show a procedure I find useful while looking for good index length values for these columns.<br />
</P><br />
<P>I&#8217;ll use a sample table called people.<br />
</P><br />
<P>Here&#8217;s what it looks like:<br />
</P><br />
<PRE>mysql&gt; desc people;
+&#8212;&#8212;&#8212;&#8212;+&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;+&#8212;&#8212;+&#8212;&#8211;+&#8212;&#8212;&#8212;+&#8212;&#8212;&#8212;&#8212;&#8212;-+
| Field      | Type             | Null | Key | Default | Extra          |
+&#8212;&#8212;&#8212;&#8212;+&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;+&#8212;&#8212;+&#8212;&#8211;+&#8212;&#8212;&#8212;+&#8212;&#8212;&#8212;&#8212;&#8212;-+
| id         | int(11) unsigned | NO   | PRI | NULL    | auto_increment |
| title      | varchar(250)     | NO   |     | NULL    |                |
| city       | varchar(250)     | NO   |     | NULL    |                |
| occupation | varchar(250)     | NO   |     | NULL    |                |
+&#8212;&#8212;&#8212;&#8212;+&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;+&#8212;&#8212;+&#8212;&#8211;+&#8212;&#8212;&#8212;+&#8212;&#8212;&#8212;&#8212;&#8212;-+
4 rows in set (0.00 sec)

mysql&gt; select count(*) from people;
+&#8212;&#8212;&#8212;-+
| count(*) |
+&#8212;&#8212;&#8212;-+
|   150000 |
+&#8212;&#8212;&#8212;-+
1 row in set (0.00 sec)

mysql&gt; </pre><P><br />
We&#8217;ll start by using procedure analyse to get some useful information<br />
about our data. Unless you know some fields are good candidates for<br />
use with the ENUM datatype, invoke procedure analyse with arguments<br />
(0,0) in order to prevent mysql from suggesting huge ENUMs for string<br />
columns.<br />
</P><br />
<PRE>mysql&gt; select * from people procedure analyse(0,0)\G
*************************** 1. row ***************************
             Field_name: test.people.id
              Min_value: 1
              Max_value: 150000
             Min_length: 1
             Max_length: 6
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 75000.5000
                    Std: 87258.1632
      Optimal_fieldtype: MEDIUMINT(6) UNSIGNED NOT NULL
*************************** 2. row ***************************
             Field_name: test.people.title
              Min_value: aback exclaims stopgap&#8217;s chapel&#8217;s tanked claps snowshoe cigarette correlates extras laster cluc
              Max_value: Zulus colossally dictate cleft&#8217;s enchanter del
             Min_length: 40
             Max_length: 150
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 95.0869
                    Std: NULL
      Optimal_fieldtype: TINYTEXT NOT NULL
*************************** 3. row ***************************
             Field_name: test.people.city
              Min_value: aback ascertaining unw
              Max_value: Zulus imprisonments veiner a
             Min_length: 5
             Max_length: 30
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 17.4861
                    Std: NULL
      Optimal_fieldtype: TINYTEXT NOT NULL
*************************** 4. row ***************************
             Field_name: test.people.occupation
              Min_value:
              Max_value:
             Min_length: 0
             Max_length: 0
       Empties_or_zeros: 150000
                  Nulls: 0
Avg_value_or_avg_length: 0.0000
                    Std: NULL
      Optimal_fieldtype: CHAR(0) NOT NULL
4 rows in set (0.19 sec)

mysql&gt; </pre><P><br />
The id column is numeric, and is actually already indexed. We can&#8217;t<br />
see this right here (though I showed an example of how you can use<br />
the output of procedure analyse and extend it to suit your needs, in<br />
which I did include an &#8216;Indexed&#8217; column in the output), but we can<br />
see this, and gan more information from the table, with the following<br />
statement:<br />
</P><br />
<PRE>mysql&gt; show index from people;
+&#8212;&#8212;&#8211;+&#8212;&#8212;&#8212;&#8212;+&#8212;&#8212;&#8212;-+&#8212;&#8212;&#8212;&#8212;&#8211;+&#8212;&#8212;&#8212;&#8212;-+&#8212;&#8212;&#8212;&#8211;+&#8212;&#8212;&#8212;&#8212;-+&#8212;&#8212;&#8212;-+&#8212;&#8212;&#8211;+&#8212;&#8212;+&#8212;&#8212;&#8212;&#8212;+&#8212;&#8212;&#8212;+
| Table  | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+&#8212;&#8212;&#8211;+&#8212;&#8212;&#8212;&#8212;+&#8212;&#8212;&#8212;-+&#8212;&#8212;&#8212;&#8212;&#8211;+&#8212;&#8212;&#8212;&#8212;-+&#8212;&#8212;&#8212;&#8211;+&#8212;&#8212;&#8212;&#8212;-+&#8212;&#8212;&#8212;-+&#8212;&#8212;&#8211;+&#8212;&#8212;+&#8212;&#8212;&#8212;&#8212;+&#8212;&#8212;&#8212;+
| people |          0 | PRIMARY  |            1 | id          | A         |      150000 |     NULL | NULL   |      | BTREE      |         |
+&#8212;&#8212;&#8211;+&#8212;&#8212;&#8212;&#8212;+&#8212;&#8212;&#8212;-+&#8212;&#8212;&#8212;&#8212;&#8211;+&#8212;&#8212;&#8212;&#8212;-+&#8212;&#8212;&#8212;&#8211;+&#8212;&#8212;&#8212;&#8212;-+&#8212;&#8212;&#8212;-+&#8212;&#8212;&#8211;+&#8212;&#8212;+&#8212;&#8212;&#8212;&#8212;+&#8212;&#8212;&#8212;+
1 row in set (0.00 sec)

mysql&gt;
</pre><P><br />
In order to create a good index for title, we can use the following query. I used 95 as a starting point, since it&#8217;s reported by procedure analyse as the average data length of this column.<br />
</P></p>
<pre>
mysql> select count(distinct(substr(title,1,95))) / count(distinct(title)) * 100 from people;
+--------------------------------------------------------------------+
| count(distinct(substr(title,1,95))) / count(distinct(title)) * 100 |
+--------------------------------------------------------------------+
|                                                           100.0000 |
+--------------------------------------------------------------------+
1 row in set (1.42 sec)

mysql> 
</pre>
<p><P><br />
As you can see, with 95 chars, we can get an index that covers 100% of the rows (i.e., get distinct values for all of them). Still, it&#8217;s a big number. Using this query, we<br />
can begin to play a little bit with the index size, until we get to a good compromise between enough distinct values and an index that&#8217;s small enough to be processed fast and<br />
maybe even loaded into memory.<br />
</P></p>
<pre>
mysql> select count(distinct(substr(title,1,20))) / count(distinct(title)) * 100 from people;
+--------------------------------------------------------------------+
| count(distinct(substr(title,1,20))) / count(distinct(title)) * 100 |
+--------------------------------------------------------------------+
|                                                            99.9507 |
+--------------------------------------------------------------------+
1 row in set (1.15 sec)

mysql>
</pre>
<p><P><br />
As it turns out, my data set doesn&#8217;t require too many characters in order to be differentiated. Actually, I&#8217;m kind of cheating here, for a table this size, since I used a data generator to populate this tables, and it<br />
generated a lot of random text. Real world data would probably require a larger prefix in order to get such good differentiation. Anyway, let&#8217;s push it a little bit more.<br />
</P></p>
<pre>
mysql> select count(distinct(substr(title,1,15))) / count(distinct(title)) * 100 from people;
+--------------------------------------------------------------------+
| count(distinct(substr(title,1,15))) / count(distinct(title)) * 100 |
+--------------------------------------------------------------------+
|                                                            97.0787 |
+--------------------------------------------------------------------+
1 row in set (1.18 sec)

mysql> select count(distinct(substr(title,1,14))) / count(distinct(title)) * 100 from people;
+--------------------------------------------------------------------+
| count(distinct(substr(title,1,14))) / count(distinct(title)) * 100 |
+--------------------------------------------------------------------+
|                                                            94.4247 |
+--------------------------------------------------------------------+
1 row in set (1.13 sec)
</pre>
<p><P><br />
Here&#8217;s the turning point for me. A jumb between 94% and 97% of index coverage in just 1 character. So I&#8217;m sticking with 15. Let&#8217;s test this with some queries.<br />
</P></p>
<pre>
mysql> select title from people limit 40;
+----------------------------------------------------------------------------------------------------------------------------------------------------+
| title                                                                                                                                              |
+----------------------------------------------------------------------------------------------------------------------------------------------------+
| puffs war's bruises buckles attainably Warnock's discoverer degeneration plots admirably assimilates germane burlesquely ri                        |
| arbitrariness MacDraw's carbonates suckers budget chronicler cur drabs untested Aryans imperial                                                    |
| commender dozes distills blackbird's mend meta                                                                                                     |
| gallons haying occupation's sculpt fittingness scores onwards recessed masculineness denominator's regulated boyfriend's                           |
| authored metaphor derivatively matchmakers ratification railing advantageousness flossing twin's barbarously infinite retreat alloying tenting t   |
| Africans determinateness enquired quivers replaces nowhere applicability negative alarms lacquerer shivered arachnid ulcer sil                     |
| filter offerings unboundedness clearness enthusiast commandants blunted betide rusticated blacks helmet's squabbles tasked Beethoven contro        |
| thirties oftener tunnel anguish attainable formulat                                                                                                |
| grotesquely fallacious inessential fain sanctioned too amplifi                                                                                     |
| consort rapes deeply marker patterns compacted plumbe                                                                                              |
| nasally combings searcher's pathname's bolts retrospective aroused squintingly boyish singers recompiles Austral                                   |
| purpled draggingly nobody's luckier spinning goddess oscilloscopes aimer                                                                           |
| aphasia reconverts shams entangle placer metaphysical visited turret nai                                                                           |
| violation bituminous unweighed darkness cackles consonant foully fisted loci relishes burn m                                                       |
| unsuffixed overdose humbles corpses fashions slashingly quietude delighte                                                                          |
| sheller hypocrisy falser productions shied cube breed childishness requested pads redoub                                                           |
| broils aorta refund sinker cankering reawakens portrayed resolving bard's stand ejects inhabitant's tittering genders proposition                  |
| cyclone's glorification unrestricted delicately inhibitive waterway wardrobes excommunicated laugher                                               |
| poppies heroine's gunner swollen reticle vertebrate's shrank unreliabilities infractions pretentious angstroms relations highness feasibil         |
| ampoule clustering intermediaries honer ree                                                                                                        |
| creature's transferals tidal unsigned stitching ought coerces visa girdling porn janitors parer song's croaked ta                                  |
| hammer amalgamating stunting feasibility hopefulness oilier spraying frets pinks                                                                   |
| comelier tomorrow's cowboy chalked lewdness cordial supering rut's neurally blindingly mute drowsiest gives in                                     |
| slides aqueduct glazers abolition dangers sultry raid prominence hedges walks toppled defenders autocrat                                           |
| theoretic thumps scum's photos bootlegged enveloper sallying populations disruptions inaugurate conclu                                             |
| annotated bibliographies lichen user's bluebird's subproofs unendurably recollection's crumple                                                     |
| sergeant outlets pinion reducer wiling impinge apes insaneness dose automatics lighthouse's cursory sleepily web's interruptions superin           |
| tautened skylarks toad's seminar's archangel's sarcasm shipwrecks indeed incliner tying waterf                                                     |
| chaotic censuses intimacy custodian's extendedly womb's safeguarding desire                                                                        |
| abusiveness skippered inspirer enunciation taper memory's clearly guardianship inputed m                                                           |
| firing anaphora subsegments turbulence affectedness refractory unsprayed chapter's volumes undramati                                               |
| chef reception's glens budged budge arson assistance disagreeableness fodder garnering boated skater heroine's pamphle                             |
| prophetic spilling asper petter's constable's classic ices teethes mails office's sordidness cylindered chaffing bivouac skeptics shuttering quash |
| hoppers iciest sharer dietitian dictionaries frac                                                                                                  |
| racketeer Ellen amounts origin's abstractions render vanish pantries retrieve Maxtor unprojected antithes                                          |
| hinter arrangers dialogue imputing droppers shelver boyish demonstrator braving submitting operated carbonate protruding creasing prospecto        |
| electrify garment linked discernible transceiver's ungrounded telegrapher uncoated                                                                 |
| badly unaffected vex taming affiliation strings detracts grandpas girded cafeteria approving ideology froze underlinings assass                    |
| MHz articulate draws transpires rubbling swarthier reeler bagged snug assisted consonant's settl                                                   |
| taxi's wearying espies Anglican's intangibly fluent jugs liveth pride ex                                                                           |
+----------------------------------------------------------------------------------------------------------------------------------------------------+
40 rows in set (0.00 sec)

mysql>
</pre>
<p><P><br />
That should give you an idea of what type of needles we&#8217;re looking for <img src='http://fernandoipar.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /><br />
</P></p>
<pre>
mysql> select * from people where title like 'taxi%';
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+------------+
| id     | title                                                                                                                                               | city                           | occupation |
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+------------+
|     40 | taxi's wearying espies Anglican's intangibly fluent jugs liveth pride ex                                                                            | rotation tri                   |            |
|   5736 | taxi's allegorically accounting manipulatory cautiousness computational promoter wool reproo                                                        | islan                          |            |
|   6967 | taxis sprawls unblushing rude put absorbs reproducibilities crumblier kid                                                                           | DeMorgan overhe                |            |
|   8388 | taxi catsup ornament transformer widener syndicates dismount pop t                                                                                  | exhibition's manages hedgehog' |            |
|  10418 | taxingly eligibility whichever meditation corrosion unluckiness intoxicat                                                                           | gagged politeness looser       |            |
|  13091 | taxi endowment watchfulness battalions stay trickle tangle blowfish maid's transmissions questionnaire vomit saner strokers constituent crab's      | populations window's zoo armie |            |
|  16723 | taxi storing couldest bouts allegoric cluttered steeples fives hitchhike thrashes retirement de                                                     | envisaged maintain             |            |
|  22168 | taxicab voicer controllers removing cellular houses router nourishing edict shrines strikes testicle's destine whale russeted certi                 | masturbates pu                 |            |
|  24244 | taxing sleeve consultant's nonprogrammable twine delayer ingot respecter subex                                                                      | prematurely significant        |            |
|  35595 | taxicabs aback spinal checkers germs overdraft's coon critter's patrician fled coalition massaging paced condemning impen                           | oppre                          |            |
|  37105 | taxis articulatory indulgence bystanders skin burgess starlight calendaring aunt's bilging benightedness smallest softened xiv immerser fresher unn | crudely papally r              |            |
|  40871 | taxicabs muzzling precocious resentment fellers pitiers beasts marines baselines diagrammatically clowning connecters stampedin                     | influencer                     |            |
|  44298 | taxi rages unintelligibleness anastomosis orthogonally incompatibilities keypads hoarse province stamping perceived sh                              | unforgiving quiet              |            |
|  46395 | taxi plunders novelty's downstairs newborn symbiotic climax highlights lounger keypads only schools possibilities                                   | flowing forgeries slende       |            |
|  66078 | taxicab mercilessly excesses ships merchandising patch strobe                                                                                       | armfuls firmament hum coop     |            |
|  71095 | taxing dispense regrettably resuspended kilobits downwardly domestically laps rainiest recapitulates despiser trophies chums a                      | enumerate indoctrinat          |            |
|  77668 | taxicabs approachable disqualifying charcoaled script's o                                                                                           | kited publish disburse anarch  |            |
|  84162 | taxicab's captivity dean eyeball uninspiring pawn's complication outcast's stared sneak s                                                           | impracticable dungeon crop     |            |
|  87930 | taxi swiftly repacks unsupported slice mornings squares gland solar brainier harrying wag                                                           | cowslip halter plastics        |            |
|  91282 | taxiing undetected cast commands clasping germina                                                                                                   | waxes her                      |            |
| 104029 | taxied roofs besetting leadership electrocuted input metaphor bubbler vowing sponges assess                                                         | worthing understated bark      |            |
| 105818 | taxis libretti defensively shoes antagonistically heavier endeared accidental gauging intercourse revolte                                           | runne                          |            |
| 106163 | taxis bonfire's bench stereo preventer boringness blot's quieter acronyms transplant gained implores ba                                             | sighting leased sp             |            |
| 122471 | taxicab's they've berries invader touching bumblingly courtier's boosting undisguised destroy amanuensis bangles digestiveness poppy's hulls        | purity professional unski      |            |
| 123931 | taxicab's headgear Popek ratifying tenured Pascal's subduedly quitting earned planter forgave implicated noo                                        | bibliographies fraill          |            |
| 127383 | taxi choir parameter's busted inspiration's fixated blinking complicator outwit plotters gobbles burningly leafed corruptively                      | radioed size telegr            |            |
| 134211 | taxi's reconstructible indirect agglutination awaken eked unoccupied pillager subcomputation interviewing treader commending i                      | muddiness broom's              |            |
| 145784 | taxicabs sanctuary armful battening terrifying impactors guns exchequer reigns laughter desolater s                                                 | buttonhole's isomorphism       |            |
| 146371 | taxied carnivals giver misconceptions countenancer introduced anchovy exile pipelines weaned unabridged Britishly abyss's extenuating moodiness th  | penetrator upsho               |            |
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+------------+
29 rows in set (0.11 sec)

mysql> explain select * from people where title like 'taxi%';
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | people | ALL  | NULL          | NULL | NULL    | NULL | 150000 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+

mysql> select * from people where title = 'hoppers iciest sharer dietitian dictionaries frac';
+----+---------------------------------------------------+--------------------+------------+
| id | title                                             | city               | occupation |
+----+---------------------------------------------------+--------------------+------------+
| 34 | hoppers iciest sharer dietitian dictionaries frac | coroneted revolve  |            |
+----+---------------------------------------------------+--------------------+------------+
1 row in set (0.12 sec)

mysql> explain select * from people where title = 'hoppers iciest sharer dietitian dictionaries frac';
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | people | ALL  | NULL          | NULL | NULL    | NULL | 150000 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
1 row in set (0.00 sec)
</pre>
<p>
Ok, now let&#8217;s create the index. I&#8217;m loading it into a cache here, which is not necessary (even less given my size of 150000 tuples), but it helps. In order to do this, all the indexes in your table must have the same block size.
</p>
<pre>
mysql> create index idx_people_title on people(title(15));
Query OK, 150000 rows affected (1.60 sec)
Records: 150000  Duplicates: 0  Warnings: 0

mysql> reset query cache;
Query OK, 0 rows affected (0.00 sec)

mysql> load index into cache people;
+-------------+--------------+----------+----------+
| Table       | Op           | Msg_type | Msg_text |
+-------------+--------------+----------+----------+
| test.people | preload_keys | status   | OK       |
+-------------+--------------+----------+----------+
1 row in set (0.00 sec)
</pre>
<p>
Let&#8217;s re test the queries:
</p>
<pre>

mysql> select * from people where title like 'taxi%';
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+------------+
| id     | title                                                                                                                                               | city                           | occupation |
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+------------+
|   8388 | taxi catsup ornament transformer widener syndicates dismount pop t                                                                                  | exhibition's manages hedgehog' |            |
| 127383 | taxi choir parameter's busted inspiration's fixated blinking complicator outwit plotters gobbles burningly leafed corruptively                      | radioed size telegr            |            |
|  13091 | taxi endowment watchfulness battalions stay trickle tangle blowfish maid's transmissions questionnaire vomit saner strokers constituent crab's      | populations window's zoo armie |            |
|  46395 | taxi plunders novelty's downstairs newborn symbiotic climax highlights lounger keypads only schools possibilities                                   | flowing forgeries slende       |            |
|  44298 | taxi rages unintelligibleness anastomosis orthogonally incompatibilities keypads hoarse province stamping perceived sh                              | unforgiving quiet              |            |
|  16723 | taxi storing couldest bouts allegoric cluttered steeples fives hitchhike thrashes retirement de                                                     | envisaged maintain             |            |
|  87930 | taxi swiftly repacks unsupported slice mornings squares gland solar brainier harrying wag                                                           | cowslip halter plastics        |            |
|   5736 | taxi's allegorically accounting manipulatory cautiousness computational promoter wool reproo                                                        | islan                          |            |
| 134211 | taxi's reconstructible indirect agglutination awaken eked unoccupied pillager subcomputation interviewing treader commending i                      | muddiness broom's              |            |
|     40 | taxi's wearying espies Anglican's intangibly fluent jugs liveth pride ex                                                                            | rotation tri                   |            |
|  66078 | taxicab mercilessly excesses ships merchandising patch strobe                                                                                       | armfuls firmament hum coop     |            |
|  22168 | taxicab voicer controllers removing cellular houses router nourishing edict shrines strikes testicle's destine whale russeted certi                 | masturbates pu                 |            |
|  84162 | taxicab's captivity dean eyeball uninspiring pawn's complication outcast's stared sneak s                                                           | impracticable dungeon crop     |            |
| 123931 | taxicab's headgear Popek ratifying tenured Pascal's subduedly quitting earned planter forgave implicated noo                                        | bibliographies fraill          |            |
| 122471 | taxicab's they've berries invader touching bumblingly courtier's boosting undisguised destroy amanuensis bangles digestiveness poppy's hulls        | purity professional unski      |            |
|  35595 | taxicabs aback spinal checkers germs overdraft's coon critter's patrician fled coalition massaging paced condemning impen                           | oppre                          |            |
|  77668 | taxicabs approachable disqualifying charcoaled script's o                                                                                           | kited publish disburse anarch  |            |
|  40871 | taxicabs muzzling precocious resentment fellers pitiers beasts marines baselines diagrammatically clowning connecters stampedin                     | influencer                     |            |
| 145784 | taxicabs sanctuary armful battening terrifying impactors guns exchequer reigns laughter desolater s                                                 | buttonhole's isomorphism       |            |
| 146371 | taxied carnivals giver misconceptions countenancer introduced anchovy exile pipelines weaned unabridged Britishly abyss's extenuating moodiness th  | penetrator upsho               |            |
| 104029 | taxied roofs besetting leadership electrocuted input metaphor bubbler vowing sponges assess                                                         | worthing understated bark      |            |
|  91282 | taxiing undetected cast commands clasping germina                                                                                                   | waxes her                      |            |
|  71095 | taxing dispense regrettably resuspended kilobits downwardly domestically laps rainiest recapitulates despiser trophies chums a                      | enumerate indoctrinat          |            |
|  24244 | taxing sleeve consultant's nonprogrammable twine delayer ingot respecter subex                                                                      | prematurely significant        |            |
|  10418 | taxingly eligibility whichever meditation corrosion unluckiness intoxicat                                                                           | gagged politeness looser       |            |
|  37105 | taxis articulatory indulgence bystanders skin burgess starlight calendaring aunt's bilging benightedness smallest softened xiv immerser fresher unn | crudely papally r              |            |
| 106163 | taxis bonfire's bench stereo preventer boringness blot's quieter acronyms transplant gained implores ba                                             | sighting leased sp             |            |
| 105818 | taxis libretti defensively shoes antagonistically heavier endeared accidental gauging intercourse revolte                                           | runne                          |            |
|   6967 | taxis sprawls unblushing rude put absorbs reproducibilities crumblier kid                                                                           | DeMorgan overhe                |            |
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+------------+
29 rows in set (0.00 sec)

mysql> explain select * from people where title like 'taxi%';
+----+-------------+--------+-------+------------------+------------------+---------+------+------+-------------+
| id | select_type | table  | type  | possible_keys    | key              | key_len | ref  | rows | Extra       |
+----+-------------+--------+-------+------------------+------------------+---------+------+------+-------------+
|  1 | SIMPLE      | people | range | idx_people_title | idx_people_title | 17      | NULL |   56 | Using where |
+----+-------------+--------+-------+------------------+------------------+---------+------+------+-------------+
1 row in set (0.00 sec)
</pre>
<p>
Notice how we only go through 56 rows now.
</p>
<pre>

mysql> select * from people where title = 'hoppers iciest sharer dietitian dictionaries frac';
+----+---------------------------------------------------+--------------------+------------+
| id | title                                             | city               | occupation |
+----+---------------------------------------------------+--------------------+------------+
| 34 | hoppers iciest sharer dietitian dictionaries frac | coroneted revolve  |            |
+----+---------------------------------------------------+--------------------+------------+
1 row in set (0.00 sec)

mysql> explain select * from people where title = 'hoppers iciest sharer dietitian dictionaries frac';
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
| id | select_type | table  | type | possible_keys    | key              | key_len | ref   | rows | Extra       |
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
|  1 | SIMPLE      | people | ref  | idx_people_title | idx_people_title | 17      | const |    1 | Using where |
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
1 row in set (0.01 sec)
</pre>
<p>
Just 1 row.<br />
Ok, let&#8217;s test the quality of the index to find unique rows.
</p>
<pre>

mysql> explain select * from people where title = 'arbitrariness MacDraw\'s carbonates suckers budget chronicler cur drabs untested Aryans imperial';
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
| id | select_type | table  | type | possible_keys    | key              | key_len | ref   | rows | Extra       |
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
|  1 | SIMPLE      | people | ref  | idx_people_title | idx_people_title | 17      | const |    1 | Using where |
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
1 row in set (0.00 sec)

mysql> explain select * from people where title = 'dunes delightfulness manurers jousts axer aristocrat\'s driver greediness bloke pays preconditions enclosure consideration plaster';
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
| id | select_type | table  | type | possible_keys    | key              | key_len | ref   | rows | Extra       |
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
|  1 | SIMPLE      | people | ref  | idx_people_title | idx_people_title | 17      | const |    1 | Using where |
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
1 row in set (0.00 sec)

mysql> explain select * from people where title = 'satire\'s most quacked campaigning wrists disengaging insignia woodlander knuckles despaired portending incredulous predication Sally\'s amica';
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
| id | select_type | table  | type | possible_keys    | key              | key_len | ref   | rows | Extra       |
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
|  1 | SIMPLE      | people | ref  | idx_people_title | idx_people_title | 17      | const |    2 | Using where |
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
1 row in set (0.00 sec)

mysql> select *,length(title) from people where title = 'satire\'s most quacked campaigning wrists disengaging insignia woodlander knuckles despaired portending incredulous predication Sally\'s amica';
+-----+----------------------------------------------------------------------------------------------------------------------------------------------+------------------+------------+---------------+
| id  | title                                                                                                                                        | city             | occupation | length(title) |
+-----+----------------------------------------------------------------------------------------------------------------------------------------------+------------------+------------+---------------+
| 344 | satire's most quacked campaigning wrists disengaging insignia woodlander knuckles despaired portending incredulous predication Sally's amica | buffaloes refill |            |           140 |
+-----+----------------------------------------------------------------------------------------------------------------------------------------------+------------------+------------+---------------+
1 row in set (0.00 sec)
</pre>
<p>
Granted, 150.000 rows isn&#8217;t much, but still, with an average row data length of 94, I had to find a 140 character title in order to go through 2 rows before the right one was found. That&#8217;s reasonable, considering<br />
I estimated a 97% index coverage.
</p>
<p>
In conclusion, wihle my dataset size certainly isn&#8217;t large enough to do many interesting things, it should prove the point that a good index size will go great lengths into helping you improve the performance of your MySQL based<br />
system. The query I presented here can be useful to look for a decent index size in terms of unique rows coverage.</p>


<p>Related posts:<ol><li><a href='http://fernandoipar.com/2009/04/06/using-mysql-proxy-to-benchmark-query-performance/' rel='bookmark' title='Permanent Link: Using MySQL Proxy to benchmark query performance'>Using MySQL Proxy to benchmark query performance</a> <small>By transparently sitting between client and server on each request,...</small></li><li><a href='http://fernandoipar.com/2009/04/18/making-use-of-procedure-analyse/' rel='bookmark' title='Permanent Link: Making use of procedure analyse()'>Making use of procedure analyse()</a> <small>SELECT Field0[,Field1,Field2,...] FROM TABLE PROCEDURE ANALYSE() is a nice tool...</small></li><li><a href='http://fernandoipar.com/2010/01/26/how-to-prevent-explain-from-executing-subqueries/' rel='bookmark' title='Permanent Link: how to prevent explain from executing subqueries'>how to prevent explain from executing subqueries</a> <small>Here&#8217;s a quick tip for using explain: You may know...</small></li></ol></p>
<p>Related posts brought to you by <a href='http://mitcho.com/code/yarpp/'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://fernandoipar.com/2009/08/12/indexing-text-columns-in-mysql/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Making use of procedure analyse()</title>
		<link>http://fernandoipar.com/2009/04/18/making-use-of-procedure-analyse/</link>
		<comments>http://fernandoipar.com/2009/04/18/making-use-of-procedure-analyse/#comments</comments>
		<pubDate>Sat, 18 Apr 2009 19:47:48 +0000</pubDate>
		<dc:creator>fernando</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[datatypes]]></category>
		<category><![CDATA[optimization]]></category>

		<guid isPermaLink="false">http://fernandoipar.com/?p=155</guid>
		<description><![CDATA[SELECT Field0[,Field1,Field2,...] FROM TABLE PROCEDURE ANALYSE() is a nice tool to find out more about your table&#8217;s columns. Still, it could be improved in a lot of ways, and the stored procedure below is a starting point. It makes use of procedure analyse (though with &#8216;SELECT * FROM&#8217;), and modifies it&#8217;s output to include the [...]


Related posts:<ol><li><a href='http://fernandoipar.com/2009/04/21/extending-procedure_analyse/' rel='bookmark' title='Permanent Link: Extending procedure_analyse'>Extending procedure_analyse</a> <small>My previous post explored a stored procedure that extended procedure_analyse...</small></li><li><a href='http://fernandoipar.com/2009/08/12/indexing-text-columns-in-mysql/' rel='bookmark' title='Permanent Link: Indexing text columns in MySQL'>Indexing text columns in MySQL</a> <small>This time, I&#8217;m talking about indexes for string typed columns....</small></li><li><a href='http://fernandoipar.com/2009/04/29/soundex-triggers-and-stored-procedures/' rel='bookmark' title='Permanent Link: SOUNDEX(), triggers, and stored procedures'>SOUNDEX(), triggers, and stored procedures</a> <small>MySQL provides a SOUNDEX() function, which returns the soundex of...</small></li></ol>

Related posts brought to you by <a href='http://mitcho.com/code/yarpp/'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<p>SELECT Field0[,Field1,Field2,...] FROM TABLE PROCEDURE ANALYSE() is a nice tool to find out more about your table&#8217;s columns.</p>
<p>Still, it could be improved in a lot of ways, and the stored procedure below is a starting point. It makes use of <em>procedure analyse</em> (though with &#8216;SELECT * FROM&#8217;), and modifies it&#8217;s output to include the actual column datatype and the total number of rows of the table.</p>
<p>The actual datatype is a piece of information I&#8217;ve seen a lot of people request, and the number of rows is, I think, a critical piece of information to determine if the output of <em>procedure analyse</em> is credible or not. It&#8217;s not the same thing to take suggestions from mysql on a table with 7 or 20 rows than from a table with 1000000 rows. Of course, remember than numbers alone mean nothing, you might just have 7 rows in a table that represent the entire universe of possible values for your problem domain. In any case, a smart human being with a good toolset is the best way to solve problems!</p>
<p>So here&#8217;s the procedure, which can also be downloaded from this <a title="extended_procedure_analyse.sql" href="http://fernandoipar.com/extended_procedure_analyse.sql">link</a>:</p>
<pre>/*

extended procedure analyse
(C) 2009 Fernando Ipar
mail(at)fernandoipar.com

GPLv2

*/
drop procedure if exists extended_procedure_analyse;
delimiter //
create procedure extended_procedure_analyse(databaseName varchar(64), tableName varchar(64))
begin

	create temporary table procedure_analyse_output
	(
	Field_name varchar(64),
	Min_value int,
	Max_value int,
	Min_length int,
	Max_length int,
	Empties_or_zeros int,
	Nulls int,
	Avg_value_or_avg_length float,
	Std float,
	Optimal_fieldtype text,
	Actual_fieldtype text
	);	

	set @table = concat(databaseName,'.',tableName);
	set @dbName = databaseName;
	set @tbName = tableName;

	set @qry = concat('insert into procedure_analyse_output (Field_name,Min_value,Max_value,Min_length,Max_length,Empties_or_zeros,Nulls,Avg_Value_or_avg_length,Std,Optimal_fieldtype) select * from ', @table,' procedure analyse()');
	prepare myStmt from @qry;
	execute myStmt;

	update procedure_analyse_output set Field_name = replace(Field_name,  CONCAT(databaseName,'.',tableName,'.'),'');

	prepare myStmt from 'update procedure_analyse_output pao, information_schema.columns c set pao.Actual_Fieldtype = c.column_type where table_schema = ? and table_name = ? and column_name = pao.Field_name';
	execute myStmt using @dbName,@tbName;

	set @qry = concat('select count(*) as `Total_number_of_rows` from ',@table);
	prepare myStmt from @qry;

	execute myStmt;

	select * from procedure_analyse_output;

	drop temporary table procedure_analyse_output;

end;
//
delimiter ;</pre>
<p>Here are a couple of sample outputs:</p>
<pre>mysql&gt; call extended_procedure_analyse('test','City')\G
*************************** 1. row ***************************
Total_number_of_rows: 30000
1 row in set (0.11 sec)

*************************** 1. row ***************************
             Field_name: ID
              Min_value: 925001
              Max_value: 955000
             Min_length: 6
             Max_length: 6
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 940000
                    Std: 938839
      Optimal_fieldtype: MEDIUMINT(6) UNSIGNED NOT NULL
       Actual_fieldtype: int(11) unsigned
*************************** 2. row ***************************
             Field_name: CountryCode
              Min_value: 0
              Max_value: 29
             Min_length: 1
             Max_length: 2
       Empties_or_zeros: 1000
                  Nulls: 0
Avg_value_or_avg_length: 14.5
                    Std: 8.6554
      Optimal_fieldtype: ENUM('0','1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24','25','26','27','28','29') NOT NULL
       Actual_fieldtype: int(11) unsigned
*************************** 3. row ***************************
             Field_name: Name
              Min_value: 1
              Max_value: 9999
             Min_length: 1
             Max_length: 5
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 4.6605
                    Std: NULL
      Optimal_fieldtype: CHAR(5) NOT NULL
       Actual_fieldtype: varchar(40)
*************************** 4. row ***************************
             Field_name: District
              Min_value: 1
              Max_value: 9999
             Min_length: 1
             Max_length: 5
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 4.6603
                    Std: NULL
      Optimal_fieldtype: CHAR(5) NOT NULL
       Actual_fieldtype: varchar(40)
*************************** 5. row ***************************
             Field_name: Population
              Min_value: 0
              Max_value: 9999
             Min_length: 1
             Max_length: 5
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 4.6647
                    Std: NULL
      Optimal_fieldtype: CHAR(5) NOT NULL
       Actual_fieldtype: varchar(40)
5 rows in set (0.12 sec)

Query OK, 0 rows affected (0.12 sec)

mysql&gt; call extended_procedure_analyse('test','projects_innodb')\G
*************************** 1. row ***************************
Total_number_of_rows: 1007366
1 row in set (14.80 sec)

*************************** 1. row ***************************
             Field_name: id
              Min_value: 1
              Max_value: 1007366
             Min_length: 1
             Max_length: 7
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 503684
                    Std: 581599
      Optimal_fieldtype: MEDIUMINT(7) UNSIGNED NOT NULL
       Actual_fieldtype: int(10) unsigned
*************************** 2. row ***************************
             Field_name: name
              Min_value: 0
              Max_value: 9999
             Min_length: 1
             Max_length: 10
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 4.6958
                    Std: NULL
      Optimal_fieldtype: VARCHAR(10) NOT NULL
       Actual_fieldtype: char(10)
2 rows in set (14.80 sec)

Query OK, 0 rows affected (14.80 sec)

mysql&gt; call extended_procedure_analyse('test','projects_isam')\G
*************************** 1. row ***************************
Total_number_of_rows: 1000000
1 row in set (0.56 sec)

*************************** 1. row ***************************
             Field_name: id
              Min_value: 1
              Max_value: 1000000
             Min_length: 1
             Max_length: 7
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 500000
                    Std: 577358
      Optimal_fieldtype: MEDIUMINT(7) UNSIGNED NOT NULL
       Actual_fieldtype: int(10) unsigned
*************************** 2. row ***************************
             Field_name: name
              Min_value: 0
              Max_value: 9999
             Min_length: 1
             Max_length: 5
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 4.6605
                    Std: NULL
      Optimal_fieldtype: CHAR(5) NOT NULL
       Actual_fieldtype: char(10)
2 rows in set (0.56 sec)

Query OK, 0 rows affected (0.56 sec)</pre>
<p>Notice the difference in response time between the Innodb and MyISAM (yes, I wrongly used the &#8216;isam&#8217; name &#8230;) tables, that&#8217;s because Innodb has to calculate the number of rows for a count(*) query, while MyISAM stores a row count in the table.</p>
<p>I plan to extend this procedure to include index information, with useful data such as overindexed columns (columns that are included as a leftmost prefix in more than one index) and unindexed columns that are queried. The first question is easily answered from information_schema.statistics, using the seq_in_index column, I&#8217;m working on the second one.</p>
<p>Still, if you have the time and interest, play with this early version and let me know what&#8217;s wrong and/or could be improved with it.</p>


<p>Related posts:<ol><li><a href='http://fernandoipar.com/2009/04/21/extending-procedure_analyse/' rel='bookmark' title='Permanent Link: Extending procedure_analyse'>Extending procedure_analyse</a> <small>My previous post explored a stored procedure that extended procedure_analyse...</small></li><li><a href='http://fernandoipar.com/2009/08/12/indexing-text-columns-in-mysql/' rel='bookmark' title='Permanent Link: Indexing text columns in MySQL'>Indexing text columns in MySQL</a> <small>This time, I&#8217;m talking about indexes for string typed columns....</small></li><li><a href='http://fernandoipar.com/2009/04/29/soundex-triggers-and-stored-procedures/' rel='bookmark' title='Permanent Link: SOUNDEX(), triggers, and stored procedures'>SOUNDEX(), triggers, and stored procedures</a> <small>MySQL provides a SOUNDEX() function, which returns the soundex of...</small></li></ol></p>
<p>Related posts brought to you by <a href='http://mitcho.com/code/yarpp/'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://fernandoipar.com/2009/04/18/making-use-of-procedure-analyse/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>
