<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Fernando Ipar</title>
	<atom:link href="http://fernandoipar.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://fernandoipar.com</link>
	<description>I love mankind! Its people I can&#039;t stand!</description>
	<lastBuildDate>Thu, 21 Mar 2013 18:18:13 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=</generator>
		<item>
		<title>Presenting at Percona Live NY 2012</title>
		<link>http://fernandoipar.com/2012/09/10/speaking-at-percona-live-ny-2012/</link>
		<comments>http://fernandoipar.com/2012/09/10/speaking-at-percona-live-ny-2012/#comments</comments>
		<pubDate>Mon, 10 Sep 2012 22:55:25 +0000</pubDate>
		<dc:creator>fernando</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Conferences]]></category>

		<guid isPermaLink="false">http://fernandoipar.com/?p=226</guid>
		<description><![CDATA[I&#8217;ll be attending Percona Live NY 2012 next month, co-presenting a tutorial with Marcos Albe (also from Percona) and a couple of sessions with Marcos and Ryan Lowe (of Square). On Monday, Marcos and I will be doing a hands-on tutorial on Percona Toolkit, and on Tuesday, Ryan will join us for a session [...]<div class='yarpp-related-rss yarpp-related-none'>

No related posts.
</div>
]]></description>
				<content:encoded><![CDATA[<p>I&#8217;ll be attending <a href="http://www.percona.com/live/nyc-2012/">Percona Live NY 2012</a> next month, co-presenting a tutorial with Marcos Albe (also from Percona) and a couple of sessions with Marcos and Ryan Lowe (of Square).</p>
<p>On Monday, Marcos and I will be doing a <a href="http://www.percona.com/live/nyc-2012/sessions/managing-mysql-percona-toolkit">hands-on tutorial</a> on Percona Toolkit, and on Tuesday, Ryan will join us for a session on <a href="http://www.percona.com/live/nyc-2012/sessions/percona-toolkit-recipes">Percona Toolkit</a> and one on MySQL performance optimization on <a href="http://www.percona.com/live/nyc-2012/sessions/performance-tuning-amazon-ec2-and-rds">EC2 and RDS</a>.</p>
<div class='yarpp-related-rss yarpp-related-none'>
<p>No related posts.</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://fernandoipar.com/2012/09/10/speaking-at-percona-live-ny-2012/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Piping data to multiple processes</title>
		<link>http://fernandoipar.com/2011/03/10/piping-data-to-multiple-processes/</link>
		<comments>http://fernandoipar.com/2011/03/10/piping-data-to-multiple-processes/#comments</comments>
		<pubDate>Thu, 10 Mar 2011 05:00:06 +0000</pubDate>
		<dc:creator>fernando</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[shell]]></category>

		<guid isPermaLink="false">http://fernandoipar.com/?p=216</guid>
		<description><![CDATA[Here&#8217;s a simple shell script to stream data to multiple processes. It has many applications, but the reason I wrote it is to stream the same data to multiple netcat processes on remote machines. Here&#8217;s the code: #!/bin/bash usage() { cat &#60;&#60;EOF &#62;&#38;2 usage : multi-fifo target0 [target1 [target2 [...]]] Where each targetN is a program [...]<div class='yarpp-related-rss'>

Related posts:<ol>
<li><a href='http://fernandoipar.com/2009/08/14/generating-data-with-dbmonster/' rel='bookmark' title='Generating data with dbmonster'>Generating data with dbmonster</a> <small>In my last post I included some sample data which...</small></li>
<li><a href='http://fernandoipar.com/2009/03/09/using-the-enum-data-type-to-increase-performance/' rel='bookmark' title='Using the ENUM data type to increase performance'>Using the ENUM data type to increase performance</a> <small>While going through the DATA TYPES section of the Certification...</small></li>
<li><a href='http://fernandoipar.com/2009/01/12/running-commands-from-the-shell-with-a-timeout-pt-2/' rel='bookmark' title='Running commands from the shell with a timeout (pt 2)'>Running commands from the shell with a timeout (pt 2)</a> <small>Here&#8217;s an improved version of the safecmd script. This one...</small></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.
</div>
]]></description>
				<content:encoded><![CDATA[<p>Here&#8217;s a simple shell script to stream data to multiple processes. It has many applications, but the reason I wrote it is to stream the same data to multiple netcat processes on remote machines.</p>
<p>Here&#8217;s the code:</p>
<pre>
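#!/bin/bash

usage()
{
cat &lt;&lt;EOF &gt;&amp;2

usage : multi-fifo target0 [target1 [target2 [...]]]

Where each targetN is a program that should receive a copy of the input multi-fifo receives

EOF
}

[ $# -eq 0 ] &amp;&amp; usage &amp;&amp; exit 1

i=0
pipeline="tee /dev/null"
while [ $# -gt 0 ]; do
  target=$1
  fifo=/tmp/multi-fifo-$$.$i
  mkfifo "$fifo"
  # start the consumer in the background; it blocks until tee opens the fifo
  eval "$target &lt; $fifo" &amp;
  # each additional tee copies the stream into one more fifo
  pipeline="$pipeline | tee $fifo"
  i=$((i+1))
  shift
done

# stdin goes through the tee chain; the last tee also writes it to stdout
eval "cat | $pipeline"

# wait for all consumers to exit so the script doesn't return before they finish
wait

j=0
while [ $j -lt $i ]; do
  rm -f /tmp/multi-fifo-$$.$j
  j=$((j+1))
done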
</pre>
<p>Usage is pretty straightforward:</p>
<pre>tar cjvf - . | ./multi-fifo "nc host1 9999" "nc host2 9999" &gt;/dev/null</pre>
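<p>The script creates a fifo for each netcat client and starts that client reading from it in the background, then sends its own stdin through a chain of tee commands that writes a copy into every fifo. The last tee also copies the data to stdout, which is why the example above redirects it to /dev/null. This is useful, for example, if you want to stream a backup to multiple destination servers for cloning.</p>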
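<p>Stripped of argument handling, the mechanism fits in a few lines. Here is a minimal sketch you can run on one machine, assuming bash with <code>mkfifo</code> and <code>mktemp</code> available: two local <code>cat &gt; file</code> consumers stand in for the netcat clients, and a single tee with two file arguments replaces the chain of tee commands the script builds.</p>

```shell
#!/bin/bash
# Minimal sketch of the fifo + tee fan-out, using two local "cat > file"
# consumers instead of netcat so it can run on a single machine.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# one named pipe per consumer
mkfifo "$workdir/f0" "$workdir/f1"

# each consumer blocks on its fifo until a writer opens it
cat < "$workdir/f0" > "$workdir/out0" &
cat < "$workdir/f1" > "$workdir/out1" &

# a single tee writes its input into both fifos; its own stdout is discarded
printf 'line one\nline two\n' | tee "$workdir/f0" "$workdir/f1" > /dev/null

# let both consumers drain their fifos before looking at the results
wait

if [ "$(cat "$workdir/out0")" = "$(cat "$workdir/out1")" ]; then
  echo "both consumers received the same data"
fi
```

<p>The order matters: the consumers are started before tee runs, because opening a fifo for writing blocks until something opens it for reading, and the final <code>wait</code> keeps the script from checking the output before the consumers have finished.</p>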
<div class='yarpp-related-rss'>
<p>Related posts:<ol>
<li><a href='http://fernandoipar.com/2009/08/14/generating-data-with-dbmonster/' rel='bookmark' title='Generating data with dbmonster'>Generating data with dbmonster</a> <small>In my last post I included some sample data which...</small></li>
<li><a href='http://fernandoipar.com/2009/03/09/using-the-enum-data-type-to-increase-performance/' rel='bookmark' title='Using the ENUM data type to increase performance'>Using the ENUM data type to increase performance</a> <small>While going through the DATA TYPES section of the Certification...</small></li>
<li><a href='http://fernandoipar.com/2009/01/12/running-commands-from-the-shell-with-a-timeout-pt-2/' rel='bookmark' title='Running commands from the shell with a timeout (pt 2)'>Running commands from the shell with a timeout (pt 2)</a> <small>Here&#8217;s an improved version of the safecmd script. This one...</small></li>
</ol></p>
<p>Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://fernandoipar.com/2011/03/10/piping-data-to-multiple-processes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>High Availability MySQL Cookbook review</title>
		<link>http://fernandoipar.com/2010/07/07/high-availability-mysql-cookbook-review/</link>
		<comments>http://fernandoipar.com/2010/07/07/high-availability-mysql-cookbook-review/#comments</comments>
		<pubDate>Wed, 07 Jul 2010 04:03:17 +0000</pubDate>
		<dc:creator>fernando</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Availability]]></category>
		<category><![CDATA[Review]]></category>

		<guid isPermaLink="false">http://fernandoipar.com/?p=204</guid>
		<description><![CDATA[High Availability MySQL Cookbook (Alex Davies, Packt Publishing) presents different approaches to achieve high availability with MySQL. The bulk of the book is dedicated to MySQL Cluster, with shorter sections on: MySQL replication shared storage block level replication performance tuning The recipes are clear and well explained, based on a CentOS distribution, and it seems [...]<div class='yarpp-related-rss'>

Related posts:<ol>
<li><a href='http://fernandoipar.com/2009/03/06/mysql-certification-self-study/' rel='bookmark' title='MySQL Certification self study'>MySQL Certification self study</a> <small>I&#8217;m taking the MySQL Certification exams soon, and while I&#8217;d...</small></li>
<li><a href='http://fernandoipar.com/2009/04/02/using-mysql-sandbox-for-testing/' rel='bookmark' title='Using MySQL sandbox for testing'>Using MySQL sandbox for testing</a> <small>MySQL Sandbox is a great tool for quickly deploying test...</small></li>
<li><a href='http://fernandoipar.com/2009/08/12/indexing-text-columns-in-mysql/' rel='bookmark' title='Indexing text columns in MySQL'>Indexing text columns in MySQL</a> <small>This time, I&#8217;m talking about indexes for string typed columns....</small></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.
</div>
]]></description>
				<content:encoded><![CDATA[<p><a href="https://www.packtpub.com/high-availability-mysql-cookbook/book?utm_source=fernandoipar.com&amp;utm_medium=bookrev&amp;utm_content=blog&amp;utm_campaign=mdb_003272">High Availability MySQL Cookbook</a> (Alex Davies, Packt Publishing) presents different approaches to achieve high availability with MySQL.</p>
<p>The bulk of the book is dedicated to MySQL Cluster, with shorter sections on:</p>
<ul>
<li>MySQL replication</li>
<li>shared storage</li>
<li>block level replication</li>
<li>performance tuning</li>
</ul>
<p>The recipes are clear and well explained, based on a CentOS distribution, and it seems any technically skilled person could follow them without issues.</p>
<p>What&#8217;s lacking is coverage of design aspects. Based on this material alone, one probably wouldn&#8217;t be able to decide what the best high availability architecture is for a given problem. Actually, given the percentage of the book dedicated to it, one may even be tempted to think MySQL Cluster is the best fit for most scenarios. Nevertheless, there&#8217;s a section about Cluster limitations and potential problems, so the cautious reader won&#8217;t end up choosing this solution for every new project.</p>
<p>I also found that some important considerations regarding replication are missing.<br />
The reader is instructed to rely on Seconds_Behind_Master alone to monitor replication, and there&#8217;s no mention of the situations that can cause a slave to go out of sync, nor of a process to fix this problem.</p>
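<p>As a rough illustration of checking more than Seconds_Behind_Master, here is a minimal shell sketch. The function name is hypothetical; it assumes you pipe in the vertical (<code>\G</code>) output of <code>SHOW SLAVE STATUS</code> and only verifies that both replication threads report <code>Yes</code> and that the lag is not <code>NULL</code>. It does not detect data drift between master and slave, which needs a separate consistency check.</p>

```shell
#!/bin/bash
# Hypothetical helper: reads SHOW SLAVE STATUS\G output on stdin and flags
# states that a Seconds_Behind_Master check alone does not make obvious.
# Exit status is 0 when both threads are running, 1 otherwise.
check_slave_status() {
  awk -F': ' '
    {
      key = $1
      sub(/^[ \t]+/, "", key)   # strip the alignment padding before the field name
      status[key] = $2
    }
    END {
      bad = 0
      if (status["Slave_IO_Running"] != "Yes")  { print "I/O thread not running"; bad = 1 }
      if (status["Slave_SQL_Running"] != "Yes") { print "SQL thread not running"; bad = 1 }
      # NULL means the lag is unknown (for example, a thread is stopped), not zero
      if (status["Seconds_Behind_Master"] == "NULL") { print "lag unknown (NULL)"; bad = 1 }
      if (!bad) print "threads running, lag " status["Seconds_Behind_Master"] "s"
      exit bad
    }'
}

# usage: mysql -e 'SHOW SLAVE STATUS\G' | check_slave_status
```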
<p>However, this book is a useful addition to any MySQL practitioner&#8217;s library, provided you don&#8217;t expect to rely only on it to design and deploy your MySQL based highly available services.</p>
<div class='yarpp-related-rss'>
<p>Related posts:<ol>
<li><a href='http://fernandoipar.com/2009/03/06/mysql-certification-self-study/' rel='bookmark' title='MySQL Certification self study'>MySQL Certification self study</a> <small>I&#8217;m taking the MySQL Certification exams soon, and while I&#8217;d...</small></li>
<li><a href='http://fernandoipar.com/2009/04/02/using-mysql-sandbox-for-testing/' rel='bookmark' title='Using MySQL sandbox for testing'>Using MySQL sandbox for testing</a> <small>MySQL Sandbox is a great tool for quickly deploying test...</small></li>
<li><a href='http://fernandoipar.com/2009/08/12/indexing-text-columns-in-mysql/' rel='bookmark' title='Indexing text columns in MySQL'>Indexing text columns in MySQL</a> <small>This time, I&#8217;m talking about indexes for string typed columns....</small></li>
</ol></p>
<p>Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://fernandoipar.com/2010/07/07/high-availability-mysql-cookbook-review/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>how to prevent explain from executing subqueries</title>
		<link>http://fernandoipar.com/2010/01/26/how-to-prevent-explain-from-executing-subqueries/</link>
		<comments>http://fernandoipar.com/2010/01/26/how-to-prevent-explain-from-executing-subqueries/#comments</comments>
		<pubDate>Tue, 26 Jan 2010 13:43:03 +0000</pubDate>
		<dc:creator>fernando</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[explain]]></category>

		<guid isPermaLink="false">http://fernandoipar.com/?p=194</guid>
		<description><![CDATA[Here&#8217;s a quick tip for using explain: You may know this already, but mysql will actually execute some subqueries when you invoke explain.  Here&#8217;s an example: mysql&#62; explain select id from projects where id = (select max(id) from projects where name like 'en%'); +----+-------------+----------+-------+---------------+---------+---------+-------+-------+-------------+ &#124; id &#124; select_type &#124; table &#124; type &#124; possible_keys &#124; [...]<div class='yarpp-related-rss'>

Related posts:<ol>
<li><a href='http://fernandoipar.com/2009/03/09/using-the-enum-data-type-to-increase-performance/' rel='bookmark' title='Using the ENUM data type to increase performance'>Using the ENUM data type to increase performance</a> <small>While going through the DATA TYPES section of the Certification...</small></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.
</div>
]]></description>
				<content:encoded><![CDATA[<p>Here&#8217;s a quick tip for using explain:</p>
<p>You may know this already, but MySQL will actually execute some subqueries when you run EXPLAIN. Here&#8217;s an example:</p>
<pre>mysql&gt; explain select id from projects where id = (select max(id) from projects where name like 'en%');
+----+-------------+----------+-------+---------------+---------+---------+-------+-------+-------------+
| id | select_type | table    | type  | possible_keys | key     | key_len | ref   | rows  | Extra       |
+----+-------------+----------+-------+---------------+---------+---------+-------+-------+-------------+
|  1 | PRIMARY     | projects | const | PRIMARY       | PRIMARY | 4       | const |     1 | Using index |
|  2 | SUBQUERY    | projects | ALL   | NULL          | NULL    | NULL    | NULL  | 67922 | Using where |
+----+-------------+----------+-------+---------------+---------+---------+-------+-------+-------------+
2 rows in set (0.11 sec)
</pre>
<p>Take a look at the execution time (I chose an intentionally slow query for my little dataset): 0.11 sec, just to produce a plan.<br />
For comparison, here&#8217;s EXPLAIN on the subquery by itself, which doesn&#8217;t get executed:</p>
<pre>mysql&gt; explain select max(id) from projects where name like 'en%';
+----+-------------+----------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table    | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+----+-------------+----------+------+---------------+------+---------+------+-------+-------------+
|  1 | SIMPLE      | projects | ALL  | NULL          | NULL | NULL    | NULL | 69513 | Using where |
+----+-------------+----------+------+---------------+------+---------+------+-------+-------------+
1 row in set (0.00 sec)
</pre>
<p>If you want to work around this to prevent trouble on a production server (at the cost of not getting any output from EXPLAIN), you can limit max_join_size for your session:</p>
<pre>mysql&gt; set session max_join_size=1;
Query OK, 0 rows affected (0.00 sec)

mysql&gt; explain select id from projects where id = (select max(id) from projects where name like 'en%');
ERROR 1104 (42000): The SELECT would examine more than MAX_JOIN_SIZE rows; check your WHERE and use SET SQL_BIG_SELECTS=1 or SET SQL_MAX_JOIN_SIZE=# if the SELECT is okay
</pre>
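<p>If you run EXPLAIN from scripts, keep in mind that <code>SET SESSION</code> only lasts for the current connection, so the setting has to go through the same client invocation as the EXPLAIN. Here is a minimal sketch, with a hypothetical helper name, that emits both statements so they can be piped into a single <code>mysql</code> call.</p>

```shell
#!/bin/bash
# Hypothetical helper: print SQL that caps max_join_size for this session
# and then runs EXPLAIN on the given query. Both statements must reach the
# server over the same connection for the SET SESSION to apply.
guarded_explain_sql() {
  printf 'SET SESSION max_join_size=1;\nEXPLAIN %s;\n' "$1"
}

# usage (the database name is just an example):
#   guarded_explain_sql "select id from projects where id = (select max(id) from projects where name like 'en%')" | mysql sample_db
```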
<div class='yarpp-related-rss'>
<p>Related posts:<ol>
<li><a href='http://fernandoipar.com/2009/03/09/using-the-enum-data-type-to-increase-performance/' rel='bookmark' title='Using the ENUM data type to increase performance'>Using the ENUM data type to increase performance</a> <small>While going through the DATA TYPES section of the Certification...</small></li>
</ol></p>
<p>Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://fernandoipar.com/2010/01/26/how-to-prevent-explain-from-executing-subqueries/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Generating data with dbmonster</title>
		<link>http://fernandoipar.com/2009/08/14/generating-data-with-dbmonster/</link>
		<comments>http://fernandoipar.com/2009/08/14/generating-data-with-dbmonster/#comments</comments>
		<pubDate>Fri, 14 Aug 2009 20:02:56 +0000</pubDate>
		<dc:creator>fernando</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Testing]]></category>

		<guid isPermaLink="false">http://fernandoipar.com/?p=182</guid>
		<description><![CDATA[In my last post I included some sample data which was useful for playing around with queries (once I published it, I realized it made my post look like some form of keyword stuffing, fortunately I don&#8217;t use adsense on my site so I hope I&#8217;m free of any suspicion ). That sample data was [...]<div class='yarpp-related-rss'>

Related posts:<ol>
<li><a href='http://fernandoipar.com/2009/02/04/generating-random-salts-from-bash/' rel='bookmark' title='Generating random salts from bash'>Generating random salts from bash</a> <small>From the &#8216;just because it can be done&#8217; column, here...</small></li>
<li><a href='http://fernandoipar.com/2009/03/09/using-the-enum-data-type-to-increase-performance/' rel='bookmark' title='Using the ENUM data type to increase performance'>Using the ENUM data type to increase performance</a> <small>While going through the DATA TYPES section of the Certification...</small></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.
</div>
]]></description>
				<content:encoded><![CDATA[<p>
In my <a href="http://fernandoipar.com/2009/08/12/indexing-text-columns-in-mysql/">last post</a> I included some sample data that was useful for playing around with queries (once I published it,<br />
I realized it made my post look like some form of keyword stuffing; fortunately, I don&#8217;t use AdSense on my site, so I hope I&#8217;m free of any suspicion <img src='http://fernandoipar.com/wp-includes/images/smilies/icon_biggrin.gif' alt=':D' class='wp-smiley' /> ).
</p>
<p>
That sample data was generated with dbmonster, a nice tool I discovered recently which comes in handy when you need to populate a database table to test your queries, the engine, schema, etc.
</p>
<p>
In a nutshell, here are the strengths I like about it:</p>
<ul>
<li>It&#8217;s written in Java, so it&#8217;ll feed any database with a JDBC driver, and AFAIK, that&#8217;s just about any RDBMS you might need to use</li>
<li>It&#8217;s not too slow (it&#8217;s actually pretty fast if you don&#8217;t go crazy on long text based datatypes)</li>
<li>It includes several data generators, and since it&#8217;s written in Java, rolling my own should be easy.</li>
</ul>
<p>As for the weaknesses:</p>
<ul>
<li>It can be unbearably slow with long text columns</li>
<li>The generated text data might not look much like data from a real installation (again, look at my previous post for a quick sample). Still, the fact that you can easily write and plug in your own generator(s) is enough to give kudos to the developers for a good architecture. Even if a feature isn&#8217;t available now, it&#8217;s painless to add in the future</li>
<li><del>It has no way to manage relational integrity, hence, it doesn&#8217;t really feed RDBMS, just JDBC enabled databases</del> (yes it does, please see the comments)</li>
</ul>
<p>
<del>My last point means that, if you have table A and table B, both related through a foreign key constraint, dbmonster doesn&#8217;t provide a way for you to generate keys for table A and then reuse these keys<br />
while inserting the foreign keys in table B. Not that I&#8217;ve found so far, at least.</del>
</p>
<p>
Related to this limitation, tables are generated one at a time. Still, if you need a tool to populate tables, even very large ones, with random values, this is definitely an improvement: heck, up until a few days ago, I was doing this from<br />
bash or Python. <del>I&#8217;ve discussed this with colleagues and some proprietary tools are either available or in the works that would handle referential integrity while<br />
generating data for database testing. If anyone knows of an Open Source one that does, it&#8217;d be great to hear about it. Otherwise, that would sure make one hell of a good project <img src='http://fernandoipar.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </del>
</p>
<p>
OK, enough ranting. You can get dbmonster from <a href="http://dbmonster.kernelpanic.pl/">here</a>.
</p>
<p>
Start off by creating a dbmonster.properties file. Here&#8217;s a sample:
</p>
<pre>
dbmonster.jdbc.driver=com.mysql.jdbc.Driver
dbmonster.jdbc.url=jdbc:mysql://localhost/sample_db
dbmonster.jdbc.username=mysql_user
dbmonster.jdbc.password=mysql_password

# for Oracle and other schema enabled databases
#dbmonster.jdbc.schema=schema_name

# maximal number of (re)tries
dbmonster.max-tries=50

# default rows number for SchemaGrabber
dbmonster.rows=1000

# progress monitor class
dbmonster.progress.monitor=pl.kernelpanic.dbmonster.ProgressMonitorAdapter
</pre>
<p>
Dbmonster is now ready to reach your database.<br />
Before it&#8217;s ready to feed it, though, you need to create an XML file (ah, yes, the plague of the Java world, and I&#8217;ve been a Java guy for most of my working years&#8230;). Call it whatever you want;<br />
being a consistent kind of man, I&#8217;ve been using &lt;schema-name&gt;.xml. Here&#8217;s my sample_db.xml file:
</p>
<pre>
&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE dbmonster-schema PUBLIC "-//kernelpanic.pl//DBMonster Database Schema DTD 1.1//EN" "http://dbmonster.kernelpanic.pl/dtd/dbmonster-schema-1.1.dtd"&gt;
&lt;dbmonster-schema&gt;
  &lt;name&gt;Sample DB&lt;/name&gt;
  &lt;table name="jobs" rows="150000"&gt;
    &lt;key databaseDefault="true"&gt;
      &lt;generator type="pl.kernelpanic.dbmonster.generator.MaxKeyGenerator"&gt;
        &lt;property name="columnName" value="id"/&gt;
      &lt;/generator&gt;
    &lt;/key&gt;
    &lt;column name="title" databaseDefault="false"&gt;
      &lt;generator type="pl.kernelpanic.dbmonster.generator.StringGenerator"&gt;
        &lt;property name="allowSpaces" value="true"/&gt;
        &lt;property name="excludeChars" value=""/&gt;
        &lt;property name="maxLength" value="250"/&gt;
        &lt;property name="minLength" value="15"/&gt;
        &lt;property name="nulls" value="0"/&gt;
      &lt;/generator&gt;
    &lt;/column&gt;
    &lt;column name="link" databaseDefault="false"&gt;
      &lt;generator type="pl.kernelpanic.dbmonster.generator.StringGenerator"&gt;
        &lt;property name="allowSpaces" value="true"/&gt;
        &lt;property name="excludeChars" value=""/&gt;
        &lt;property name="maxLength" value="65500"/&gt;
        &lt;property name="minLength" value="150"/&gt;
        &lt;property name="nulls" value="0"/&gt;
      &lt;/generator&gt;
    &lt;/column&gt;
    &lt;column name="description" databaseDefault="false"&gt;
      &lt;generator type="pl.kernelpanic.dbmonster.generator.StringGenerator"&gt;
        &lt;property name="allowSpaces" value="true"/&gt;
        &lt;property name="excludeChars" value=""/&gt;
        &lt;property name="maxLength" value="65500"/&gt;
        &lt;property name="minLength" value="150"/&gt;
        &lt;property name="nulls" value="0"/&gt;
      &lt;/generator&gt;
    &lt;/column&gt;
    &lt;column name="city" databaseDefault="false"&gt;
      &lt;generator type="pl.kernelpanic.dbmonster.generator.StringGenerator"&gt;
        &lt;property name="allowSpaces" value="true"/&gt;
        &lt;property name="excludeChars" value=""/&gt;
        &lt;property name="maxLength" value="250"/&gt;
        &lt;property name="minLength" value="15"/&gt;
        &lt;property name="nulls" value="0"/&gt;
      &lt;/generator&gt;
    &lt;/column&gt;
    &lt;column name="postdate" databaseDefault="false"&gt;
      &lt;generator type="pl.kernelpanic.dbmonster.generator.DateTimeGenerator"&gt;
        &lt;property name="nulls" value="0"/&gt;
      &lt;/generator&gt;
    &lt;/column&gt;
    &lt;column name="company_id" databaseDefault="false"&gt;
      &lt;generator type="pl.kernelpanic.dbmonster.generator.NumberGenerator"&gt;
        &lt;property name="nulls" value="0"/&gt;
      &lt;/generator&gt;
    &lt;/column&gt;
    &lt;column name="country_id" databaseDefault="false"&gt;
      &lt;generator type="pl.kernelpanic.dbmonster.generator.NumberGenerator"&gt;
        &lt;property name="nulls" value="0"/&gt;
      &lt;/generator&gt;
    &lt;/column&gt;
  &lt;/table&gt;
&lt;/dbmonster-schema&gt;
</pre>
<p>
The file is pretty self-explanatory. However, here are some pointers:</p>
<ul>
<li>You have a table tag for every generated table</li>
<li>A column tag for every column of every table (duh)</li>
<li>Columns have generators. These are the Java classes that actually generate the values to insert into each column. Notice how you can specify the <code>databaseDefault="true"</code> attribute (as on the key element above), which makes dbmonster skip generation for that column and leave it to the database (good for auto increment columns / Postgres sequences)</li>
</ul>
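<p>Since most of the schema file is repeated StringGenerator boilerplate, a small shell function can save some typing. This is only a sketch: the function name is made up, and it simply reproduces the property set used in sample_db.xml above, with the column name and the minimum and maximum lengths as its only parameters.</p>

```shell
#!/bin/bash
# Hypothetical helper (not part of dbmonster): print a <column> element that
# uses StringGenerator with the same properties as sample_db.xml.
# Arguments: column name, minimum length, maximum length.
string_column_xml() {
  local name=$1 min=$2 max=$3
  cat <<EOF
    <column name="$name" databaseDefault="false">
      <generator type="pl.kernelpanic.dbmonster.generator.StringGenerator">
        <property name="allowSpaces" value="true"/>
        <property name="excludeChars" value=""/>
        <property name="maxLength" value="$max"/>
        <property name="minLength" value="$min"/>
        <property name="nulls" value="0"/>
      </generator>
    </column>
EOF
}

# usage: string_column_xml city 15 250 >> columns.xml
```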
<p>
Now let&#8217;s run it:
</p>
<pre>
export CLASSPATH=dbmonster-core-1.0.3.jar:mysql-connector-java-5.1.6-bin.jar

java pl.kernelpanic.dbmonster.Launcher -s sample_db.xml

fipar@telecaster:~/soft/dbmonster-core-1.0.3$ java -classpath mysql-connector-java-5.1.6-bin.jar:dbmonster-core-1.0.3.jar pl.kernelpanic.dbmonster.Launcher -s sample_db.xml
2009-08-05 13:43:27,244 INFO  DBMonster - Let's feed this hungry database.
2009-08-05 13:43:27,697 INFO  DBCPConnectionProvider - Today we are feeding: MySQL 5.0.75-0ubuntu10.2-log
2009-08-05 13:43:27,783 INFO  Schema - Generating schema &lt;Sample DB&gt;.
2009-08-05 13:43:27,783 INFO  Table - Generating table &lt;tasks&gt;.
2009-08-05 13:44:18,749 INFO  Table - Generation of table &lt;tasks&gt; finished.
2009-08-05 13:44:18,750 INFO  Schema - Generation of schema &lt;Sample DB&gt; finished.
2009-08-05 13:44:18,750 INFO  DBMonster - Finished in 51 sec. 507 ms.
fipar@telecaster:~/soft/dbmonster-core-1.0.3$ mysql -p -e 'select count(*) from tasks' sample_db
Enter password:
+----------+
| count(*) |
+----------+
|    80000 |
+----------+


fipar@telecaster:~/soft/dbmonster-core-1.0.3$ java -cp dbmonster-core-1.0.3.jar:mysql-connector-java-5.1.6-bin.jar pl.kernelpanic.dbmonster.Launcher -s sample_db.xml
2009-08-06 10:27:24,747 INFO  DBMonster - Let's feed this hungry database.
2009-08-06 10:27:25,336 INFO  DBCPConnectionProvider - Today we are feeding: MySQL 5.0.75-0ubuntu10.2-log
2009-08-06 10:27:25,490 INFO  Schema - Generating schema &lt;Sample DB&gt;.
2009-08-06 10:27:25,490 INFO  Table - Generating table &lt;tasks&gt;.
2009-08-06 10:37:14,284 INFO  Table - Generation of table &lt;tasks&gt; finished.
2009-08-06 10:37:14,284 INFO  Schema - Generation of schema &lt;Sample DB&gt; finished.
2009-08-06 10:37:14,285 INFO  DBMonster - Finished in 9 min. 49 sec. 539 ms.
fipar@telecaster:~/soft/dbmonster-core-1.0.3$ mysql -p -e 'select count(*) from tasks' sample_db
Enter password:
+----------+
| count(*) |
+----------+
|   150000 |
+----------+
</pre>
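<p>One gotcha if you go the CLASSPATH route: Java won&#8217;t expand a pattern like <code>dbmonster*.jar</code> in the classpath (its wildcard support only covers a bare <code>*</code>, meaning every jar in a directory). A minimal sketch that lets the shell expand the patterns and joins the matches with colons; the function name is hypothetical, and the jar patterns in the usage line assume the file names shown in the transcript.</p>

```shell
#!/bin/bash
# Sketch: build a colon-separated classpath from jar files. The shell
# expands the glob patterns before the function sees them.
build_classpath() {
  local cp="" jar
  for jar in "$@"; do
    # skip patterns that matched nothing (bash leaves them unexpanded by default)
    [ -f "$jar" ] || continue
    cp="${cp:+$cp:}$jar"
  done
  printf '%s\n' "$cp"
}

# usage, from the dbmonster directory:
#   java -cp "$(build_classpath dbmonster-core-*.jar mysql-connector-java-*.jar)" \
#        pl.kernelpanic.dbmonster.Launcher -s sample_db.xml
```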
<p>
Besides the obvious difference in the number of rows, the second generation had larger text values (char(250) vs char(40)), and you can see how that affected generation time. Scaled proportionally from the first run (51.5 seconds for 80,000 rows), it should have finished in<br />
about 1 minute 37 seconds, yet it took dbmonster almost 10 minutes to generate this data.
</p>
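<p>To make that comparison explicit, here is the back-of-the-envelope scaling as a small awk calculation wrapped in a shell function. The function name is hypothetical; the inputs are the times and row counts from the two runs above, under the assumption that generation time would scale linearly with row count.</p>

```shell
#!/bin/bash
# Linear scaling check: if generation time grew in proportion to row count,
# how long should the second run have taken?
# Arguments: first run seconds, first run rows, second run rows, second run seconds.
scaling_estimate() {
  awk -v t1="$1" -v r1="$2" -v r2="$3" -v t2="$4" 'BEGIN {
    expected = t1 * r2 / r1
    printf "expected: %.1f s\n", expected
    printf "actual:   %.1f s (%.1fx the linear estimate)\n", t2, t2 / expected
  }'
}

# 51.507 s for 80,000 rows vs. 9 min 49.539 s (589.539 s) for 150,000 rows
scaling_estimate 51.507 80000 150000 589.539
```

<p>With the numbers from the transcripts this prints an expected time of 96.6 seconds against an actual 589.5 seconds, about 6.1 times the linear estimate, which is consistent with the larger text values dominating generation time.</p>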
<p>
In conclusion, depending on the size of the database you need to generate, and the use you intend for it (forget testing a huge schema with referential constraints), dbmonster can certainly aid<br />
you in stress testing any database engine.</p>
<div class='yarpp-related-rss'>
<p>Related posts:<ol>
<li><a href='http://fernandoipar.com/2009/02/04/generating-random-salts-from-bash/' rel='bookmark' title='Generating random salts from bash'>Generating random salts from bash</a> <small>From the &#8216;just because it can be done&#8217; column, here...</small></li>
<li><a href='http://fernandoipar.com/2009/03/09/using-the-enum-data-type-to-increase-performance/' rel='bookmark' title='Using the ENUM data type to increase performance'>Using the ENUM data type to increase performance</a> <small>While going through the DATA TYPES section of the Certification...</small></li>
</ol></p>
<p>Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://fernandoipar.com/2009/08/14/generating-data-with-dbmonster/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Indexing text columns in MySQL</title>
		<link>http://fernandoipar.com/2009/08/12/indexing-text-columns-in-mysql/</link>
		<comments>http://fernandoipar.com/2009/08/12/indexing-text-columns-in-mysql/#comments</comments>
		<pubDate>Wed, 12 Aug 2009 22:07:28 +0000</pubDate>
		<dc:creator>fernando</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[indexes]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://fernandoipar.com/?p=178</guid>
		<description><![CDATA[This time, I&#8217;m talking about indexes for string typed columns. In particular, I&#8217;ll show a procedure I find useful while looking for good index length values for these columns. I&#8217;ll use a sample table called people. Here&#8217;s what it looks like: mysql&#62; desc people; +------------+------------------+------+-----+---------+----------------+ &#124; Field &#124; Type &#124; Null &#124; Key &#124; Default [...]<div class='yarpp-related-rss'>

Related posts:<ol>
<li><a href='http://fernandoipar.com/2009/04/06/using-mysql-proxy-to-benchmark-query-performance/' rel='bookmark' title='Using MySQL Proxy to benchmark query performance'>Using MySQL Proxy to benchmark query performance</a> <small>By transparently sitting between client and server on each request,...</small></li>
<li><a href='http://fernandoipar.com/2009/04/18/making-use-of-procedure-analyse/' rel='bookmark' title='Making use of procedure analyse()'>Making use of procedure analyse()</a> <small>SELECT Field0[,Field1,Field2,...] FROM TABLE PROCEDURE ANALYSE() is a nice tool...</small></li>
<li><a href='http://fernandoipar.com/2009/03/09/using-the-enum-data-type-to-increase-performance/' rel='bookmark' title='Using the ENUM data type to increase performance'>Using the ENUM data type to increase performance</a> <small>While going through the DATA TYPES section of the Certification...</small></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.
</div>
]]></description>
				<content:encoded><![CDATA[<p><P>This time, I&#8217;m talking about indexes for string typed columns. In particular, I&#8217;ll show a procedure I find useful while looking for good index length values for these columns.<br />
</P><br />
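<p>For reference, a common way to compare candidate prefix lengths (not necessarily the exact procedure used in this post) is to compute how selective each prefix is: distinct prefixes divided by total rows, compared with the same ratio for the full column. A minimal sketch that only prints the SQL; the function name and the candidate lengths are arbitrary examples.</p>

```shell
#!/bin/bash
# Hypothetical helper: print one SELECT comparing the selectivity of several
# candidate prefix lengths for a string column against the full column.
# Arguments: table, column, then one or more prefix lengths.
prefix_selectivity_sql() {
  local table=$1 column=$2
  shift 2
  local n sql="SELECT COUNT(DISTINCT $column)/COUNT(*) AS full_col"
  for n in "$@"; do
    sql="$sql, COUNT(DISTINCT LEFT($column, $n))/COUNT(*) AS len_$n"
  done
  printf '%s FROM %s;\n' "$sql" "$table"
}

# usage: prefix_selectivity_sql people title 5 10 20 | mysql test
```

<p>A prefix length whose ratio is already close to <code>full_col</code> gives nearly the same selectivity as indexing the whole column, with a smaller index.</p>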
<P>I&#8217;ll use a sample table called people.<br />
</P><br />
<P>Here&#8217;s what it looks like:<br />
</P><br />
<PRE>mysql&gt; desc people;<br />
+------------+------------------+------+-----+---------+----------------+<br />
| Field      | Type             | Null | Key | Default | Extra          |<br />
+------------+------------------+------+-----+---------+----------------+<br />
| id         | int(11) unsigned | NO   | PRI | NULL    | auto_increment |<br />
| title      | varchar(250)     | NO   |     | NULL    |                |<br />
| city       | varchar(250)     | NO   |     | NULL    |                |<br />
| occupation | varchar(250)     | NO   |     | NULL    |                |<br />
+------------+------------------+------+-----+---------+----------------+<br />
4 rows in set (0.00 sec)</p>
<p>mysql&gt; select count(*) from people;<br />
+----------+<br />
| count(*) |<br />
+----------+<br />
|   150000 |<br />
+----------+<br />
1 row in set (0.00 sec)</p>
<p>mysql&gt; </PRE><P><br />
We&#8217;ll start by using procedure analyse to get some useful information
about our data. Unless you know that some fields are good candidates for
the ENUM data type, invoke procedure analyse with arguments (0,0)
(max_elements and max_memory) to prevent MySQL from suggesting huge
ENUMs for string columns.
</P><br />
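<p>For string columns, the length-related fields in the procedure analyse output are plain aggregates over the column&#8217;s values. As a rough illustration only (the function name is hypothetical, None stands in for NULL, and lengths are assumed to be counted in characters, which equals bytes for single-byte character sets), they could be computed like this:</p>
<pre>
```python
from statistics import mean


def string_column_stats(values):
    """Approximate the length-related fields that PROCEDURE ANALYSE
    reports for a string column. None stands in for SQL NULL; lengths
    are counted in characters (an assumption). Expects at least one
    non-NULL value."""
    non_null = [v for v in values if v is not None]
    lengths = [len(v) for v in non_null]
    return {
        "Min_length": min(lengths),
        "Max_length": max(lengths),
        # For string columns, "empties" are zero-length strings.
        "Empties_or_zeros": sum(1 for v in non_null if v == ""),
        "Nulls": len(values) - len(non_null),
        # Empty strings still count toward the average length.
        "Avg_value_or_avg_length": round(mean(lengths), 4),
    }


print(string_column_stats(["taxi cab", "", "hoppers", None]))
```
</pre>
<p>The Avg_value_or_avg_length figure is the one used below as a starting prefix length.</p>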
<PRE>mysql&gt; select * from people procedure analyse(0,0)\G<br />
*************************** 1. row ***************************<br />
             Field_name: test.people.id<br />
              Min_value: 1<br />
              Max_value: 150000<br />
             Min_length: 1<br />
             Max_length: 6<br />
       Empties_or_zeros: 0<br />
                  Nulls: 0<br />
Avg_value_or_avg_length: 75000.5000<br />
                    Std: 87258.1632<br />
      Optimal_fieldtype: MEDIUMINT(6) UNSIGNED NOT NULL<br />
*************************** 2. row ***************************<br />
             Field_name: test.people.title<br />
              Min_value: aback exclaims stopgap's chapel's tanked claps snowshoe cigarette correlates extras laster cluc<br />
              Max_value: Zulus colossally dictate cleft's enchanter del<br />
             Min_length: 40<br />
             Max_length: 150<br />
       Empties_or_zeros: 0<br />
                  Nulls: 0<br />
Avg_value_or_avg_length: 95.0869<br />
                    Std: NULL<br />
      Optimal_fieldtype: TINYTEXT NOT NULL<br />
*************************** 3. row ***************************<br />
             Field_name: test.people.city<br />
              Min_value: aback ascertaining unw<br />
              Max_value: Zulus imprisonments veiner a<br />
             Min_length: 5<br />
             Max_length: 30<br />
       Empties_or_zeros: 0<br />
                  Nulls: 0<br />
Avg_value_or_avg_length: 17.4861<br />
                    Std: NULL<br />
      Optimal_fieldtype: TINYTEXT NOT NULL<br />
*************************** 4. row ***************************<br />
             Field_name: test.people.occupation<br />
              Min_value:<br />
              Max_value:<br />
             Min_length: 0<br />
             Max_length: 0<br />
       Empties_or_zeros: 150000<br />
                  Nulls: 0<br />
Avg_value_or_avg_length: 0.0000<br />
                    Std: NULL<br />
      Optimal_fieldtype: CHAR(0) NOT NULL<br />
4 rows in set (0.19 sec)</p>
<p>mysql&gt; </PRE><P><br />
The id column is numeric and is already indexed. We can&#8217;t see that
in this output (in an earlier post I showed how to extend the output of
procedure analyse to suit your needs, and that version did include an
&#8216;Indexed&#8217; column), but we can confirm it, and get more
information about the table&#8217;s indexes, with the following
statement:
</P><br />
<PRE>mysql&gt; show index from people;<br />
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+<br />
| Table  | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |<br />
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+<br />
| people |          0 | PRIMARY  |            1 | id          | A         |      150000 |     NULL | NULL   |      | BTREE      |         |<br />
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+<br />
1 row in set (0.00 sec)</p>
<p>mysql&gt;<br />
</PRE><P><br />
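To choose a good prefix length for an index on title, we can compare the number of distinct values we get when keeping only the first N characters with the number of distinct full values, using the following query. I used 95 as a starting point, since that&#8217;s the average length procedure analyse reported for this column (95.0869).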
</P></p>
<pre>
mysql> select count(distinct(substr(title,1,95))) / count(distinct(title)) * 100 from people;
+--------------------------------------------------------------------+
| count(distinct(substr(title,1,95))) / count(distinct(title)) * 100 |
+--------------------------------------------------------------------+
|                                                           100.0000 | 
+--------------------------------------------------------------------+
1 row in set (1.42 sec)

mysql> 

</pre>
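<p>The query above keeps the first N characters of every title and divides the number of distinct prefixes by the number of distinct full titles. A minimal Python sketch of the same ratio, handy for experimenting on a sample of values outside the server (the function name is hypothetical, and unlike MySQL&#8217;s default case-insensitive collations it compares strings exactly):</p>
<pre>
```python
def prefix_selectivity(values, n):
    """Percentage of distinct values that stay distinct when only their
    first n characters are kept. Mirrors:
      count(distinct(substr(col,1,n))) / count(distinct(col)) * 100
    Comparison is exact (case- and space-sensitive)."""
    distinct_values = set(values)
    distinct_prefixes = {v[:n] for v in distinct_values}
    return len(distinct_prefixes) / len(distinct_values) * 100


titles = ["taxi cab", "taxi rank", "taxing", "hoppers", "taxi cab"]
for n in (4, 5, 6):
    print(n, prefix_selectivity(titles, n))
```
</pre>
<p>If your column uses a case-insensitive collation, lowercasing the values first brings the sketch closer to what MySQL counts as distinct.</p>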
<p><P><br />
As you can see, with 95 characters the prefixes are exactly as selective as the full values (every distinct title still has a distinct prefix). Still, 95 is a big number. Using this query, we
can start playing with the prefix length until we reach a good compromise between keeping enough distinct values and having an index that&#8217;s small enough to be processed quickly and
maybe even kept in memory.
</P></p>
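<p>Each run of that query reads the whole table, so instead of editing and re-running it once per length, you can ask for several lengths in a single statement, which should only need one pass over the table. A small helper that builds such a statement might look like this (the helper and the len_N column aliases are hypothetical; table and column names are pasted in as-is, so only use it with trusted identifiers):</p>
<pre>
```python
def selectivity_query(table, column, lengths):
    """Build one SELECT that reports prefix selectivity for several
    prefix lengths at once, one aliased column per length."""
    columns = ",\n       ".join(
        f"count(distinct(substr({column},1,{n}))) / count(distinct({column})) * 100 AS len_{n}"
        for n in lengths
    )
    return f"SELECT {columns}\nFROM {table};"


print(selectivity_query("people", "title", [20, 15, 14]))
```
</pre>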
<pre>
mysql> select count(distinct(substr(title,1,20))) / count(distinct(title)) * 100 from people;
+--------------------------------------------------------------------+
| count(distinct(substr(title,1,20))) / count(distinct(title)) * 100 |
+--------------------------------------------------------------------+
|                                                            99.9507 | 
+--------------------------------------------------------------------+
1 row in set (1.15 sec)

mysql> 
</pre>
<p><P><br />
As it turns out, my data set doesn&#8217;t need many characters to tell its values apart. I&#8217;m cheating a bit here, though: I populated this table with a data generator, which produced a lot of random text. Real-world data would probably need a longer prefix to get such good differentiation. Anyway, let&#8217;s push it a little further.
</P></p>
<pre>
mysql> select count(distinct(substr(title,1,15))) / count(distinct(title)) * 100 from people;
+--------------------------------------------------------------------+
| count(distinct(substr(title,1,15))) / count(distinct(title)) * 100 |
+--------------------------------------------------------------------+
|                                                            97.0787 | 
+--------------------------------------------------------------------+
1 row in set (1.18 sec)

mysql> select count(distinct(substr(title,1,14))) / count(distinct(title)) * 100 from people;
+--------------------------------------------------------------------+
| count(distinct(substr(title,1,14))) / count(distinct(title)) * 100 |
+--------------------------------------------------------------------+
|                                                            94.4247 | 
+--------------------------------------------------------------------+
1 row in set (1.13 sec)
</pre>
<p><P><br />
Here&#8217;s the turning point for me: going from 15 to 14 characters drops the ratio from 97.08% to 94.42%, a loss of more than 2.6 percentage points for a single character. So I&#8217;m sticking with 15. Let&#8217;s test this with some queries.<br />
</P></p>
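<p>I picked 15 by eye. If you want to make that judgment repeatable, one hypothetical heuristic is to walk the measured lengths from longest to shortest and stop as soon as shortening the prefix costs more than some amount of selectivity per removed character. The 1.0 point threshold below is an arbitrary assumption, not a rule; with the numbers measured above it also lands on 15:</p>
<pre>
```python
def pick_prefix_length(ratios, max_loss_per_char=1.0):
    """ratios maps prefix length to measured selectivity (percent).
    Walk from the longest length to the shortest and return the last
    length reached before the selectivity lost per removed character
    exceeds max_loss_per_char."""
    lengths = sorted(ratios, reverse=True)
    chosen = lengths[0]
    for longer, shorter in zip(lengths, lengths[1:]):
        loss_per_char = (ratios[longer] - ratios[shorter]) / (longer - shorter)
        if loss_per_char > max_loss_per_char:
            break  # shrinking further costs too much selectivity
        chosen = shorter
    return chosen


# Ratios measured on people.title in this post.
measured = {95: 100.0, 20: 99.9507, 15: 97.0787, 14: 94.4247}
print(pick_prefix_length(measured))  # prints 15
```
</pre>
<p>A stricter threshold (0.5) would keep 20 characters, while a looser one (3.0) would accept 14.</p>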
<pre>
mysql> select title from people limit 40;
+----------------------------------------------------------------------------------------------------------------------------------------------------+
| title                                                                                                                                              |
+----------------------------------------------------------------------------------------------------------------------------------------------------+
| puffs war's bruises buckles attainably Warnock's discoverer degeneration plots admirably assimilates germane burlesquely ri                        | 
| arbitrariness MacDraw's carbonates suckers budget chronicler cur drabs untested Aryans imperial                                                    | 
| commender dozes distills blackbird's mend meta                                                                                                     | 
| gallons haying occupation's sculpt fittingness scores onwards recessed masculineness denominator's regulated boyfriend's                           | 
| authored metaphor derivatively matchmakers ratification railing advantageousness flossing twin's barbarously infinite retreat alloying tenting t   | 
| Africans determinateness enquired quivers replaces nowhere applicability negative alarms lacquerer shivered arachnid ulcer sil                     | 
| filter offerings unboundedness clearness enthusiast commandants blunted betide rusticated blacks helmet's squabbles tasked Beethoven contro        | 
| thirties oftener tunnel anguish attainable formulat                                                                                                | 
| grotesquely fallacious inessential fain sanctioned too amplifi                                                                                     | 
| consort rapes deeply marker patterns compacted plumbe                                                                                              | 
| nasally combings searcher's pathname's bolts retrospective aroused squintingly boyish singers recompiles Austral                                   | 
| purpled draggingly nobody's luckier spinning goddess oscilloscopes aimer                                                                           | 
| aphasia reconverts shams entangle placer metaphysical visited turret nai                                                                           | 
| violation bituminous unweighed darkness cackles consonant foully fisted loci relishes burn m                                                       | 
| unsuffixed overdose humbles corpses fashions slashingly quietude delighte                                                                          | 
| sheller hypocrisy falser productions shied cube breed childishness requested pads redoub                                                           | 
| broils aorta refund sinker cankering reawakens portrayed resolving bard's stand ejects inhabitant's tittering genders proposition                  | 
| cyclone's glorification unrestricted delicately inhibitive waterway wardrobes excommunicated laugher                                               | 
| poppies heroine's gunner swollen reticle vertebrate's shrank unreliabilities infractions pretentious angstroms relations highness feasibil         | 
| ampoule clustering intermediaries honer ree                                                                                                        | 
| creature's transferals tidal unsigned stitching ought coerces visa girdling porn janitors parer song's croaked ta                                  | 
| hammer amalgamating stunting feasibility hopefulness oilier spraying frets pinks                                                                   | 
| comelier tomorrow's cowboy chalked lewdness cordial supering rut's neurally blindingly mute drowsiest gives in                                     | 
| slides aqueduct glazers abolition dangers sultry raid prominence hedges walks toppled defenders autocrat                                           | 
| theoretic thumps scum's photos bootlegged enveloper sallying populations disruptions inaugurate conclu                                             | 
| annotated bibliographies lichen user's bluebird's subproofs unendurably recollection's crumple                                                     | 
| sergeant outlets pinion reducer wiling impinge apes insaneness dose automatics lighthouse's cursory sleepily web's interruptions superin           | 
| tautened skylarks toad's seminar's archangel's sarcasm shipwrecks indeed incliner tying waterf                                                     | 
| chaotic censuses intimacy custodian's extendedly womb's safeguarding desire                                                                        | 
| abusiveness skippered inspirer enunciation taper memory's clearly guardianship inputed m                                                           | 
| firing anaphora subsegments turbulence affectedness refractory unsprayed chapter's volumes undramati                                               | 
| chef reception's glens budged budge arson assistance disagreeableness fodder garnering boated skater heroine's pamphle                             | 
| prophetic spilling asper petter's constable's classic ices teethes mails office's sordidness cylindered chaffing bivouac skeptics shuttering quash | 
| hoppers iciest sharer dietitian dictionaries frac                                                                                                  | 
| racketeer Ellen amounts origin's abstractions render vanish pantries retrieve Maxtor unprojected antithes                                          | 
| hinter arrangers dialogue imputing droppers shelver boyish demonstrator braving submitting operated carbonate protruding creasing prospecto        | 
| electrify garment linked discernible transceiver's ungrounded telegrapher uncoated                                                                 | 
| badly unaffected vex taming affiliation strings detracts grandpas girded cafeteria approving ideology froze underlinings assass                    | 
| MHz articulate draws transpires rubbling swarthier reeler bagged snug assisted consonant's settl                                                   | 
| taxi's wearying espies Anglican's intangibly fluent jugs liveth pride ex                                                                           | 
+----------------------------------------------------------------------------------------------------------------------------------------------------+
40 rows in set (0.00 sec)

mysql> 
</pre>
<p><P><br />
That should give you an idea of what type of needles we&#8217;re looking for <img src='http://fernandoipar.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /><br />
</P></p>
<pre>
mysql> select * from people where title like 'taxi%';
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+------------+
| id     | title                                                                                                                                               | city                           | occupation |
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+------------+
|     40 | taxi's wearying espies Anglican's intangibly fluent jugs liveth pride ex                                                                            | rotation tri                   |            | 
|   5736 | taxi's allegorically accounting manipulatory cautiousness computational promoter wool reproo                                                        | islan                          |            | 
|   6967 | taxis sprawls unblushing rude put absorbs reproducibilities crumblier kid                                                                           | DeMorgan overhe                |            | 
|   8388 | taxi catsup ornament transformer widener syndicates dismount pop t                                                                                  | exhibition's manages hedgehog' |            | 
|  10418 | taxingly eligibility whichever meditation corrosion unluckiness intoxicat                                                                           | gagged politeness looser       |            | 
|  13091 | taxi endowment watchfulness battalions stay trickle tangle blowfish maid's transmissions questionnaire vomit saner strokers constituent crab's      | populations window's zoo armie |            | 
|  16723 | taxi storing couldest bouts allegoric cluttered steeples fives hitchhike thrashes retirement de                                                     | envisaged maintain             |            | 
|  22168 | taxicab voicer controllers removing cellular houses router nourishing edict shrines strikes testicle's destine whale russeted certi                 | masturbates pu                 |            | 
|  24244 | taxing sleeve consultant's nonprogrammable twine delayer ingot respecter subex                                                                      | prematurely significant        |            | 
|  35595 | taxicabs aback spinal checkers germs overdraft's coon critter's patrician fled coalition massaging paced condemning impen                           | oppre                          |            | 
|  37105 | taxis articulatory indulgence bystanders skin burgess starlight calendaring aunt's bilging benightedness smallest softened xiv immerser fresher unn | crudely papally r              |            | 
|  40871 | taxicabs muzzling precocious resentment fellers pitiers beasts marines baselines diagrammatically clowning connecters stampedin                     | influencer                     |            | 
|  44298 | taxi rages unintelligibleness anastomosis orthogonally incompatibilities keypads hoarse province stamping perceived sh                              | unforgiving quiet              |            | 
|  46395 | taxi plunders novelty's downstairs newborn symbiotic climax highlights lounger keypads only schools possibilities                                   | flowing forgeries slende       |            | 
|  66078 | taxicab mercilessly excesses ships merchandising patch strobe                                                                                       | armfuls firmament hum coop     |            | 
|  71095 | taxing dispense regrettably resuspended kilobits downwardly domestically laps rainiest recapitulates despiser trophies chums a                      | enumerate indoctrinat          |            | 
|  77668 | taxicabs approachable disqualifying charcoaled script's o                                                                                           | kited publish disburse anarch  |            | 
|  84162 | taxicab's captivity dean eyeball uninspiring pawn's complication outcast's stared sneak s                                                           | impracticable dungeon crop     |            | 
|  87930 | taxi swiftly repacks unsupported slice mornings squares gland solar brainier harrying wag                                                           | cowslip halter plastics        |            | 
|  91282 | taxiing undetected cast commands clasping germina                                                                                                   | waxes her                      |            | 
| 104029 | taxied roofs besetting leadership electrocuted input metaphor bubbler vowing sponges assess                                                         | worthing understated bark      |            | 
| 105818 | taxis libretti defensively shoes antagonistically heavier endeared accidental gauging intercourse revolte                                           | runne                          |            | 
| 106163 | taxis bonfire's bench stereo preventer boringness blot's quieter acronyms transplant gained implores ba                                             | sighting leased sp             |            | 
| 122471 | taxicab's they've berries invader touching bumblingly courtier's boosting undisguised destroy amanuensis bangles digestiveness poppy's hulls        | purity professional unski      |            | 
| 123931 | taxicab's headgear Popek ratifying tenured Pascal's subduedly quitting earned planter forgave implicated noo                                        | bibliographies fraill          |            | 
| 127383 | taxi choir parameter's busted inspiration's fixated blinking complicator outwit plotters gobbles burningly leafed corruptively                      | radioed size telegr            |            | 
| 134211 | taxi's reconstructible indirect agglutination awaken eked unoccupied pillager subcomputation interviewing treader commending i                      | muddiness broom's              |            | 
| 145784 | taxicabs sanctuary armful battening terrifying impactors guns exchequer reigns laughter desolater s                                                 | buttonhole's isomorphism       |            | 
| 146371 | taxied carnivals giver misconceptions countenancer introduced anchovy exile pipelines weaned unabridged Britishly abyss's extenuating moodiness th  | penetrator upsho               |            | 
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+------------+
29 rows in set (0.11 sec)

mysql> explain select * from people where title like 'taxi%';
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | people | ALL  | NULL          | NULL | NULL    | NULL | 150000 | Using where | 
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+

mysql> select * from people where title = 'hoppers iciest sharer dietitian dictionaries frac';
+----+---------------------------------------------------+--------------------+------------+
| id | title                                             | city               | occupation |
+----+---------------------------------------------------+--------------------+------------+
| 34 | hoppers iciest sharer dietitian dictionaries frac | coroneted revolve  |            | 
+----+---------------------------------------------------+--------------------+------------+
1 row in set (0.12 sec)

mysql> explain select * from people where title = 'hoppers iciest sharer dietitian dictionaries frac';
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | people | ALL  | NULL          | NULL | NULL    | NULL | 150000 | Using where | 
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
1 row in set (0.00 sec)
</pre>
<p>
OK, now let&#8217;s create the index. I&#8217;m also preloading it into the key cache, which isn&#8217;t necessary (even less so with only 150,000 rows) but can help. Note that LOAD INDEX INTO CACHE only works with MyISAM tables, and all the indexes in the table must have the same block size for it to work.
</p>
<pre>
mysql> create index idx_people_title on people(title(15));
Query OK, 150000 rows affected (1.60 sec)
Records: 150000  Duplicates: 0  Warnings: 0

mysql> reset query cache;
Query OK, 0 rows affected (0.00 sec)

mysql> load index into cache people;
+-------------+--------------+----------+----------+
| Table       | Op           | Msg_type | Msg_text |
+-------------+--------------+----------+----------+
| test.people | preload_keys | status   | OK       | 
+-------------+--------------+----------+----------+
1 row in set (0.00 sec)

</pre>
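<p>At this point idx_people_title holds only the first 15 characters of each title plus a pointer to the row. The toy Python model below (the class and its methods are hypothetical, and it ignores B-tree pages, collations and cost estimates) shows why that is still enough for both test queries: the index narrows the search to a small range of keys, and each candidate row is then checked against the full value, which is roughly what the &#8220;Using where&#8221; in the EXPLAIN output reflects.</p>
<pre>
```python
from bisect import bisect_left


class PrefixIndex:
    """Toy model of an index on col(prefix_len): sorted pairs of
    (first prefix_len characters, row id). Full values live only in
    the "table" (the rows dict), not in the index."""

    def __init__(self, rows, prefix_len):
        self.rows = rows  # row id -> full column value
        self.prefix_len = prefix_len
        self.entries = sorted((v[:prefix_len], rid) for rid, v in rows.items())
        self.keys = [key for key, _ in self.entries]

    def _scan(self, start, matches):
        """Jump to the first key not below start, then yield row ids
        while matches(key) holds (a range scan over the sorted keys)."""
        i = bisect_left(self.keys, start)
        while i != len(self.keys) and matches(self.keys[i]):
            yield self.entries[i][1]
            i += 1

    def find_equal(self, value):
        """WHERE col = value: look up the truncated key, then recheck
        each candidate row against the full value."""
        key = value[: self.prefix_len]
        return [rid for rid in self._scan(key, lambda k: k == key)
                if self.rows[rid] == value]

    def find_like_prefix(self, prefix):
        """WHERE col LIKE 'prefix%': scan keys starting with the
        (truncated) prefix, then recheck the full value."""
        key = prefix[: self.prefix_len]
        return [rid for rid in self._scan(key, lambda k: k.startswith(key))
                if self.rows[rid].startswith(prefix)]


rows = {
    40: "taxi's wearying espies",
    5736: "taxi's allegorically accounting",
    8388: "taxi catsup ornament",
    34: "hoppers iciest sharer dietitian dictionaries frac",
}
idx = PrefixIndex(rows, 15)
print(idx.find_like_prefix("taxi"))
print(idx.find_equal("hoppers iciest sharer dietitian dictionaries frac"))
```
</pre>
<p>Because matches are read in key order, the model returns rows sorted by their prefix, and two titles sharing the same first 15 characters both become candidates until the recheck tells them apart.</p>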
<p>
Let&#8217;s re-test the queries:
</p>
<pre>

mysql> select * from people where title like 'taxi%';
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+------------+
| id     | title                                                                                                                                               | city                           | occupation |
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+------------+
|   8388 | taxi catsup ornament transformer widener syndicates dismount pop t                                                                                  | exhibition's manages hedgehog' |            | 
| 127383 | taxi choir parameter's busted inspiration's fixated blinking complicator outwit plotters gobbles burningly leafed corruptively                      | radioed size telegr            |            | 
|  13091 | taxi endowment watchfulness battalions stay trickle tangle blowfish maid's transmissions questionnaire vomit saner strokers constituent crab's      | populations window's zoo armie |            | 
|  46395 | taxi plunders novelty's downstairs newborn symbiotic climax highlights lounger keypads only schools possibilities                                   | flowing forgeries slende       |            | 
|  44298 | taxi rages unintelligibleness anastomosis orthogonally incompatibilities keypads hoarse province stamping perceived sh                              | unforgiving quiet              |            | 
|  16723 | taxi storing couldest bouts allegoric cluttered steeples fives hitchhike thrashes retirement de                                                     | envisaged maintain             |            | 
|  87930 | taxi swiftly repacks unsupported slice mornings squares gland solar brainier harrying wag                                                           | cowslip halter plastics        |            | 
|   5736 | taxi's allegorically accounting manipulatory cautiousness computational promoter wool reproo                                                        | islan                          |            | 
| 134211 | taxi's reconstructible indirect agglutination awaken eked unoccupied pillager subcomputation interviewing treader commending i                      | muddiness broom's              |            | 
|     40 | taxi's wearying espies Anglican's intangibly fluent jugs liveth pride ex                                                                            | rotation tri                   |            | 
|  66078 | taxicab mercilessly excesses ships merchandising patch strobe                                                                                       | armfuls firmament hum coop     |            | 
|  22168 | taxicab voicer controllers removing cellular houses router nourishing edict shrines strikes testicle's destine whale russeted certi                 | masturbates pu                 |            | 
|  84162 | taxicab's captivity dean eyeball uninspiring pawn's complication outcast's stared sneak s                                                           | impracticable dungeon crop     |            | 
| 123931 | taxicab's headgear Popek ratifying tenured Pascal's subduedly quitting earned planter forgave implicated noo                                        | bibliographies fraill          |            | 
| 122471 | taxicab's they've berries invader touching bumblingly courtier's boosting undisguised destroy amanuensis bangles digestiveness poppy's hulls        | purity professional unski      |            | 
|  35595 | taxicabs aback spinal checkers germs overdraft's coon critter's patrician fled coalition massaging paced condemning impen                           | oppre                          |            | 
|  77668 | taxicabs approachable disqualifying charcoaled script's o                                                                                           | kited publish disburse anarch  |            | 
|  40871 | taxicabs muzzling precocious resentment fellers pitiers beasts marines baselines diagrammatically clowning connecters stampedin                     | influencer                     |            | 
| 145784 | taxicabs sanctuary armful battening terrifying impactors guns exchequer reigns laughter desolater s                                                 | buttonhole's isomorphism       |            | 
| 146371 | taxied carnivals giver misconceptions countenancer introduced anchovy exile pipelines weaned unabridged Britishly abyss's extenuating moodiness th  | penetrator upsho               |            | 
| 104029 | taxied roofs besetting leadership electrocuted input metaphor bubbler vowing sponges assess                                                         | worthing understated bark      |            | 
|  91282 | taxiing undetected cast commands clasping germina                                                                                                   | waxes her                      |            | 
|  71095 | taxing dispense regrettably resuspended kilobits downwardly domestically laps rainiest recapitulates despiser trophies chums a                      | enumerate indoctrinat          |            | 
|  24244 | taxing sleeve consultant's nonprogrammable twine delayer ingot respecter subex                                                                      | prematurely significant        |            | 
|  10418 | taxingly eligibility whichever meditation corrosion unluckiness intoxicat                                                                           | gagged politeness looser       |            | 
|  37105 | taxis articulatory indulgence bystanders skin burgess starlight calendaring aunt's bilging benightedness smallest softened xiv immerser fresher unn | crudely papally r              |            | 
| 106163 | taxis bonfire's bench stereo preventer boringness blot's quieter acronyms transplant gained implores ba                                             | sighting leased sp             |            | 
| 105818 | taxis libretti defensively shoes antagonistically heavier endeared accidental gauging intercourse revolte                                           | runne                          |            | 
|   6967 | taxis sprawls unblushing rude put absorbs reproducibilities crumblier kid                                                                           | DeMorgan overhe                |            | 
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+------------+
29 rows in set (0.00 sec)

mysql> explain select * from people where title like 'taxi%';
+----+-------------+--------+-------+------------------+------------------+---------+------+------+-------------+
| id | select_type | table  | type  | possible_keys    | key              | key_len | ref  | rows | Extra       |
+----+-------------+--------+-------+------------------+------------------+---------+------+------+-------------+
|  1 | SIMPLE      | people | range | idx_people_title | idx_people_title | 17      | NULL |   56 | Using where | 
+----+-------------+--------+-------+------------------+------------------+---------+------+------+-------------+
1 row in set (0.00 sec)
</pre>
<p>
Notice that the optimizer now estimates only 56 rows to examine.
</p>
<pre>

mysql> select * from people where title = 'hoppers iciest sharer dietitian dictionaries frac';
+----+---------------------------------------------------+--------------------+------------+
| id | title                                             | city               | occupation |
+----+---------------------------------------------------+--------------------+------------+
| 34 | hoppers iciest sharer dietitian dictionaries frac | coroneted revolve  |            | 
+----+---------------------------------------------------+--------------------+------------+
1 row in set (0.00 sec)

mysql> explain select * from people where title = 'hoppers iciest sharer dietitian dictionaries frac';
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
| id | select_type | table  | type | possible_keys    | key              | key_len | ref   | rows | Extra       |
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
|  1 | SIMPLE      | people | ref  | idx_people_title | idx_people_title | 17      | const |    1 | Using where | 
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
1 row in set (0.01 sec)
</pre>
<p>
Just 1 row.<br />
OK, now let&#8217;s test how well the index picks out unique rows.
</p>
<pre>

mysql> explain select * from people where title = 'arbitrariness MacDraw\'s carbonates suckers budget chronicler cur drabs untested Aryans imperial';
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
| id | select_type | table  | type | possible_keys    | key              | key_len | ref   | rows | Extra       |
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
|  1 | SIMPLE      | people | ref  | idx_people_title | idx_people_title | 17      | const |    1 | Using where | 
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
1 row in set (0.00 sec)

mysql> explain select * from people where title = 'dunes delightfulness manurers jousts axer aristocrat\'s driver greediness bloke pays preconditions enclosure consideration plaster';
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
| id | select_type | table  | type | possible_keys    | key              | key_len | ref   | rows | Extra       |
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
|  1 | SIMPLE      | people | ref  | idx_people_title | idx_people_title | 17      | const |    1 | Using where | 
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
1 row in set (0.00 sec)

mysql> explain select * from people where title = 'satire\'s most quacked campaigning wrists disengaging insignia woodlander knuckles despaired portending incredulous predication Sally\'s amica';
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
| id | select_type | table  | type | possible_keys    | key              | key_len | ref   | rows | Extra       |
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
|  1 | SIMPLE      | people | ref  | idx_people_title | idx_people_title | 17      | const |    2 | Using where | 
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
1 row in set (0.00 sec)

mysql> select *,length(title) from people where title = 'satire\'s most quacked campaigning wrists disengaging insignia woodlander knuckles despaired portending incredulous predication Sally\'s amica';
+-----+----------------------------------------------------------------------------------------------------------------------------------------------+------------------+------------+---------------+
| id  | title                                                                                                                                        | city             | occupation | length(title) |
+-----+----------------------------------------------------------------------------------------------------------------------------------------------+------------------+------------+---------------+
| 344 | satire's most quacked campaigning wrists disengaging insignia woodlander knuckles despaired portending incredulous predication Sally's amica | buffaloes refill |            |           140 | 
+-----+----------------------------------------------------------------------------------------------------------------------------------------------+------------------+------------+---------------+
1 row in set (0.00 sec)

</pre>
<p>
Granted, 150,000 rows isn&#8217;t much, but even so, with an average row data length of 94, I had to look for a 140-character title to find a case where MySQL examines 2 rows before finding the right one. That&#8217;s reasonable, considering I estimated 97% index coverage.
</p>
<p>
In conclusion, while my dataset certainly isn&#8217;t large enough to do many interesting things, it should prove that a well-chosen index size can go a long way toward improving the performance of your MySQL-based system. The query I presented here can be useful for finding a decent index size in terms of unique-row coverage.</p>
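<p>To make the idea of unique-row coverage concrete, here is a minimal Python sketch. It is not the query used in this post: it assumes coverage means the fraction of distinct titles that stay distinct when cut to their first <em>n</em> characters, roughly what <em>SELECT COUNT(DISTINCT LEFT(title, n)) / COUNT(DISTINCT title) FROM people</em> measures. The helpers <em>prefix_coverage</em> and <em>shortest_prefix</em> are hypothetical.</p>

```python
def prefix_coverage(values, length):
    """Fraction of distinct values that stay distinct when cut to `length` characters."""
    distinct = set(values)
    if not distinct:
        return 1.0
    prefixes = {value[:length] for value in distinct}
    return len(prefixes) / len(distinct)


def shortest_prefix(values, target=0.97):
    """Smallest prefix length whose coverage reaches `target` (hypothetical helper)."""
    longest = max(len(value) for value in values)
    for length in range(1, longest + 1):
        if prefix_coverage(values, length) >= target:
            return length
    return longest


# A few shortened titles from the sample table above
titles = ["taxied roofs", "taxiing undetected", "taxing dispense", "taxing sleeve"]
print(prefix_coverage(titles, 5))    # 0.75: "taxing dispense" and "taxing sleeve" share "taxin"
print(shortest_prefix(titles, 1.0))  # 8
```

<p>Trading a little coverage for a much shorter prefix is the same judgement call as picking the index size in MySQL: the shorter the prefix, the smaller the index, but the more rows MySQL may have to check per lookup.</p>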
<div class='yarpp-related-rss'>
<p>Related posts:<ol>
<li><a href='http://fernandoipar.com/2009/04/06/using-mysql-proxy-to-benchmark-query-performance/' rel='bookmark' title='Using MySQL Proxy to benchmark query performance'>Using MySQL Proxy to benchmark query performance</a> <small>By transparently sitting between client and server on each request,...</small></li>
<li><a href='http://fernandoipar.com/2009/04/18/making-use-of-procedure-analyse/' rel='bookmark' title='Making use of procedure analyse()'>Making use of procedure analyse()</a> <small>SELECT Field0[,Field1,Field2,...] FROM TABLE PROCEDURE ANALYSE() is a nice tool...</small></li>
<li><a href='http://fernandoipar.com/2009/03/09/using-the-enum-data-type-to-increase-performance/' rel='bookmark' title='Using the ENUM data type to increase performance'>Using the ENUM data type to increase performance</a> <small>While going through the DATA TYPES section of the Certification...</small></li>
</ol></p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://fernandoipar.com/2009/08/12/indexing-text-columns-in-mysql/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>iptables trick to limit concurrent tcp connections</title>
		<link>http://fernandoipar.com/2009/06/17/iptables-trick-to-limit-concurrent-tcp-connections/</link>
		<comments>http://fernandoipar.com/2009/06/17/iptables-trick-to-limit-concurrent-tcp-connections/#comments</comments>
		<pubDate>Wed, 17 Jun 2009 11:58:10 +0000</pubDate>
		<dc:creator>fernando</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[bugs]]></category>
		<category><![CDATA[concurrency]]></category>
		<category><![CDATA[security]]></category>

		<guid isPermaLink="false">http://fernandoipar.com/?p=173</guid>
		<description><![CDATA[This is sort of a self-documenting post, and a self-support group about ill-behaved Tomcat apps. Sometimes, you have multiple nodes accessing your MySQL server (or any kind of server, for that matter) concurrently. Eventually, software in one or more of these nodes might do nasty things (you know who you are buddy:)) MySQL provides a [...]<div class='yarpp-related-rss'>

Related posts:<ol>
<li><a href='http://fernandoipar.com/2009/04/06/using-mysql-proxy-to-benchmark-query-performance/' rel='bookmark' title='Using MySQL Proxy to benchmark query performance'>Using MySQL Proxy to benchmark query performance</a> <small>By transparently sitting between client and server on each request,...</small></li>
</ol>

</div>
]]></description>
				<content:encoded><![CDATA[<p>This is sort of a self-documenting post, and a self-support group about ill-behaved Tomcat apps.</p>
<p>Sometimes, you have multiple nodes accessing your MySQL server (or any kind of server, for that matter) concurrently. Eventually, software on one or more of these nodes might do nasty things (you know who you are, buddy :))</p>
<p>MySQL provides a built-in mechanism to limit concurrent connections, but it can only be set for the whole server (max_connections, max_user_connections) or per account (the MAX_USER_CONNECTIONS resource limit). Unfortunately, most of these setups use the same database user for all their nodes, so this feature can&#8217;t be used to confine the damage a single misbehaving node can do.</p>
<p>Enter your good friend iptables.</p>
<p>This isn&#8217;t perfect, but this little trick might help you while programmers take care of their business:</p>
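<pre># Rule 1: reject MySQL traffic from any source added to the 'recent' list within the last 60 seconds
iptables -A INPUT -p tcp --dport 3306 -m recent --rcheck --seconds 60 -j REJECT
# Rule 2: reject, and add to the 'recent' list, any source with more than 2 concurrent connections to port 3306
iptables -A INPUT -p tcp --dport 3306 -m connlimit --connlimit-above 2 -m recent --set -j REJECT</pre>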
<p>(The number of seconds and the concurrency limit here are examples for testing only; set them to proper values before using these rules on your servers!)</p>
<p>These two rules use the iptables <em>recent</em> module to keep a &#8216;bad guy&#8217; list: any source that exceeds two concurrent connections on TCP port 3306 is added to the list, and for the next 60 seconds the first rule rejects its traffic.</p>
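<p>The interplay of the two rules can be hard to picture, so here is a toy Python model of it. It only approximates what the kernel does: <em>RecentBanList</em> and its parameters are hypothetical, <em>open_connections</em> stands for the count that <em>connlimit</em> compares against its limit, and the exact boundary behavior of <em>--seconds</em> is not modeled precisely.</p>

```python
class RecentBanList:
    """Toy model of the two iptables rules (connlimit + recent); hypothetical class."""

    def __init__(self, limit=2, ban_seconds=60):
        self.limit = limit              # --connlimit-above 2
        self.ban_seconds = ban_seconds  # --rcheck --seconds 60
        self.last_set = {}              # source -> time it was added with --set

    def allow(self, source, open_connections, now):
        """Return False if a packet from `source` at time `now` would be rejected."""
        # Rule 1: reject sources added to the list within the last `ban_seconds`.
        added = self.last_set.get(source)
        if added is not None and self.ban_seconds >= now - added:
            return False
        # Rule 2: over the limit -> add the source to the list (--set) and reject.
        if open_connections > self.limit:
            self.last_set[source] = now
            return False
        return True


bans = RecentBanList()
print(bans.allow("10.0.0.5", 1, now=0))    # True: under the limit
print(bans.allow("10.0.0.5", 3, now=10))   # False: over the limit, listed at t=10
print(bans.allow("10.0.0.5", 1, now=30))   # False: still inside the 60 s window
print(bans.allow("10.0.0.5", 1, now=80))   # True: the window has expired
```

<p>Because the first rule is checked before the second, a listed source never reaches the <em>--set</em> rule while it is banned, so the ban runs for about 60 seconds from the offending connection attempt.</p>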
<p>Used sensibly, together with a proper timeout for idle MySQL connections (such as <em>wait_timeout</em>), this could help in situations like the one I described.</p>
<p>Hope it helps you!</p>
<div class='yarpp-related-rss'>
<p>Related posts:<ol>
<li><a href='http://fernandoipar.com/2009/04/06/using-mysql-proxy-to-benchmark-query-performance/' rel='bookmark' title='Using MySQL Proxy to benchmark query performance'>Using MySQL Proxy to benchmark query performance</a> <small>By transparently sitting between client and server on each request,...</small></li>
</ol></p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://fernandoipar.com/2009/06/17/iptables-trick-to-limit-concurrent-tcp-connections/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>SOUNDEX(), triggers, and stored procedures</title>
		<link>http://fernandoipar.com/2009/04/29/soundex-triggers-and-stored-procedures/</link>
		<comments>http://fernandoipar.com/2009/04/29/soundex-triggers-and-stored-procedures/#comments</comments>
		<pubDate>Wed, 29 Apr 2009 23:14:51 +0000</pubDate>
		<dc:creator>fernando</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://fernandoipar.com/?p=166</guid>
		<description><![CDATA[MySQL provides a SOUNDEX() function, which returns the soundex of a given string. For details, refer to the manual, but to put it simply, it allows you to compare strings based on how they sound, hence letting you do proximity searches on your database. If you&#8217;re just querying for a word, its usage is pretty [...]<div class='yarpp-related-rss'>

Related posts:<ol>
<li><a href='http://fernandoipar.com/2009/04/21/extending-procedure_analyse/' rel='bookmark' title='Extending procedure_analyse'>Extending procedure_analyse</a> <small>My previous post explored a stored procedure that extended procedure_analyse...</small></li>
<li><a href='http://fernandoipar.com/2009/04/18/making-use-of-procedure-analyse/' rel='bookmark' title='Making use of procedure analyse()'>Making use of procedure analyse()</a> <small>SELECT Field0[,Field1,Field2,...] FROM TABLE PROCEDURE ANALYSE() is a nice tool...</small></li>
</ol>

</div>
]]></description>
				<content:encoded><![CDATA[<p>MySQL provides a SOUNDEX() function, which returns the soundex of a given string. For details, refer to the <a href="http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_soundex">manual</a>, but to put it simply, it lets you compare strings based on how they sound, so you can do phonetic (sounds-alike) searches on your database.</p>
<p>If you&#8217;re just querying for a single word, its usage is pretty straightforward; in fact, you can use the SOUNDS LIKE operator to build expressions such as this:</p>
<blockquote><p>SELECT <em>expr</em> FROM <em>table_expr</em> WHERE <em>field0</em> SOUNDS LIKE <em>'inputWord'</em></p></blockquote>
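<p>To see what SOUNDEX() computes, here is a sketch of the classic four-character American Soundex in Python. It is an illustration, not MySQL&#8217;s implementation: this sketch follows the common (enhanced) variant, while the MySQL manual says SOUNDEX() implements the original algorithm and can return strings longer than four characters, so results may differ for some words.</p>

```python
# Letter -> digit table of the classic American Soundex
CODES = {}
for digit, letters in (("1", "BFPV"), ("2", "CGJKQSXZ"), ("3", "DT"),
                       ("4", "L"), ("5", "MN"), ("6", "R")):
    for letter in letters:
        CODES[letter] = digit


def soundex(word):
    """Four-character American Soundex (enhanced variant, not MySQL's exact output)."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]                  # keep the first letter as-is
    last = CODES.get(letters[0], "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code and code != last:        # skip uncoded letters and repeated digits
            result += code
        if c not in "HW":                # H and W do not separate equal digits
            last = code
    return (result + "000")[:4]          # pad with zeros, keep four characters


print(soundex("sample"), soundex("semple"))  # S514 S514
print(soundex("text"), soundex("taxt"))      # T230 T230
```

<p>Misspellings such as &#8220;semple&#8221; or &#8220;taxt&#8221; end up with the same code as the intended word, which is what makes sounds-alike searching work.</p>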
<p>However, if you&#8217;re storing multiple-word strings, things get a little more complicated: SOUNDEX() computes a single code for the whole phrase rather than one per word. If you later want to search for a single word within that phrase, there&#8217;s no way to do it.</p>
<p>Well, at least not directly, but by using an auxiliary table to store soundex strings, a couple of stored procedures, and a trigger, it can be done with little effort for the programmers who use the database.</p>
<p>Let&#8217;s assume we have a very simple table called <em>soundex_text</em> with the following structure:</p>
<pre>create table if not exists soundex_text (
	id int unsigned not null auto_increment primary key,
	description text
) Engine = Innodb;</pre>
<p>The column we want to search by sound is <em>description</em>, so we create the following auxiliary table to store the soundex of each word in it:</p>
<pre>create table if not exists soundex_text_index (
	soundex_text_id int unsigned not null references soundex_text(id),
	soundex char(4)
) Engine = Innodb;</pre>
<p>We now create a stored procedure and a trigger to populate the auxiliary table automatically every time a row is inserted in the main table.</p>
<pre>-- adapted from example at http://forums.mysql.com/read.php?60,78776,242420#msg-242420
-- posted by jim smith: http://forums.mysql.com/profile.php?60,3154903
delimiter //
drop procedure if exists update_soundex_text_index//
CREATE PROCEDURE update_soundex_text_index (sStringIn text,splitChar varchar(1), soundex_text_id int)
BEGIN
DECLARE comma INT DEFAULT 0;
DECLARE mylist TEXT DEFAULT sStringIn;
DECLARE temp TEXT DEFAULT '';
DECLARE strlen int DEFAULT LENGTH(sStringIn);
DECLARE insert_id int DEFAULT soundex_text_id;

/* find the first instance of the splitting character */
SET comma = LOCATE(splitChar,mylist);
/* store the soundex of each word in the auxiliary table */
WHILE strlen &gt; 0 DO
	IF comma = 0 THEN
		SET temp = TRIM(mylist);
		SET mylist = '';
		SET strlen = 0;
	END IF;
	IF comma != 0 THEN
		SET temp = TRIM(SUBSTRING(mylist,1,comma-1));
		SET mylist = TRIM(SUBSTRING(mylist FROM comma+1));
		SET strlen = LENGTH(mylist);
	END IF;
	-- Sample handling of special chars you might want removed from individual words
	-- before storing their soundex (applied to every word, including the last one).
	SELECT REPLACE(temp,',','') INTO temp;
	SELECT REPLACE(temp,';','') INTO temp;
	SELECT REPLACE(temp,':','') INTO temp;
	IF temp != '' THEN
		insert into soundex_text_index (soundex_text_id,soundex) values (insert_id,substring(soundex(temp) from 1 for 4));
	END IF;
	SET comma = LOCATE(splitChar,mylist);
END WHILE;

END//

drop trigger if exists soundex_text_bi//
drop trigger if exists soundex_text_ai//
-- AFTER INSERT, because NEW.id only holds the generated auto_increment value once the
-- row has been inserted; in a BEFORE INSERT trigger it is still 0, and last_insert_id()
-- does not refer to the row being inserted.
create trigger soundex_text_ai
after insert
on soundex_text
for each row
begin
	call update_soundex_text_index (NEW.description, ' ', NEW.id);
end;//
delimiter ;</pre>
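<p>The word-splitting loop in <em>update_soundex_text_index</em> (reused later for querying) is easier to follow next to a Python equivalent. This is a minimal sketch with a hypothetical <em>split_words</em> helper: it splits on the given character, trims spaces from each piece, removes the same sample characters (comma, semicolon and colon) from every word, and skips empty pieces, which is what the <em>IF temp != ''</em> check is for.</p>

```python
SPECIAL_CHARS = ",;:"  # the sample characters the procedure strips from each word


def split_words(text, split_char=" "):
    """Python equivalent of the LOCATE/SUBSTRING/TRIM splitting loop (a sketch)."""
    words = []
    for piece in text.split(split_char):
        word = piece.strip(" ")           # TRIM() removes leading and trailing spaces
        for ch in SPECIAL_CHARS:          # REPLACE(temp, ',', '') and friends
            word = word.replace(ch, "")
        if word != "":                    # IF temp != '' THEN ... insert ...
            words.append(word)
    return words


print(split_words("This is, a sample: text row;"))
# ['This', 'is', 'a', 'sample', 'text', 'row']
```

<p>Each word returned here corresponds to one row inserted into <em>soundex_text_index</em>.</p>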
<p>Finally, a stored procedure that queries the main table through the auxiliary table, so callers never have to deal with the soundex values directly:</p>
<pre>delimiter //
drop procedure if exists query_soundex_text//
create procedure query_soundex_text(sStringIn text, splitChar varchar(1))
BEGIN
DECLARE comma INT DEFAULT 0;
DECLARE mylist TEXT DEFAULT sStringIn;
DECLARE temp TEXT DEFAULT '';
DECLARE strlen int DEFAULT LENGTH(sStringIn); 

-- remove any copy left behind by a previous call that failed halfway
drop temporary table if exists results;
create temporary table results (id int unsigned, description text);

/* find the first instance of the splitting character */
SET comma = LOCATE(splitChar,mylist);
/* collect the rows matching each word's soundex into the temp table */
WHILE strlen &gt; 0 DO
	IF comma = 0 THEN
		SET temp = TRIM(mylist);
		SET mylist = '';
		SET strlen = 0;
	END IF;
	IF comma != 0 THEN
		SET temp = TRIM(SUBSTRING(mylist,1,comma-1));
		SET mylist = TRIM(SUBSTRING(mylist FROM comma+1));
		SET strlen = LENGTH(mylist);
	END IF;
	-- Sample handling of special chars you might want removed from individual words
	-- before computing their soundex (applied to every word, including the last one).
	SELECT REPLACE(temp,',','') INTO temp;
	SELECT REPLACE(temp,';','') INTO temp;
	SELECT REPLACE(temp,':','') INTO temp;
	IF temp != '' THEN
		-- join on the id, so only rows that contain a matching word are returned
		insert into results select st.id, st.description from soundex_text st inner join soundex_text_index sti on sti.soundex_text_id = st.id where sti.soundex = substring(soundex(temp) from 1 for 4);
	END IF;
	SET comma = LOCATE(splitChar,mylist);
END WHILE;

select distinct * from results;
drop temporary table results;

END//
delimiter ;</pre>
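<p>Conceptually, the auxiliary table is an inverted index from soundex codes to row ids, and <em>query_soundex_text</em> is meant to return every row that shares at least one code with the search string. The following Python sketch models that idea with hypothetical helpers. The key function is a parameter so any phonetic encoding can be plugged in; the three-letter <em>toy_key</em> in the usage is only a stand-in for a real soundex, and words are split on whitespace for brevity.</p>

```python
from collections import defaultdict


def build_index(texts, key):
    """Map key(word) -> set of ids, like rows of soundex_text_index (hypothetical helper)."""
    index = defaultdict(set)
    for text_id, description in texts.items():
        for word in description.split():
            index[key(word)].add(text_id)
    return index


def search(texts, index, query, key):
    """Rows sharing at least one key with any query word; the set plays 'select distinct'."""
    ids = set()
    for word in query.split():
        ids |= index.get(key(word), set())
    return [(text_id, texts[text_id]) for text_id in sorted(ids)]


def toy_key(word):
    """Stand-in for a phonetic key: the first three letters, lower-cased."""
    return word.lower()[:3]


texts = {3: "This is a sample text row", 4: "Another rock song"}
index = build_index(texts, toy_key)
print(search(texts, index, "samples", toy_key))  # [(3, 'This is a sample text row')]
print(search(texts, index, "xyz", toy_key))      # []
```

<p>Looking ids up per key is the in-memory counterpart of joining <em>soundex_text_index</em> back to <em>soundex_text</em> on the id column.</p>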
<p>Using this is pretty straightforward.<br />
Just insert some sample data into the table and give it a shot:</p>
<pre>mysql&gt; insert into soundex_text(description) values ('This is a sample text row');
Query OK, 1 row affected (0.00 sec)

mysql&gt; call query_soundex_text('semple',' ');
+------+---------------------------+
| id   | description               |
+------+---------------------------+
|    3 | This is a sample text row |
+------+---------------------------+
1 row in set (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

mysql&gt; call query_soundex_text('THis is a samPLE taxt row',' ');
+------+---------------------------+
| id   | description               |
+------+---------------------------+
|    3 | This is a sample text row |
+------+---------------------------+
1 row in set (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

mysql&gt; call query_soundex_text('rock',' ');
Empty set (0.01 sec)

Query OK, 0 rows affected (0.01 sec)</pre>
<p>To make it even easier, you could modify the querying stored procedure to always split on a space, so callers don&#8217;t need to pass the splitting character.</p>
<p>You can download a zip file with an SQL script containing all this code, ready to load into a MySQL database, <a title="Soundex search scripts" href="http://fernandoipar.com/soundex.sql.zip">here</a>.</p>
<p>Enjoy!</p>
<div class='yarpp-related-rss'>
<p>Related posts:<ol>
<li><a href='http://fernandoipar.com/2009/04/21/extending-procedure_analyse/' rel='bookmark' title='Extending procedure_analyse'>Extending procedure_analyse</a> <small>My previous post explored a stored procedure that extended procedure_analyse...</small></li>
<li><a href='http://fernandoipar.com/2009/04/18/making-use-of-procedure-analyse/' rel='bookmark' title='Making use of procedure analyse()'>Making use of procedure analyse()</a> <small>SELECT Field0[,Field1,Field2,...] FROM TABLE PROCEDURE ANALYSE() is a nice tool...</small></li>
</ol></p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://fernandoipar.com/2009/04/29/soundex-triggers-and-stored-procedures/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Extending procedure_analyse</title>
		<link>http://fernandoipar.com/2009/04/21/extending-procedure_analyse/</link>
		<comments>http://fernandoipar.com/2009/04/21/extending-procedure_analyse/#comments</comments>
		<pubDate>Tue, 21 Apr 2009 16:06:26 +0000</pubDate>
		<dc:creator>fernando</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://fernandoipar.com/?p=163</guid>
		<description><![CDATA[My previous post explored a stored procedure that extended procedure_analyse with the intent of helping DBAs optimize table structure. Here&#8217;s an improved version. I&#8217;ve followed Arjen Lentz&#8217;s suggestion and added support for the max_elements and max_memory parameters. I also added a new Indexed column to the output, which is an ENUM('No','Yes','Overindexed'). Yes and No are [...]<div class='yarpp-related-rss'>

Related posts:<ol>
<li><a href='http://fernandoipar.com/2009/04/18/making-use-of-procedure-analyse/' rel='bookmark' title='Making use of procedure analyse()'>Making use of procedure analyse()</a> <small>SELECT Field0[,Field1,Field2,...] FROM TABLE PROCEDURE ANALYSE() is a nice tool...</small></li>
</ol>

</div>
]]></description>
				<content:encoded><![CDATA[<p>My previous post explored a stored procedure that extended procedure_analyse with the intent of helping DBAs optimize table structure.</p>
<p>Here&#8217;s an improved version. I&#8217;ve followed <a title="Open Query" href="http://openquery.com/">Arjen Lentz</a>&#8217;s suggestion and added support for PROCEDURE ANALYSE()&#8217;s max_elements and max_memory parameters.</p>
<p>I also added a new Indexed column to the output, which is an ENUM('No','Yes','Overindexed'). Yes and No are self-explanatory, while Overindexed means the column is the left-most part of more than one index. This is often redundant: every extra index is a performance penalty for MySQL (it has more indexes to update on writes), and if, for instance, you have columns A and B with KEY(A) and KEY(A,B), MySQL can also use the second index to search for A alone.</p>
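<p>The Overindexed check boils down to counting how many indexes start with each column. Here is a minimal Python sketch of that logic, plus a stricter check for indexes whose columns are a left prefix of another index (the KEY(A) and KEY(A,B) case). The index definitions are a plain dictionary standing in for <em>information_schema.statistics</em>, the function names are hypothetical, and neither check accounts for UNIQUE constraints, which can make an apparently redundant index necessary.</p>

```python
from collections import Counter


def overindexed(indexes):
    """Columns that are the left-most column of more than one index."""
    leading = Counter(columns[0] for columns in indexes.values() if columns)
    return sorted(column for column, count in leading.items() if count > 1)


def redundant_prefixes(indexes):
    """(shorter, longer) index pairs where the shorter one is a strict left prefix."""
    pairs = []
    for short_name, short_cols in indexes.items():
        for long_name, long_cols in indexes.items():
            if (short_cols and len(long_cols) > len(short_cols)
                    and long_cols[:len(short_cols)] == short_cols):
                pairs.append((short_name, long_name))
    return sorted(pairs)


# Hypothetical index definitions: index name -> ordered column list
indexes = {"PRIMARY": ["id"], "idx_a": ["A"], "idx_ab": ["A", "B"], "idx_b": ["B"]}
print(overindexed(indexes))         # ['A']
print(redundant_prefixes(indexes))  # [('idx_a', 'idx_ab')]
```

<p>Note that KEY(A,B) together with KEY(A,C) would be flagged by <em>overindexed</em> but not by <em>redundant_prefixes</em>, since each of those indexes can serve queries the other cannot.</p>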
<p>Here&#8217;s the updated version:</p>
<pre>
/*

extended procedure analyse
(C) 2009 Fernando Ipar
mail(at)fernandoipar.com

GPLv2

*/
drop procedure if exists extended_procedure_analyse;
delimiter //
create procedure extended_procedure_analyse(databaseName varchar(64), tableName varchar(64), max_elements int, max_memory int)
begin

	drop temporary table if exists procedure_analyse_output;
	drop temporary table if exists tmp_pao;

	create temporary table procedure_analyse_output 
	(
	Field_name varchar(64),
	Min_value int,
	Max_value int,
	Min_length int,
	Max_length int,
	Empties_or_zeros int,
	Nulls int,
	Avg_value_or_avg_length float,
	Std float,
	Optimal_fieldtype text,
	Actual_fieldtype text,
	Indexed enum ('No','Yes','Overindexed') default 'No'
	);	
	
	set @table = concat(databaseName,'.',tableName);
	set @dbName = databaseName;
	set @tbName = tableName;
	set @maxEle = max_elements;
	set @maxMem = max_memory;

	set @qry = concat('insert into procedure_analyse_output (Field_name,Min_value,Max_value,Min_length,Max_length,Empties_or_zeros,Nulls,Avg_Value_or_avg_length,Std,Optimal_fieldtype) select * from ', @table,' procedure analyse(',@maxEle,',',@maxMem,')');
	prepare myStmt from @qry;
	execute myStmt;
	
	update procedure_analyse_output set Field_name = replace(Field_name,  CONCAT(databaseName,'.',tableName,'.'),'');
	
	prepare myStmt from 'update procedure_analyse_output pao, information_schema.columns c set pao.Actual_Fieldtype = c.column_type where table_schema = ? and table_name = ? and column_name = pao.Field_name';
	execute myStmt using @dbName,@tbName;
	
	set @qry = concat('select count(*) as `Total_number_of_rows` from ',@table);
	prepare myStmt from @qry;
	
	execute myStmt;
	
	set @qry = 'update procedure_analyse_output pao set Indexed = \'Yes\' where exists (select 1 from information_schema.statistics where table_schema = ? and table_name = ? and column_name = pao.Field_name)';
	prepare myStmt from @qry;
	execute myStmt using @dbName,@tbName;
	

	create temporary table tmp_pao as select * from procedure_analyse_output;
	prepare myStmt from 'update tmp_pao set Indexed = \'Overindexed\' where exists (select Field_name,count(*) from procedure_analyse_output pao inner join information_schema.statistics s on pao.Field_name = s.column_name where table_schema = ? and table_name = ? and seq_in_index = 1 and pao.Field_name = tmp_pao.Field_name group by Field_name having count(*) > 1)';
	execute myStmt  using @dbName,@tbName;
		
	select * from tmp_pao;
	
	drop temporary table procedure_analyse_output;
	drop temporary table tmp_pao; 
	
end;
//
delimiter ;

</pre>
<p>And here&#8217;s a sample output: </p>
<pre>
mysql> call extended_procedure_analyse('test','Account',4,100)\G
*************************** 1. row ***************************
Total_number_of_rows: 1
1 row in set (0.04 sec)

*************************** 1. row ***************************
             Field_name: InternalAID
              Min_value: 2147483647
              Max_value: 2147483647
             Min_length: 18
             Max_length: 18
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 1.65632e+17
                    Std: 0
      Optimal_fieldtype: BIGINT(18) UNSIGNED NOT NULL
       Actual_fieldtype: bigint(20)
                Indexed: Yes
*************************** 2. row ***************************
             Field_name: accountID
              Min_value: 12
              Max_value: 12
             Min_length: 2
             Max_length: 2
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 2
                    Std: NULL
      Optimal_fieldtype: ENUM('12') NOT NULL
       Actual_fieldtype: varchar(255)
                Indexed: Overindexed
*************************** 3. row ***************************
             Field_name: acctBalance
              Min_value: 2147483647
              Max_value: 2147483647
             Min_length: 11
             Max_length: 11
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 1.00051e+20
                    Std: 0
      Optimal_fieldtype: BIGINT(11) UNSIGNED NOT NULL
       Actual_fieldtype: double
                Indexed: Yes
*************************** 4. row ***************************
             Field_name: ownerID
              Min_value: 2147483647
              Max_value: 2147483647
             Min_length: 19
             Max_length: 19
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 6.57541e+18
                    Std: 0
      Optimal_fieldtype: BIGINT(19) UNSIGNED NOT NULL
       Actual_fieldtype: bigint(20)
                Indexed: No
4 rows in set (0.24 sec)

Query OK, 0 rows affected, 8 warnings (0.24 sec)

</pre>
<p>There&#8217;s a <a title="mydbsuggest at bitbucket" href="http://www.bitbucket.org/nandix/mydbsuggest/">Mercurial (hg) repository</a> for the project. We&#8217;re working with <a title="fedesilva@bitbucket.org" href="http://www.bitbucket.org/fedesilva">@fedesilva</a> on a standalone Java app that will present this and more information to sysadmins in a friendly way, and that will create and drop the stored procedure automatically.</p>
<div class='yarpp-related-rss'>
<p>Related posts:<ol>
<li><a href='http://fernandoipar.com/2009/04/18/making-use-of-procedure-analyse/' rel='bookmark' title='Making use of procedure analyse()'>Making use of procedure analyse()</a> <small>SELECT Field0[,Field1,Field2,...] FROM TABLE PROCEDURE ANALYSE() is a nice tool...</small></li>
</ol></p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://fernandoipar.com/2009/04/21/extending-procedure_analyse/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Making use of procedure analyse()</title>
		<link>http://fernandoipar.com/2009/04/18/making-use-of-procedure-analyse/</link>
		<comments>http://fernandoipar.com/2009/04/18/making-use-of-procedure-analyse/#comments</comments>
		<pubDate>Sat, 18 Apr 2009 19:47:48 +0000</pubDate>
		<dc:creator>fernando</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[datatypes]]></category>
		<category><![CDATA[optimization]]></category>

		<guid isPermaLink="false">http://fernandoipar.com/?p=155</guid>
		<description><![CDATA[SELECT Field0[,Field1,Field2,...] FROM TABLE PROCEDURE ANALYSE() is a nice tool to find out more about your table&#8217;s columns. Still, it could be improved in a lot of ways, and the stored procedure below is a starting point. It makes use of procedure analyse (though with &#8216;SELECT * FROM&#8217;), and modifies its output to include the [...]<div class='yarpp-related-rss yarpp-related-none'>

No related posts.
</div>
]]></description>
				<content:encoded><![CDATA[<p>SELECT Field0[,Field1,Field2,...] FROM TABLE PROCEDURE ANALYSE() is a nice tool to find out more about your table&#8217;s columns.</p>
<p>Still, it could be improved in a lot of ways, and the stored procedure below is a starting point. It makes use of <em>procedure analyse</em> (though with &#8216;SELECT * FROM&#8217;), and modifies its output to include the actual column datatype and the total number of rows of the table.</p>
<p>The actual datatype is a piece of information I&#8217;ve seen a lot of people request, and the number of rows is, I think, critical for deciding whether the output of <em>procedure analyse</em> is credible. Taking suggestions from MySQL for a table with 7 or 20 rows is not the same as taking them for a table with 1,000,000 rows. Of course, remember that numbers alone mean nothing: you might have just 7 rows in a table that represent the entire universe of possible values for your problem domain. In any case, a smart human being with a good toolset is the best way to solve problems!</p>
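<p>Why the row count matters is easiest to see for integer columns, where a suggestion such as MEDIUMINT(6) UNSIGNED for the City ID column in the sample output below essentially comes from fitting the observed minimum and maximum into the smallest integer type. The following Python sketch shows only that range-fitting step; it is a simplification with a hypothetical <em>smallest_integer_type</em> helper, not ANALYSE()&#8217;s actual logic, which also weighs ENUM candidates, NULLs and lengths.</p>

```python
# (type, signed minimum, signed maximum, unsigned maximum) for MySQL integer types
INTEGER_TYPES = [
    ("TINYINT", -2**7, 2**7 - 1, 2**8 - 1),
    ("SMALLINT", -2**15, 2**15 - 1, 2**16 - 1),
    ("MEDIUMINT", -2**23, 2**23 - 1, 2**24 - 1),
    ("INT", -2**31, 2**31 - 1, 2**32 - 1),
    ("BIGINT", -2**63, 2**63 - 1, 2**64 - 1),
]


def smallest_integer_type(min_value, max_value):
    """Smallest integer type that can hold every value in [min_value, max_value]."""
    for name, signed_min, signed_max, unsigned_max in INTEGER_TYPES:
        if min_value >= 0:
            if unsigned_max >= max_value:
                return name + " UNSIGNED"
        elif min_value >= signed_min and signed_max >= max_value:
            return name
    raise ValueError("range does not fit in a BIGINT")


print(smallest_integer_type(925001, 955000))  # MEDIUMINT UNSIGNED (the City ID range)
print(smallest_integer_type(1, 7))            # TINYINT UNSIGNED
```

<p>With only 7 rows holding ids 1 to 7, the tightest fit is TINYINT UNSIGNED, which overflows as soon as ids pass 255; that is exactly the kind of suggestion the total row count helps you treat with suspicion.</p>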
<p>So here&#8217;s the procedure, which can also be downloaded from this <a title="extended_procedure_analyse.sql" href="http://fernandoipar.com/extended_procedure_analyse.sql">link</a>:</p>
<pre>/*

extended procedure analyse
(C) 2009 Fernando Ipar
mail(at)fernandoipar.com

GPLv2

*/
drop procedure if exists extended_procedure_analyse;
delimiter //
create procedure extended_procedure_analyse(databaseName varchar(64), tableName varchar(64))
begin

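	-- start clean in case a previous call failed before dropping it
	drop temporary table if exists procedure_analyse_output;
	create temporary table procedure_analyse_output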
	(
	Field_name varchar(64),
	Min_value int,
	Max_value int,
	Min_length int,
	Max_length int,
	Empties_or_zeros int,
	Nulls int,
	Avg_value_or_avg_length float,
	Std float,
	Optimal_fieldtype text,
	Actual_fieldtype text
	);	

	set @table = concat(databaseName,'.',tableName);
	set @dbName = databaseName;
	set @tbName = tableName;

	set @qry = concat('insert into procedure_analyse_output (Field_name,Min_value,Max_value,Min_length,Max_length,Empties_or_zeros,Nulls,Avg_Value_or_avg_length,Std,Optimal_fieldtype) select * from ', @table,' procedure analyse()');
	prepare myStmt from @qry;
	execute myStmt;

	update procedure_analyse_output set Field_name = replace(Field_name,  CONCAT(databaseName,'.',tableName,'.'),'');

	prepare myStmt from 'update procedure_analyse_output pao, information_schema.columns c set pao.Actual_Fieldtype = c.column_type where table_schema = ? and table_name = ? and column_name = pao.Field_name';
	execute myStmt using @dbName,@tbName;

	set @qry = concat('select count(*) as `Total_number_of_rows` from ',@table);
	prepare myStmt from @qry;

	execute myStmt;

	select * from procedure_analyse_output;

	drop temporary table procedure_analyse_output;

end;
//
delimiter ;</pre>
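<p>The procedure calls <em>procedure analyse()</em> with no arguments, so MySQL&#8217;s defaults apply. PROCEDURE ANALYSE accepts two optional arguments: <em>max_elements</em> (default 256), the maximum number of distinct values it tracks per column, beyond which ENUM is not suggested, and <em>max_memory</em> (default 8192), the maximum memory it allocates per column while looking for distinct values. Lowering <em>max_elements</em> is one way to avoid suggestions such as the 30-value ENUM proposed for <em>CountryCode</em> in the sample output below. The limit of 16 here is an arbitrary example value. Keep in mind that PROCEDURE ANALYSE was deprecated in MySQL 5.7.18 and removed in MySQL 8.0, so all of this assumes an older server.</p>
<pre>-- Sketch: call procedure analyse directly with explicit limits.
-- max_elements = 16 (example value): columns with more than 16 distinct values are not offered ENUM.
-- max_memory = 8192: the default per-column memory limit for collecting distinct values.
select * from test.City procedure analyse(16, 8192);</pre>
<p>To use the same limits in the extended procedure, the literal <em>procedure analyse()</em> in the <em>@qry</em> string could be changed to <em>procedure analyse(16, 8192)</em>.</p>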
<p>Here are a couple of sample outputs:</p>
<pre>mysql&gt; call extended_procedure_analyse('test','City')\G
*************************** 1. row ***************************
Total_number_of_rows: 30000
1 row in set (0.11 sec)

*************************** 1. row ***************************
             Field_name: ID
              Min_value: 925001
              Max_value: 955000
             Min_length: 6
             Max_length: 6
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 940000
                    Std: 938839
      Optimal_fieldtype: MEDIUMINT(6) UNSIGNED NOT NULL
       Actual_fieldtype: int(11) unsigned
*************************** 2. row ***************************
             Field_name: CountryCode
              Min_value: 0
              Max_value: 29
             Min_length: 1
             Max_length: 2
       Empties_or_zeros: 1000
                  Nulls: 0
Avg_value_or_avg_length: 14.5
                    Std: 8.6554
      Optimal_fieldtype: ENUM('0','1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24','25','26','27','28','29') NOT NULL
       Actual_fieldtype: int(11) unsigned
*************************** 3. row ***************************
             Field_name: Name
              Min_value: 1
              Max_value: 9999
             Min_length: 1
             Max_length: 5
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 4.6605
                    Std: NULL
      Optimal_fieldtype: CHAR(5) NOT NULL
       Actual_fieldtype: varchar(40)
*************************** 4. row ***************************
             Field_name: District
              Min_value: 1
              Max_value: 9999
             Min_length: 1
             Max_length: 5
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 4.6603
                    Std: NULL
      Optimal_fieldtype: CHAR(5) NOT NULL
       Actual_fieldtype: varchar(40)
*************************** 5. row ***************************
             Field_name: Population
              Min_value: 0
              Max_value: 9999
             Min_length: 1
             Max_length: 5
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 4.6647
                    Std: NULL
      Optimal_fieldtype: CHAR(5) NOT NULL
       Actual_fieldtype: varchar(40)
5 rows in set (0.12 sec)

Query OK, 0 rows affected (0.12 sec)

mysql&gt; call extended_procedure_analyse('test','projects_innodb')\G
*************************** 1. row ***************************
Total_number_of_rows: 1007366
1 row in set (14.80 sec)

*************************** 1. row ***************************
             Field_name: id
              Min_value: 1
              Max_value: 1007366
             Min_length: 1
             Max_length: 7
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 503684
                    Std: 581599
      Optimal_fieldtype: MEDIUMINT(7) UNSIGNED NOT NULL
       Actual_fieldtype: int(10) unsigned
*************************** 2. row ***************************
             Field_name: name
              Min_value: 0
              Max_value: 9999
             Min_length: 1
             Max_length: 10
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 4.6958
                    Std: NULL
      Optimal_fieldtype: VARCHAR(10) NOT NULL
       Actual_fieldtype: char(10)
2 rows in set (14.80 sec)

Query OK, 0 rows affected (14.80 sec)

mysql&gt; call extended_procedure_analyse('test','projects_isam')\G
*************************** 1. row ***************************
Total_number_of_rows: 1000000
1 row in set (0.56 sec)

*************************** 1. row ***************************
             Field_name: id
              Min_value: 1
              Max_value: 1000000
             Min_length: 1
             Max_length: 7
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 500000
                    Std: 577358
      Optimal_fieldtype: MEDIUMINT(7) UNSIGNED NOT NULL
       Actual_fieldtype: int(10) unsigned
*************************** 2. row ***************************
             Field_name: name
              Min_value: 0
              Max_value: 9999
             Min_length: 1
             Max_length: 5
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 4.6605
                    Std: NULL
      Optimal_fieldtype: CHAR(5) NOT NULL
       Actual_fieldtype: char(10)
2 rows in set (0.56 sec)

Query OK, 0 rows affected (0.56 sec)</pre>
<p>Notice the difference in response time between the InnoDB and MyISAM tables (yes, I wrongly used the &#8216;isam&#8217; name &#8230;). That&#8217;s largely because InnoDB has to scan the table to answer a count(*) query, while MyISAM stores an exact row count for the table.</p>
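<p>When an approximate row count is enough to judge whether the suggestions are credible, the <em>table_rows</em> column of <em>information_schema.tables</em> avoids the full scan. It is exact for MyISAM but only an estimate for InnoDB, and it can differ noticeably from the real count. The query below is a sketch that uses the two sample tables from the outputs above.</p>
<pre>-- Sketch: read the row count from table metadata instead of running count(*).
-- Exact for MyISAM, only an estimate for InnoDB.
select table_name, engine, table_rows
from information_schema.tables
where table_schema = 'test'
  and table_name in ('projects_innodb', 'projects_isam');</pre>
<p>Inside the procedure, the same query filtered with <em>table_schema = databaseName and table_name = tableName</em> could replace the prepared count(*) statement, at the cost of an approximate figure for InnoDB tables.</p>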
<p>I plan to extend this procedure to include index information, with useful data such as overindexed columns (columns that are included as a leftmost prefix in more than one index) and unindexed columns that are queried. The first question is easily answered from information_schema.statistics using the seq_in_index column; I&#8217;m working on the second one.</p>
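<p>Here is a minimal sketch of that first question. It assumes &#8220;overindexed&#8221; means a column that appears at <em>seq_in_index = 1</em> in more than one index of the same table, and it assumes the schema is named <em>test</em>.</p>
<pre>-- Sketch: columns that are the leftmost column of more than one index.
-- The schema name 'test' is an assumption; change it to your own.
select table_name,
       column_name,
       count(distinct index_name) as indexes_starting_with_column,
       group_concat(distinct index_name order by index_name) as index_names
from information_schema.statistics
where table_schema = 'test'
  and seq_in_index = 1
group by table_name, column_name
having count(distinct index_name) &gt; 1;</pre>
<p>This flags every pair of indexes that share a first column, including pairs like (a, b) and (a, c) that are not actually redundant. Treat the result as a list to review, not a list of indexes to drop.</p>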
<p>Still, if you have the time and interest, play with this early version and let me know what&#8217;s wrong with it or what could be improved.</p>
]]></content:encoded>
			<wfw:commentRss>http://fernandoipar.com/2009/04/18/making-use-of-procedure-analyse/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>
