Generating data with dbmonster

In my last post I included some sample data which was useful for playing around with queries (once I published it,
I realized it made my post look like some form of keyword stuffing, fortunately I don’t use adsense on my site so I hope I’m free of any suspicion :D).

That sample data was generated with dbmonster, a nice tool I discovered recently which comes in handy when you need to populate a database table to test your queries, the engine, schema, etc.

In a nutshell, here are the strengths I like about it:

  • It’s written in Java, so it’ll feed any database with a JDBC driver, and AFAIK, that’s just about any RDBMS you might need to use
  • It’s not too slow (it’s actually pretty fast if you don’t go crazy on long text based datatypes)
  • It includes several data generators, and being written in Java, rolling out my own should be easy.

As for the weaknesses:

  • It can be unbearably slow with long text columns
  • The generated text data might not represent a very real installation (again, look at my previous post for a quick sample). Still, the fact that you can easily write and plug your own generator(s) is enough to give kudos to the developers for a good architecture. Even if a feature isn’t available now, it’s painless to add in the future
  • It has no way to manage relational integrity, hence, it doesn’t really feed RDBMS, just JDBC enabled databases (yes it does, please see the comments)

My last point means that, if you have table A and table B, both related through a foreign key constraint, dbmonster doesn’t provide a way for you to generate keys for table A and then reuse these keys
while inserting the foreign keys in table B. Not that I’ve found so far, at least.

Related with this limitation, tables are generated one at the time. Still, if you need a tool to populate tables, even very large ones, with random values, heck, up until a few days ago, I was doing this from
bash or python, so it’s definitely an improvement. I’ve discussed this with colleagues and some proprietary tools are either available or in the works that would handle referential integrity while
generating data for database testing. If anyone knows an Open Source one that does, it’d be great. Otherwise, that would sure make one hell of a good project :)

Ok, so enough rant, here’s some data: You get it from here.

Start off by creating a dbmonster.properties file. Here’s a sample:

dbmonster.jdbc.driver=com.mysql.jdbc.Driver
dbmonster.jdbc.url=jdbc:mysql://localhost/sample_db
dbmonster.jdbc.username=mysql_user
dbmonster.jdbc.password=mysql_password

# for Oracle and other schema enabled databases
#dbmonster.jdbc.schema=schema_name

# maximal number of (re)tries
dbmonster.max-tries=50

# default rows number for SchemaGrabber
dbmonster.rows=1000

# progres monitor class
dbmonster.progress.monitor=pl.kernelpanic.dbmonster.ProgressMonitorAdapter

Dbmonster is now ready to reach your database.
Before it’s ready to feed it, though, you need to create an xml file (ah, yes, the plague of the Java world, and I’ve been a Java guy for most of my working years…). Call it what you want,
being a consistent kind type of man, I’ve been using .xml. Here’s my sample_db.xml file:

<?xml version="1.0"?>
<!DOCTYPE dbmonster-schema PUBLIC "-//kernelpanic.pl//DBMonster Database Schema DTD 1.1//EN" "http://dbmonster.kernelpanic.pl/dtd/dbmonster-schema-1.1.dtd">
<dbmonster-schema>
  <name>Sample DB</name>
  <table name="jobs" rows="150000">
    <key databaseDefault="true">
      <generator type="pl.kernelpanic.dbmonster.generator.MaxKeyGenerator">
        <property name="columnName" value="id"/>
      </generator>
    </key>
    <column name="title" databaseDefault="false">
      <generator type="pl.kernelpanic.dbmonster.generator.StringGenerator">
        <property name="allowSpaces" value="true"/>
        <property name="excludeChars" value=""/>
        <property name="maxLength" value="250"/>
        <property name="minLength" value="15"/>
        <property name="nulls" value="0"/>
      </generator>
    </column>
    <column name="link" databaseDefault="false">
      <generator type="pl.kernelpanic.dbmonster.generator.StringGenerator">
        <property name="allowSpaces" value="true"/>
        <property name="excludeChars" value=""/>
        <property name="maxLength" value="65500"/>
        <property name="minLength" value="150"/>
        <property name="nulls" value="0"/>
      </generator>
    </column>
    <column name="description" databaseDefault="false">
      <generator type="pl.kernelpanic.dbmonster.generator.StringGenerator">
        <property name="allowSpaces" value="true"/>
        <property name="excludeChars" value=""/>
        <property name="maxLength" value="65500"/>
        <property name="minLength" value="150"/>
        <property name="nulls" value="0"/>
      </generator>
    </column>
    <column name="city" databaseDefault="false">
      <generator type="pl.kernelpanic.dbmonster.generator.StringGenerator">
        <property name="allowSpaces" value="true"/>
        <property name="excludeChars" value=""/>
        <property name="maxLength" value="250"/>
        <property name="minLength" value="15"/>
        <property name="nulls" value="0"/>
      </generator>
    </column>
    <column name="postdate" databaseDefault="false">
      <generator type="pl.kernelpanic.dbmonster.generator.DateTimeGenerator">
        <property name="nulls" value="0"/>
      </generator>
    </column>
    <column name="company_id" databaseDefault="false">
      <generator type="pl.kernelpanic.dbmonster.generator.NumberGenerator">
        <property name="nulls" value="0"/>
      </generator>
    </column>
    <column name="country_id" databaseDefault="false">
      <generator type="pl.kernelpanic.dbmonster.generator.NumberGenerator">
        <property name="nulls" value="0"/>
      </generator>
    </column>
  </table>
</dbmonster-schema>

The file is pretty self explanatory. However, here are some pointers:

  • You have a table tag for every generated table
  • A column tag for every column of every table (duh)
  • Column’s have generators. These are the Java classes that actually generate the value to insert into this column. Notice how you can specify the databasedefault=”true” attribute, which makes dbmonster omit generation for this column (good for auto increment columns / postgres sequences

Now let’s run it:

export CLASSPATH=dbmonster*.jar:mysql*jar

java pl.kernelpanic.dbmonster.Launcher -s sample_db.xml

fipar@telecaster:~/soft/dbmonster-core-1.0.3$ java -classpath mysql-connector-java-5.1.6-bin.jar:dbmonster-core-1.0.3.jar pl.kernelpanic.dbmonster.Launcher -s sample_db.xml
2009-08-05 13:43:27,244 INFO  DBMonster - Let's feed this hungry database.
2009-08-05 13:43:27,697 INFO  DBCPConnectionProvider - Today we are feeding: MySQL 5.0.75-0ubuntu10.2-log
2009-08-05 13:43:27,783 INFO  Schema - Generating schema .
2009-08-05 13:43:27,783 INFO  Table - Generating table .
2009-08-05 13:44:18,749 INFO  Table - Generation of table  finished.
2009-08-05 13:44:18,750 INFO  Schema - Generation of schema  finished.
2009-08-05 13:44:18,750 INFO  DBMonster - Finished in 51 sec. 507 ms.
fipar@telecaster:~/soft/dbmonster-core-1.0.3$ mysql -p -e 'select count(*) from tasks' sample_db
Enter password:
+----------+
| count(*) |
+----------+
|    80000 |
+----------+


fipar@telecaster:~/soft/dbmonster-core-1.0.3$ java -cp dbmonster-core-1.0.3.jar:mysql-connector-java-5.1.6-bin.jar pl.kernelpanic.dbmonster.Launcher -s sample_db.xml
2009-08-06 10:27:24,747 INFO  DBMonster - Let's feed this hungry database.
2009-08-06 10:27:25,336 INFO  DBCPConnectionProvider - Today we are feeding: MySQL 5.0.75-0ubuntu10.2-log
2009-08-06 10:27:25,490 INFO  Schema - Generating schema .
2009-08-06 10:27:25,490 INFO  Table - Generating table .
2009-08-06 10:37:14,284 INFO  Table - Generation of table  finished.
2009-08-06 10:37:14,284 INFO  Schema - Generation of schema  finished.
2009-08-06 10:37:14,285 INFO  DBMonster - Finished in 9 min. 49 sec. 539 ms.
fipar@telecaster:~/soft/dbmonster-core-1.0.3$ mysql -p -e 'select count(*) from tasks' sample_db
Enter password:
+----------+
| count(*) |
+----------+
|   150000 |
+----------+

Besides the obvious difference in the number of rows, the second generation had larger text values (char(250) vs char(40)). You can see how that affected generation time. It should have finished (proportionally) in
about 1:50, yet it took dbmonser almost 10 minutes to generate this data.

In conclusion, depending on the size of the database you need to generate, and the use you intend for it (forget testing a huge schema with referential constraints), dbmonster can certainly aid
you in stress testing any database engine.

Share and Enjoy: These icons link to social bookmarking sites where readers can share and discover new web pages.
  • MisterWong
  • Y!GG
  • Webnews
  • Digg
  • del.icio.us
  • StumbleUpon
  • Reddit
  • email
  • Facebook
  • LinkedIn
  • Technorati

4 thoughts on “Generating data with dbmonster

  1. Hey, no need to be sorry, that’s good news :)
    My mistake was going straight to the configuration section of the manual. Since foreign keys are handles automatically according to this: http://dbmonster.kernelpanic.pl/manual/intro.html, there’s no configuration key/option for them.
    I’ll run new tests taking this into consideration and either update this post or write a new one.

    Thanks for the info and the link!

  2. Indeed, I ran another test and it works exactly as described:

    mysql -e ‘create database rtest’
    [… create some inodb tables on rtest, with foreign key constraints … ]

    java pl.kernelpanic.dbmonster.Launcher -c rtest.properties –grab > rtest.xml

    After properly editing the begining of the rtest.xml file (since it will included some non-xml output from dbmonster):

    java pl.kernelpanic.dbmonster.Launcher -c rtest.properties -s rtest.xml

    This successfully generates rows for all tables, satisfying referential integrity :)

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>