Making use of procedure analyse()

SELECT Field0[,Field1,Field2,...] FROM TABLE PROCEDURE ANALYSE() is a nice tool to find out more about your table’s columns.

Still, it could be improved in a lot of ways, and the stored procedure below is a starting point. It makes use of procedure analyse (though with ‘SELECT * FROM’), and modifies it’s output to include the actual column datatype and the total number of rows of the table.

The actual datatype is a piece of information I’ve seen a lot of people request, and the number of rows is, I think, a critical piece of information to determine if the output of procedure analyse is credible or not. It’s not the same thing to take suggestions from mysql on a table with 7 or 20 rows than from a table with 1000000 rows. Of course, remember than numbers alone mean nothing, you might just have 7 rows in a table that represent the entire universe of possible values for your problem domain. In any case, a smart human being with a good toolset is the best way to solve problems!

So here’s the procedure, which can also be downloaded from this link:

/*

extended procedure analyse
(C) 2009 Fernando Ipar
mail(at)fernandoipar.com

GPLv2

*/
drop procedure if exists extended_procedure_analyse;
delimiter //
create procedure extended_procedure_analyse(databaseName varchar(64), tableName varchar(64))
begin

	create temporary table procedure_analyse_output
	(
	Field_name varchar(64),
	Min_value int,
	Max_value int,
	Min_length int,
	Max_length int,
	Empties_or_zeros int,
	Nulls int,
	Avg_value_or_avg_length float,
	Std float,
	Optimal_fieldtype text,
	Actual_fieldtype text
	);	

	set @table = concat(databaseName,'.',tableName);
	set @dbName = databaseName;
	set @tbName = tableName;

	set @qry = concat('insert into procedure_analyse_output (Field_name,Min_value,Max_value,Min_length,Max_length,Empties_or_zeros,Nulls,Avg_Value_or_avg_length,Std,Optimal_fieldtype) select * from ', @table,' procedure analyse()');
	prepare myStmt from @qry;
	execute myStmt;

	update procedure_analyse_output set Field_name = replace(Field_name,  CONCAT(databaseName,'.',tableName,'.'),'');

	prepare myStmt from 'update procedure_analyse_output pao, information_schema.columns c set pao.Actual_Fieldtype = c.column_type where table_schema = ? and table_name = ? and column_name = pao.Field_name';
	execute myStmt using @dbName,@tbName;

	set @qry = concat('select count(*) as `Total_number_of_rows` from ',@table);
	prepare myStmt from @qry;

	execute myStmt;

	select * from procedure_analyse_output;

	drop temporary table procedure_analyse_output;

end;
//
delimiter ;

Here are a couple of sample outputs:

mysql> call extended_procedure_analyse('test','City')\G
*************************** 1. row ***************************
Total_number_of_rows: 30000
1 row in set (0.11 sec)

*************************** 1. row ***************************
             Field_name: ID
              Min_value: 925001
              Max_value: 955000
             Min_length: 6
             Max_length: 6
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 940000
                    Std: 938839
      Optimal_fieldtype: MEDIUMINT(6) UNSIGNED NOT NULL
       Actual_fieldtype: int(11) unsigned
*************************** 2. row ***************************
             Field_name: CountryCode
              Min_value: 0
              Max_value: 29
             Min_length: 1
             Max_length: 2
       Empties_or_zeros: 1000
                  Nulls: 0
Avg_value_or_avg_length: 14.5
                    Std: 8.6554
      Optimal_fieldtype: ENUM('0','1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24','25','26','27','28','29') NOT NULL
       Actual_fieldtype: int(11) unsigned
*************************** 3. row ***************************
             Field_name: Name
              Min_value: 1
              Max_value: 9999
             Min_length: 1
             Max_length: 5
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 4.6605
                    Std: NULL
      Optimal_fieldtype: CHAR(5) NOT NULL
       Actual_fieldtype: varchar(40)
*************************** 4. row ***************************
             Field_name: District
              Min_value: 1
              Max_value: 9999
             Min_length: 1
             Max_length: 5
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 4.6603
                    Std: NULL
      Optimal_fieldtype: CHAR(5) NOT NULL
       Actual_fieldtype: varchar(40)
*************************** 5. row ***************************
             Field_name: Population
              Min_value: 0
              Max_value: 9999
             Min_length: 1
             Max_length: 5
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 4.6647
                    Std: NULL
      Optimal_fieldtype: CHAR(5) NOT NULL
       Actual_fieldtype: varchar(40)
5 rows in set (0.12 sec)

Query OK, 0 rows affected (0.12 sec)

mysql> call extended_procedure_analyse('test','projects_innodb')\G
*************************** 1. row ***************************
Total_number_of_rows: 1007366
1 row in set (14.80 sec)

*************************** 1. row ***************************
             Field_name: id
              Min_value: 1
              Max_value: 1007366
             Min_length: 1
             Max_length: 7
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 503684
                    Std: 581599
      Optimal_fieldtype: MEDIUMINT(7) UNSIGNED NOT NULL
       Actual_fieldtype: int(10) unsigned
*************************** 2. row ***************************
             Field_name: name
              Min_value: 0
              Max_value: 9999
             Min_length: 1
             Max_length: 10
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 4.6958
                    Std: NULL
      Optimal_fieldtype: VARCHAR(10) NOT NULL
       Actual_fieldtype: char(10)
2 rows in set (14.80 sec)

Query OK, 0 rows affected (14.80 sec)

mysql> call extended_procedure_analyse('test','projects_isam')\G
*************************** 1. row ***************************
Total_number_of_rows: 1000000
1 row in set (0.56 sec)

*************************** 1. row ***************************
             Field_name: id
              Min_value: 1
              Max_value: 1000000
             Min_length: 1
             Max_length: 7
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 500000
                    Std: 577358
      Optimal_fieldtype: MEDIUMINT(7) UNSIGNED NOT NULL
       Actual_fieldtype: int(10) unsigned
*************************** 2. row ***************************
             Field_name: name
              Min_value: 0
              Max_value: 9999
             Min_length: 1
             Max_length: 5
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 4.6605
                    Std: NULL
      Optimal_fieldtype: CHAR(5) NOT NULL
       Actual_fieldtype: char(10)
2 rows in set (0.56 sec)

Query OK, 0 rows affected (0.56 sec)

Notice the difference in response time between the Innodb and MyISAM (yes, I wrongly used the ‘isam’ name …) tables, that’s because Innodb has to calculate the number of rows for a count(*) query, while MyISAM stores a row count in the table.

I plan to extend this procedure to include index information, with useful data such as overindexed columns (columns that are included as a leftmost prefix in more than one index) and unindexed columns that are queried. The first question is easily answered from information_schema.statistics, using the seq_in_index column, I’m working on the second one.

Still, if you have the time and interest, play with this early version and let me know what’s wrong and/or could be improved with it.

Share and Enjoy: These icons link to social bookmarking sites where readers can share and discover new web pages.
  • MisterWong
  • Y!GG
  • Webnews
  • Digg
  • del.icio.us
  • StumbleUpon
  • Reddit
  • email
  • Facebook
  • LinkedIn
  • Technorati

9 thoughts on “Making use of procedure analyse()

  1. Thanks for all the comments!

    @Arjen: I’ll add these limits to my todo. I followed your link and couldn’t help to laugh at the first reply you got. It’s no wonder there are so many MySQL forks out there.

  2. Hi Fernando,
    This seems like a great tool but I’m getting an error -
    ERROR 1221 (HY000): Incorrect usage of PROCEDURE and non-SELECT
    I’m on the latest 5.6.x, maybe that’s got something to do with it.
    Any ideas?
    Thanks

  3. Hello Alex,

    You’re right, this stopped working on 5.6 Not just the stored procedure, but apparently any use of procedure analyse() other than a select.

    So far I have only found a changelog mention for 5.6.6 about this, but I tired 5.6.5 and this did not work either so the change must have happened before.

    I’ll dig around a bit to see what I can find.

  4. mysql> insert into procedure_analyse_output select * from pet procedure analyse()
    -> ;
    ERROR 1221 (HY000): Incorrect usage of PROCEDURE and non-SELECT

  5. hello fernando,
    it can work this way:
    mysql> select * from pet procedure analyse() into OUTFILE “ana.txt”;
    mysql> load data infile “ana.txt” into procedure_analyse_output;
    than you la

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>