By fernando | April 21, 2009

Extending procedure_analyse

My previous post explored a stored procedure that extended procedure_analyse with the intent of helping DBAs optimize table structure.

Here’s an improved version. I’ve followed Arjen Lentz’s suggestion and added support for the max_elements and max_memory parameters.

I also added a new Indexed column to the output, which is an ENUM(‘No’,'Yes’,'Overindexed’). Yes and No are self-explanatory, while Overindexed means the column is present as the left-most part of more than one index. This is useless, it just presents a performance penalty for MySQL (it needs to update more indexes) and if, for instance, you have columns A and B, and you have KEY(A) and KEY (A,B), mysql can use the second index to search for A alone too.

Here’s the updated version:

/*

extended procedure analyse
(C) 2009 Fernando Ipar
mail(at)fernandoipar.com

GPLv2

*/
drop procedure if exists extended_procedure_analyse;
delimiter //
create procedure extended_procedure_analyse(databaseName varchar(64), tableName varchar(64), max_elements int, max_memory int)
begin

	drop temporary table if exists procedure_analyse_output;
	drop temporary table if exists tmp_pao;

	create temporary table procedure_analyse_output
	(
	Field_name varchar(64),
	Min_value int,
	Max_value int,
	Min_length int,
	Max_length int,
	Empties_or_zeros int,
	Nulls int,
	Avg_value_or_avg_length float,
	Std float,
	Optimal_fieldtype text,
	Actual_fieldtype text,
	Indexed enum ('No','Yes','Overindexed') default 'No'
	);	

	set @table = concat(databaseName,'.',tableName);
	set @dbName = databaseName;
	set @tbName = tableName;
	set @maxEle = max_elements;
	set @maxMem = max_memory;

	set @qry = concat('insert into procedure_analyse_output (Field_name,Min_value,Max_value,Min_length,Max_length,Empties_or_zeros,Nulls,Avg_Value_or_avg_length,Std,Optimal_fieldtype) select * from ', @table,' procedure analyse(',@maxEle,',',@maxMem,')');
	prepare myStmt from @qry;
	execute myStmt;

	update procedure_analyse_output set Field_name = replace(Field_name,  CONCAT(databaseName,'.',tableName,'.'),'');

	prepare myStmt from 'update procedure_analyse_output pao, information_schema.columns c set pao.Actual_Fieldtype = c.column_type where table_schema = ? and table_name = ? and column_name = pao.Field_name';
	execute myStmt using @dbName,@tbName;

	set @qry = concat('select count(*) as `Total_number_of_rows` from ',@table);
	prepare myStmt from @qry;

	execute myStmt;

	set @qry = concat('update procedure_analyse_output pao set Indexed = "Yes\" where exists (select 1 from information_schema.statistics where table_schema = ? and table_name = ? and column_name = pao.Field_name)');
	prepare myStmt from @qry;
	execute myStmt using @dbName,@tbName;

	create temporary table tmp_pao as select * from procedure_analyse_output;
	prepare myStmt from 'update tmp_pao set Indexed = "Overindexed" where exists (select Field_name,count(*) from procedure_analyse_output pao inner join information_schema.statistics s on pao.Field_name = s.column_name where table_schema = ? and table_name = ? and seq_in_index = 1 and pao.Field_name = tmp_pao.Field_name group by Field_name having count(*) > 1)';
	execute myStmt  using @dbName,@tbName;

	select * from tmp_pao;

	drop temporary table procedure_analyse_output;
	drop temporary table tmp_pao; 

end;
//
delimiter ;

And here’s a sample output:

mysql> call extended_procedure_analyse('test','Account',4,100)\G
*************************** 1. row ***************************
Total_number_of_rows: 1
1 row in set (0.04 sec)

*************************** 1. row ***************************
             Field_name: InternalAID
              Min_value: 2147483647
              Max_value: 2147483647
             Min_length: 18
             Max_length: 18
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 1.65632e+17
                    Std: 0
      Optimal_fieldtype: BIGINT(18) UNSIGNED NOT NULL
       Actual_fieldtype: bigint(20)
                Indexed: Yes
*************************** 2. row ***************************
             Field_name: accountID
              Min_value: 12
              Max_value: 12
             Min_length: 2
             Max_length: 2
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 2
                    Std: NULL
      Optimal_fieldtype: ENUM('12') NOT NULL
       Actual_fieldtype: varchar(255)
                Indexed: Overindexed
*************************** 3. row ***************************
             Field_name: acctBalance
              Min_value: 2147483647
              Max_value: 2147483647
             Min_length: 11
             Max_length: 11
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 1.00051e+20
                    Std: 0
      Optimal_fieldtype: BIGINT(11) UNSIGNED NOT NULL
       Actual_fieldtype: double
                Indexed: Yes
*************************** 4. row ***************************
             Field_name: ownerID
              Min_value: 2147483647
              Max_value: 2147483647
             Min_length: 19
             Max_length: 19
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 6.57541e+18
                    Std: 0
      Optimal_fieldtype: BIGINT(19) UNSIGNED NOT NULL
       Actual_fieldtype: bigint(20)
                Indexed: No
4 rows in set (0.24 sec)

Query OK, 0 rows affected, 8 warnings (0.24 sec)

There’s a hg repository to handle the project. We’re working with @fedesilva to create a standalone java app that will present this and more info in a friendly manner to sysadmins, and to handle the creation and destruction of the stored procedure automatically.

Share and Enjoy: These icons link to social bookmarking sites where readers can share and discover new web pages.
  • MisterWong
  • Y!GG
  • Webnews
  • Digg
  • del.icio.us
  • StumbleUpon
  • Reddit
  • email
  • Facebook
  • LinkedIn
  • Technorati

Related posts:

  1. Making use of procedure analyse() SELECT Field0[,Field1,Field2,...] FROM TABLE PROCEDURE ANALYSE() is a nice tool...

Related posts brought to you by Yet Another Related Posts Plugin.

4 comments | Add One

  1. Shlomi Noach - 04/22/2009 at 1:22 am

    Good work.

    With regard to overused indexes, you should note that an index on (a,b) and on (a,c) have nothing useless between them. But your code will report “a” as being over indexed.

  2. Fernando Ipar - 04/22/2009 at 1:49 am

    Thanks a lot for the observation, that didn’t occur to me :)

    I’ll have to look just for columns that have count(*) > 1 and that are also indexed alone, this would be the useless index (of course, it might be that one index takes duplicates, the other not, I don’t know if that still makes all cases useless, you’ve really opened a can of worms for me :)

    Fortunately we’ve got the thing at bitbucket now so I’ll report this as a bug and begin working on it ASAP!

  3. Shlomi Noach - 04/22/2009 at 1:58 am

    Hi,
    Still not complete. If the the index on (a) is UNIQUE, then an index (a,b) does make it redundant.
    Since there’s already a utility which checks for duplicate indexes, and it’s quite complicated, I suggest you take a look how it does it. It is part of maatkit, and is called mk-duplicate-key-checker. See http://code.google.com/p/maatkit/wiki/DeterminingDuplicateKeys

  4. Fernando Ipar - 04/22/2009 at 2:33 am

    Thanks for the additional suggestion, as I said, I expected that kind of problems. I’ll look into that particular maatkit script for ideas, I don’t intend to reinvent the wheel but rather to present DBAs with some useful screenshot of information, but it has to be useful and properly obtained :)

Leave a Comment

Name:

E-Mail :

Website :

Comments :