SELECT Field0[, Field1, Field2, ...] FROM table_name PROCEDURE ANALYSE() is a nice tool for finding out more about your table's columns.
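For reference, a basic call looks like this (I'm using the test.City table from the examples further down; 16 and 256 are just example values for ANALYSE()'s optional max_elements and max_memory arguments, which limit how many distinct values and how much memory per column it uses when deciding whether to suggest an ENUM):

-- Ask MySQL to suggest an optimal datatype for each selected column.
-- The optional arguments are example limits: at most 16 distinct values
-- and 256 bytes per column before ANALYSE() stops suggesting an ENUM.
select CountryCode, Population
  from test.City
 procedure analyse(16, 256);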
Still, it could be improved in a lot of ways, and the stored procedure below is a starting point. It makes use of PROCEDURE ANALYSE() (though with 'SELECT * FROM'), and modifies its output to include the actual column datatype and the total number of rows in the table.
The actual datatype is a piece of information I've seen a lot of people request, and the number of rows is, I think, a critical piece of information for deciding whether the output of PROCEDURE ANALYSE() is credible or not. Taking suggestions from MySQL on a table with 7 or 20 rows is not the same as taking them from a table with 1000000 rows. Of course, remember that numbers alone mean nothing: you might just have 7 rows in a table that represent the entire universe of possible values for your problem domain. In any case, a smart human being with a good toolset is the best way to solve problems!
So here's the procedure, which can also be downloaded from this link:
/* extended procedure analyse (C) 2009 Fernando Ipar mail(at)fernandoipar.com GPLv2 */

drop procedure if exists extended_procedure_analyse;

delimiter //

create procedure extended_procedure_analyse(databaseName varchar(64), tableName varchar(64))
begin
    /* temporary table to hold PROCEDURE ANALYSE()'s output plus the actual column type */
    create temporary table procedure_analyse_output (
        Field_name varchar(64),
        Min_value int,
        Max_value int,
        Min_length int,
        Max_length int,
        Empties_or_zeros int,
        Nulls int,
        Avg_value_or_avg_length float,
        Std float,
        Optimal_fieldtype text,
        Actual_fieldtype text
    );

    set @table = concat(databaseName,'.',tableName);
    set @dbName = databaseName;
    set @tbName = tableName;

    /* run PROCEDURE ANALYSE() on the table and capture its output */
    set @qry = concat('insert into procedure_analyse_output (Field_name,Min_value,Max_value,Min_length,Max_length,Empties_or_zeros,Nulls,Avg_value_or_avg_length,Std,Optimal_fieldtype) select * from ', @table, ' procedure analyse()');
    prepare myStmt from @qry;
    execute myStmt;

    /* ANALYSE() reports fully qualified column names; strip the db.table. prefix */
    update procedure_analyse_output
       set Field_name = replace(Field_name, concat(databaseName,'.',tableName,'.'), '');

    /* add the actual column datatype from information_schema */
    prepare myStmt from 'update procedure_analyse_output pao, information_schema.columns c
                            set pao.Actual_Fieldtype = c.column_type
                          where table_schema = ?
                            and table_name = ?
                            and column_name = pao.Field_name';
    execute myStmt using @dbName, @tbName;

    /* report the total number of rows, so the suggestions can be put in context */
    set @qry = concat('select count(*) as `Total_number_of_rows` from ', @table);
    prepare myStmt from @qry;
    execute myStmt;

    select * from procedure_analyse_output;

    drop temporary table procedure_analyse_output;
end;
//

delimiter ;
Here are a couple of sample outputs:
mysql> call extended_procedure_analyse('test','City')\G
*************************** 1. row ***************************
Total_number_of_rows: 30000
1 row in set (0.11 sec)

*************************** 1. row ***************************
             Field_name: ID
              Min_value: 925001
              Max_value: 955000
             Min_length: 6
             Max_length: 6
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 940000
                    Std: 938839
      Optimal_fieldtype: MEDIUMINT(6) UNSIGNED NOT NULL
       Actual_fieldtype: int(11) unsigned
*************************** 2. row ***************************
             Field_name: CountryCode
              Min_value: 0
              Max_value: 29
             Min_length: 1
             Max_length: 2
       Empties_or_zeros: 1000
                  Nulls: 0
Avg_value_or_avg_length: 14.5
                    Std: 8.6554
      Optimal_fieldtype: ENUM('0','1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24','25','26','27','28','29') NOT NULL
       Actual_fieldtype: int(11) unsigned
*************************** 3. row ***************************
             Field_name: Name
              Min_value: 1
              Max_value: 9999
             Min_length: 1
             Max_length: 5
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 4.6605
                    Std: NULL
      Optimal_fieldtype: CHAR(5) NOT NULL
       Actual_fieldtype: varchar(40)
*************************** 4. row ***************************
             Field_name: District
              Min_value: 1
              Max_value: 9999
             Min_length: 1
             Max_length: 5
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 4.6603
                    Std: NULL
      Optimal_fieldtype: CHAR(5) NOT NULL
       Actual_fieldtype: varchar(40)
*************************** 5. row ***************************
             Field_name: Population
              Min_value: 0
              Max_value: 9999
             Min_length: 1
             Max_length: 5
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 4.6647
                    Std: NULL
      Optimal_fieldtype: CHAR(5) NOT NULL
       Actual_fieldtype: varchar(40)
5 rows in set (0.12 sec)

Query OK, 0 rows affected (0.12 sec)

mysql> call extended_procedure_analyse('test','projects_innodb')\G
*************************** 1. row ***************************
Total_number_of_rows: 1007366
1 row in set (14.80 sec)

*************************** 1. row ***************************
             Field_name: id
              Min_value: 1
              Max_value: 1007366
             Min_length: 1
             Max_length: 7
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 503684
                    Std: 581599
      Optimal_fieldtype: MEDIUMINT(7) UNSIGNED NOT NULL
       Actual_fieldtype: int(10) unsigned
*************************** 2. row ***************************
             Field_name: name
              Min_value: 0
              Max_value: 9999
             Min_length: 1
             Max_length: 10
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 4.6958
                    Std: NULL
      Optimal_fieldtype: VARCHAR(10) NOT NULL
       Actual_fieldtype: char(10)
2 rows in set (14.80 sec)

Query OK, 0 rows affected (14.80 sec)

mysql> call extended_procedure_analyse('test','projects_isam')\G
*************************** 1. row ***************************
Total_number_of_rows: 1000000
1 row in set (0.56 sec)

*************************** 1. row ***************************
             Field_name: id
              Min_value: 1
              Max_value: 1000000
             Min_length: 1
             Max_length: 7
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 500000
                    Std: 577358
      Optimal_fieldtype: MEDIUMINT(7) UNSIGNED NOT NULL
       Actual_fieldtype: int(10) unsigned
*************************** 2. row ***************************
             Field_name: name
              Min_value: 0
              Max_value: 9999
             Min_length: 1
             Max_length: 5
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 4.6605
                    Std: NULL
      Optimal_fieldtype: CHAR(5) NOT NULL
       Actual_fieldtype: char(10)
2 rows in set (0.56 sec)

Query OK, 0 rows affected (0.56 sec)
Notice the difference in response time between the InnoDB and MyISAM (yes, I wrongly used the 'isam' name ...) tables: that's because InnoDB has to actually count the rows for a count(*) query, while MyISAM keeps an exact row count in the table's metadata.
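If you want to see that difference in isolation, you can time a plain count(*) on both tables (using the table names from the examples above):

-- MyISAM stores an exact row count with the table, so this returns
-- almost immediately:
select count(*) from test.projects_isam;

-- InnoDB keeps no such counter (concurrent transactions can each see a
-- different number of rows), so this has to scan an index:
select count(*) from test.projects_innodb;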
I plan to extend this procedure to include index information, with useful data such as overindexed columns (columns that are included as a leftmost prefix in more than one index) and unindexed columns that are queried. The first question is easily answered from information_schema.statistics, using the seq_in_index column (a rough sketch of that query is below); I'm still working on the second one.
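For the overindexed-column part, something along these lines should work (just a sketch; the 'test' schema filter is only an example):

-- Columns that appear as the leftmost prefix (seq_in_index = 1)
-- of more than one index on the same table:
select table_schema, table_name, column_name,
       count(*) as indexes_starting_here
  from information_schema.statistics
 where seq_in_index = 1
   and table_schema = 'test'
 group by table_schema, table_name, column_name
having count(*) > 1;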
Still, if you have the time and interest, play with this early version and let me know what's wrong with it and/or what could be improved.