SELECT Field0[, Field1, Field2, ...] FROM table_name PROCEDURE ANALYSE() is a nice tool for finding out more about your table's columns.
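For example, against the test.City table used in the samples further down, a plain run might look like this (the two optional arguments, max_elements and max_memory, only control when an ENUM type gets suggested and can be omitted):
-- Plain PROCEDURE ANALYSE on two columns of test.City; max_elements and
-- max_memory limit the ENUM suggestions (the defaults are 256 and 8192).
SELECT ID, CountryCode FROM test.City PROCEDURE ANALYSE(10, 256);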
Still, it could be improved in a lot of ways, and the stored procedure below is a starting point. It uses PROCEDURE ANALYSE (though with 'SELECT * FROM'), and modifies its output to include each column's actual datatype and the total number of rows in the table.
The actual datatype is a piece of information I've seen a lot of people request, and the number of rows is, I think, a critical piece of information for deciding whether the output of PROCEDURE ANALYSE is credible or not. Taking suggestions from MySQL on a table with 7 or 20 rows is not the same as taking them from a table with 1000000 rows. Of course, remember that numbers alone mean nothing; you might have just 7 rows in a table that represent the entire universe of possible values for your problem domain. In any case, a smart human being with a good toolset is the best way to solve problems!
So here's the procedure, which can also be downloaded from this link:
/*
extended procedure analyse
(C) 2009 Fernando Ipar
mail(at)fernandoipar.com
GPLv2
*/
drop procedure if exists extended_procedure_analyse;
delimiter //
create procedure extended_procedure_analyse(databaseName varchar(64), tableName varchar(64))
begin
    -- Temporary table that holds the PROCEDURE ANALYSE output plus the actual column type
    create temporary table procedure_analyse_output
    (
        Field_name varchar(64),
        Min_value int,
        Max_value int,
        Min_length int,
        Max_length int,
        Empties_or_zeros int,
        Nulls int,
        Avg_value_or_avg_length float,
        Std float,
        Optimal_fieldtype text,
        Actual_fieldtype text
    );
    set @table = concat(databaseName,'.',tableName);
    set @dbName = databaseName;
    set @tbName = tableName;
    -- Run PROCEDURE ANALYSE on the target table and capture its output
    set @qry = concat('insert into procedure_analyse_output (Field_name,Min_value,Max_value,Min_length,Max_length,Empties_or_zeros,Nulls,Avg_value_or_avg_length,Std,Optimal_fieldtype) select * from ', @table,' procedure analyse()');
    prepare myStmt from @qry;
    execute myStmt;
    -- Strip the 'db.table.' prefix that PROCEDURE ANALYSE puts in Field_name
    update procedure_analyse_output set Field_name = replace(Field_name, concat(databaseName,'.',tableName,'.'),'');
    -- Look up the actual column type in information_schema
    prepare myStmt from 'update procedure_analyse_output pao, information_schema.columns c set pao.Actual_fieldtype = c.column_type where table_schema = ? and table_name = ? and column_name = pao.Field_name';
    execute myStmt using @dbName, @tbName;
    -- Report the total number of rows, then the extended analyse output
    set @qry = concat('select count(*) as `Total_number_of_rows` from ',@table);
    prepare myStmt from @qry;
    execute myStmt;
    select * from procedure_analyse_output;
    drop temporary table procedure_analyse_output;
end;
//
delimiter ;
Here are a few sample outputs:
mysql> call extended_procedure_analyse('test','City')\G
*************************** 1. row ***************************
Total_number_of_rows: 30000
1 row in set (0.11 sec)
*************************** 1. row ***************************
Field_name: ID
Min_value: 925001
Max_value: 955000
Min_length: 6
Max_length: 6
Empties_or_zeros: 0
Nulls: 0
Avg_value_or_avg_length: 940000
Std: 938839
Optimal_fieldtype: MEDIUMINT(6) UNSIGNED NOT NULL
Actual_fieldtype: int(11) unsigned
*************************** 2. row ***************************
Field_name: CountryCode
Min_value: 0
Max_value: 29
Min_length: 1
Max_length: 2
Empties_or_zeros: 1000
Nulls: 0
Avg_value_or_avg_length: 14.5
Std: 8.6554
Optimal_fieldtype: ENUM('0','1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24','25','26','27','28','29') NOT NULL
Actual_fieldtype: int(11) unsigned
*************************** 3. row ***************************
Field_name: Name
Min_value: 1
Max_value: 9999
Min_length: 1
Max_length: 5
Empties_or_zeros: 0
Nulls: 0
Avg_value_or_avg_length: 4.6605
Std: NULL
Optimal_fieldtype: CHAR(5) NOT NULL
Actual_fieldtype: varchar(40)
*************************** 4. row ***************************
Field_name: District
Min_value: 1
Max_value: 9999
Min_length: 1
Max_length: 5
Empties_or_zeros: 0
Nulls: 0
Avg_value_or_avg_length: 4.6603
Std: NULL
Optimal_fieldtype: CHAR(5) NOT NULL
Actual_fieldtype: varchar(40)
*************************** 5. row ***************************
Field_name: Population
Min_value: 0
Max_value: 9999
Min_length: 1
Max_length: 5
Empties_or_zeros: 0
Nulls: 0
Avg_value_or_avg_length: 4.6647
Std: NULL
Optimal_fieldtype: CHAR(5) NOT NULL
Actual_fieldtype: varchar(40)
5 rows in set (0.12 sec)
Query OK, 0 rows affected (0.12 sec)
mysql> call extended_procedure_analyse('test','projects_innodb')\G
*************************** 1. row ***************************
Total_number_of_rows: 1007366
1 row in set (14.80 sec)
*************************** 1. row ***************************
Field_name: id
Min_value: 1
Max_value: 1007366
Min_length: 1
Max_length: 7
Empties_or_zeros: 0
Nulls: 0
Avg_value_or_avg_length: 503684
Std: 581599
Optimal_fieldtype: MEDIUMINT(7) UNSIGNED NOT NULL
Actual_fieldtype: int(10) unsigned
*************************** 2. row ***************************
Field_name: name
Min_value: 0
Max_value: 9999
Min_length: 1
Max_length: 10
Empties_or_zeros: 0
Nulls: 0
Avg_value_or_avg_length: 4.6958
Std: NULL
Optimal_fieldtype: VARCHAR(10) NOT NULL
Actual_fieldtype: char(10)
2 rows in set (14.80 sec)
Query OK, 0 rows affected (14.80 sec)
mysql> call extended_procedure_analyse('test','projects_isam')\G
*************************** 1. row ***************************
Total_number_of_rows: 1000000
1 row in set (0.56 sec)
*************************** 1. row ***************************
Field_name: id
Min_value: 1
Max_value: 1000000
Min_length: 1
Max_length: 7
Empties_or_zeros: 0
Nulls: 0
Avg_value_or_avg_length: 500000
Std: 577358
Optimal_fieldtype: MEDIUMINT(7) UNSIGNED NOT NULL
Actual_fieldtype: int(10) unsigned
*************************** 2. row ***************************
Field_name: name
Min_value: 0
Max_value: 9999
Min_length: 1
Max_length: 5
Empties_or_zeros: 0
Nulls: 0
Avg_value_or_avg_length: 4.6605
Std: NULL
Optimal_fieldtype: CHAR(5) NOT NULL
Actual_fieldtype: char(10)
2 rows in set (0.56 sec)
Query OK, 0 rows affected (0.56 sec)
Notice the difference in response time between the InnoDB and MyISAM tables (yes, I wrongly used the 'isam' name ...): InnoDB has to count the rows for a count(*) query, while MyISAM stores an exact row count in the table's metadata.
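If an approximate figure is good enough for judging the sample size, one possible shortcut (just a sketch; for InnoDB the TABLE_ROWS value is only an estimate, while for MyISAM it is exact) is to read it from information_schema instead of running count(*):
-- Approximate row count from the data dictionary; avoids a full
-- count(*) scan on large InnoDB tables.
select table_rows
from information_schema.tables
where table_schema = 'test' and table_name = 'projects_innodb';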
I plan to extend this procedure to include index information, with useful data such as overindexed columns (columns that are included as a leftmost prefix in more than one index) and unindexed columns that are queried. The first question is easily answered from information_schema.statistics, using the seq_in_index column (see the sketch below); I'm still working on the second one.
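A rough sketch of that first query against information_schema.statistics might look like this (the schema and table names here are just placeholders):
-- Columns that appear as the leftmost prefix (seq_in_index = 1) of more
-- than one index on the same table: candidates for redundant indexes.
select table_schema, table_name, column_name, count(*) as leftmost_in_indexes
from information_schema.statistics
where seq_in_index = 1
  and table_schema = 'test' and table_name = 'City'
group by table_schema, table_name, column_name
having count(*) > 1;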
Still, if you have the time and interest, play with this early version and let me know what's wrong with it and/or what could be improved.