read

MySQL provides a SOUNDEX() function, which returns the soundex of a given string. For details, refer to the manual, but to put it simply, it allows you to compare strings based on how they sound, hence letting you do proximity searches on your database.

If you're just querying for a word, it's usage is pretty straightforward, and in fact, you can use the SOUNDS LIKE operator to build expressions such as this:

SELECT expr FROM table expr WHERE field0 SOUNDS LIKE 'inputWord'

However, if you're storing multiple-word strings, things get a little more complicated, since they can't be compared by their soundex. Rather, the soundex returned will be associated with the whole phrase. If at a later time, you want to search for a subpart of this phrase, there's no way for you to do this.

Well, at least not directly, but by using an auxiliary table to store soundex strings, a couple of stored procedures, and a trigger, it can be done with little effort to programmers that use the database.

Let's assume we have a very simple table called soundex_text with the following structure:

create table if not exists soundex_text (
	id int unsigned not null auto_increment primary key,
	description text
) Engine = Innodb;

The field we want to query by proximity is description, therefore we create the following auxiliary table to store soundex values:

create table if not exists soundex_text_index (
	soundex_text_id int unsigned not null references soundex_text(id),
	soundex char(4)
) Engine = Innodb;

We now create a stored procedure and a trigger to populate the auxiliary table automatically every time a row is inserted in the main table.

-- adapted from example at http://forums.mysql.com/read.php?60,78776,242420#msg-242420
-- posted by jim smith: http://forums.mysql.com/profile.php?60,3154903
CREATE PROCEDURE update_soundex_text_index (sStringIn text,splitChar varchar(1), soundex_text_id int)
BEGIN
DECLARE comma INT DEFAULT 0;
DECLARE mylist TEXT DEFAULT sStringIn;
DECLARE temp TEXT DEFAULT '';
DECLARE strlen int DEFAULT LENGTH(sStringIn);
DECLARE insert_id int DEFAULT soundex_text_id;

/* find the first instance of the spliting character */
SET comma = LOCATE(splitChar,mylist);
/* Insert each split variable into the temp table */
WHILE strlen > 0 DO
	IF comma = 0 THEN
		SET temp = TRIM(mylist);
		SET mylist = '';
		SET strlen = 0;
	END IF;
	IF comma != 0 THEN
		SET temp = TRIM(SUBSTRING(mylist,1,comma-1));
		SET mylist = TRIM(SUBSTRING(mylist FROM comma+1));
		SET strlen = LENGTH(mylist);
		-- Sample handling of special chars you might want removed from individual words
		-- before storing their soundex.
		SELECT REPLACE(temp,',','') INTO temp;
		SELECT REPLACE(temp,';','') INTO temp;
		SELECT REPLACE(temp,':','') INTO temp;
	END IF;
	IF temp != '' THEN
		insert into soundex_text_index (soundex_text_id,soundex) values (insert_id,substring(soundex(temp) from 1 for 4));
	END IF;
	SET comma = LOCATE(splitChar,mylist);
END WHILE;

END//

drop trigger if exists soundex_text_bi//
create trigger soundex_text_bi
before insert
on soundex_text
for each row
begin
	SET @id = last_insert_id();
	call update_soundex_text_index (NEW.description, ' ', @id);
end;//
delimiter ;

Finally, a stored procedure to automatically query the table using the auxiliary table implicitly:

delimiter //
create procedure query_soundex_text(sStringIn text, splitChar varchar(1))
BEGIN
DECLARE comma INT DEFAULT 0;
DECLARE mylist TEXT DEFAULT sStringIn;
DECLARE temp TEXT DEFAULT '';
DECLARE strlen int DEFAULT LENGTH(sStringIn); 

create temporary table results (id int unsigned, description text);

/* find the first instance of the spliting character */
SET comma = LOCATE(splitChar,mylist);
/* Insert each split variable into the temp table */
WHILE strlen > 0 DO
	IF comma = 0 THEN
		SET temp = TRIM(mylist);
		SET mylist = '';
		SET strlen = 0;
	END IF;
	IF comma != 0 THEN
		SET temp = TRIM(SUBSTRING(mylist,1,comma-1));
		SET mylist = TRIM(SUBSTRING(mylist FROM comma+1));
		SET strlen = LENGTH(mylist);
		-- Sample handling of special chars you might want removed from individual words
		-- before storing their soundex.
		SELECT REPLACE(temp,',','') INTO temp;
		SELECT REPLACE(temp,';','') INTO temp;
		SELECT REPLACE(temp,':','') INTO temp;
	END IF;
	IF temp != '' THEN
		insert into results select st.id, st.description from soundex_text st, soundex_text_index sti where sti.soundex = substring(soundex(temp) from 1 for 4);
	END IF;
	SET comma = LOCATE(splitChar,mylist);
END WHILE;

select distinct * from results;
drop table results;

END//
delimiter ;

Using this is pretty straightforward.
Just insert some sample data into the table, and give it a shot!:

mysql> insert into soundex_text(description) values ('This is a sample text row');
Query OK, 1 row affected (0.00 sec)

mysql> call query_soundex_text('semple',' ');
+------+---------------------------+
| id   | description               |
+------+---------------------------+
|    3 | This is a sample text row |
+------+---------------------------+
1 row in set (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

mysql> call query_soundex_text('THis is a samPLE taxt row',' ');
+------+---------------------------+
| id   | description               |
+------+---------------------------+
|    3 | This is a sample text row |
+------+---------------------------+
1 row in set (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

mysql> call query_soundex_text('rock',' ');
Empty set (0.01 sec)

Query OK, 0 rows affected (0.01 sec)

To make it even easier, you could modify the querying stored procedure and always assume that the splitting character is a whitespace.

You can download a zipfile with a sql script with all this code for you to load into a MySQL database right here.

Enjoy!

Blog Logo

Fernando Ipar


Published

Image

Fernando Ipar

"Tell my wife I love her very much, she knows"

Back to Overview