Replacing Bad Microsoft Characters in MySQL

Apr 20, 2010

If you've ever pasted a word document into a browser's text area and then insert it into a MySQL table, you've probably encountered some funky characters replace the quotes, dashes and other special characters.

This is due to Microsoft using extended character sets over ASCII 128.

In the past I'd have to manually search and replace the bad text. Today I decided to find a better solution. Searching on Google, I came across the following blog post
Replacing smart quotes, em-dashes, and ellipses with MySQL or PHP

This was exactly what I needed.

I went further to create a MySQL function. This way I could use it to convert the text before inserting it into the database.

Here's the function if you want to use it....

DELIMITER $$

DROP FUNCTION IF EXISTS `f_ReplaceMicrosoftCharacters`$$

CREATE FUNCTION `f_ReplaceMicrosoftCharacters`(mystring TEXT)
RETURNS TEXT
DETERMINISTIC
BEGIN

DECLARE myoutstring TEXT DEFAULT ';

SET myoutstring = ';

SET myoutstring = REPLACE(mystring, 0xE28098, "'");
SET myoutstring = REPLACE(myoutstring, 0xE28099, "'");
SET myoutstring = REPLACE(myoutstring, 0xE2809C, '"');
SET myoutstring = REPLACE(myoutstring, 0xE2809D, '"');
SET myoutstring = REPLACE(myoutstring, 0xE28093, '-');
SET myoutstring = REPLACE(myoutstring, 0xE28094, '--');
SET myoutstring = REPLACE(myoutstring, 0xE280A6, '...');

/*
* Next, replace their Windows-1252 equivalents.
*/

SET myoutstring = REPLACE(myoutstring, char(145), "'");
SET myoutstring = REPLACE(myoutstring, char(146), "'");
SET myoutstring = REPLACE(myoutstring, char(147), '"');
SET myoutstring = REPLACE(myoutstring, char(148), '"');
SET myoutstring = REPLACE(myoutstring, char(150), '-');
SET myoutstring = REPLACE(myoutstring, char(151), '--');
SET myoutstring = REPLACE(myoutstring, char(133), '...');

RETURN myoutstring;

END$$

DELIMITER ;

Comments

New Comment