If you've ever pasted a word document into a browser's text area and then insert it into a MySQL table, you've probably encountered some funky characters replace the quotes, dashes and other special characters.
This is due to Microsoft using extended character sets over ASCII 128.
In the past I'd have to manually search and replace the bad text. Today I decided to find a better solution. Searching on
Google, I came across the following blog post
Replacing smart quotes, em-dashes, and ellipses with MySQL or PHP
This was exactly what I needed.
I went further to create a MySQL function. This way I could use it to convert the text before inserting it into the database.
Here's the function if you want to use it....
DELIMITER $$
DROP FUNCTION IF EXISTS `f_ReplaceMicrosoftCharacters`$$
CREATE FUNCTION `f_ReplaceMicrosoftCharacters`(mystring TEXT)
RETURNS TEXT
DETERMINISTIC
BEGIN
DECLARE myoutstring TEXT DEFAULT ';
SET myoutstring = ';
SET myoutstring = REPLACE(mystring, 0xE28098, "'");
SET myoutstring = REPLACE(myoutstring, 0xE28099, "'");
SET myoutstring = REPLACE(myoutstring, 0xE2809C, '"');
SET myoutstring = REPLACE(myoutstring, 0xE2809D, '"');
SET myoutstring = REPLACE(myoutstring, 0xE28093, '-');
SET myoutstring = REPLACE(myoutstring, 0xE28094, '--');
SET myoutstring = REPLACE(myoutstring, 0xE280A6, '...');
/*
* Next, replace their Windows-1252 equivalents.
*/
SET myoutstring = REPLACE(myoutstring, char(145), "'");
SET myoutstring = REPLACE(myoutstring, char(146), "'");
SET myoutstring = REPLACE(myoutstring, char(147), '"');
SET myoutstring = REPLACE(myoutstring, char(148), '"');
SET myoutstring = REPLACE(myoutstring, char(150), '-');
SET myoutstring = REPLACE(myoutstring, char(151), '--');
SET myoutstring = REPLACE(myoutstring, char(133), '...');
RETURN myoutstring;
END$$
DELIMITER ;
Comments