Things can be easier. It amazes me how difficult simple text processing can be. The crazy thing is that after 10+ years of doing it for the web, things don’t seem to be getting any easier. Copying and pasting text from Word is always a problem, and now people like fancy quotes, accents and all sorts of other characters, which make one or more of HTML, PHP, JavaScript or the database barf.
So after seeing yet another ? instead of an apostrophe, I decided to rid myself of this problem for good. The root issue is that I didn’t give enough thought to what format things should be stored in within the database. I naively thought, “ok, receive data, make it safe and put it in the database, everybody is happy.” In an ideal world this would work. The “make it safe” part is there to avoid SQL injection attacks and to keep the database from barfing on apostrophes.
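For what it's worth, here is a minimal sketch of the “make it safe” step using a parameterized query. It is written in Python with SQLite purely for illustration, not the stack I actually use, and the `comments` table and `save_comment` function are made-up names. The point is only that the driver handles the apostrophes, so no hand escaping is needed.

```python
import sqlite3

# In-memory database so the sketch is self-contained (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)")


def save_comment(conn, body):
    """Insert text using a placeholder instead of string concatenation."""
    # The "?" placeholder passes the value separately from the SQL text,
    # so apostrophes and injection attempts are stored as plain data.
    cur = conn.execute("INSERT INTO comments (body) VALUES (?)", (body,))
    conn.commit()
    return cur.lastrowid


row_id = save_comment(conn, "It's Bobby'); DROP TABLE comments;--")
stored = conn.execute(
    "SELECT body FROM comments WHERE id = ?", (row_id,)
).fetchone()[0]
print(stored)
```

This only covers the safety half of the problem. It does nothing about which characters or encoding end up in the column, which is where the question marks come from.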
My solution now is to store everything in the database as HTML-encoded text using only basic ASCII characters. All those weird special characters, accents and junk get converted to their HTML entities. It is then just a matter of converting back to plain text when needed for e-mail or URLs, and letting the browser handle the rest. Hopefully I’ve seen my last ? in place of an apostrophe.
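Here is a rough sketch of that round trip, again in Python rather than PHP, using only the standard `html` module. The function names `to_ascii_html` and `to_plain_text` are mine, and this version assumes numeric character references (like `&#8217;`) are acceptable in place of named entities (like `&rsquo;`); browsers render both the same way.

```python
import html


def to_ascii_html(text: str) -> str:
    """Encode text as ASCII-only HTML, ready to store."""
    # Escape the HTML-significant ASCII characters first: & < > " '
    escaped = html.escape(text, quote=True)
    # Then replace every non-ASCII character with a numeric reference.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


def to_plain_text(stored: str) -> str:
    """Decode stored HTML back to plain text, e.g. for an e-mail body."""
    return html.unescape(stored)


sample = "It\u2019s a caf\u00e9"   # curly apostrophe and accented e
stored = to_ascii_html(sample)
print(stored)                      # It&#8217;s a caf&#233;
print(to_plain_text(stored) == sample)  # True
```

Because the stored value is pure ASCII, it should survive whatever character set the database connection happens to be using, which is the whole reason for doing it this way.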
It’s encouraging to see that even the almighty Google has the same issue at times; here’s a screenshot from my Google Homepage:
