November 9, 2005

I'm enthralled

First: Nerd alert!

I've been working on a problem today at work where certain spam emails get through our filters even though (judging by the content) they should have been blocked.

The message I'm working with is what is called a "phishing" scam, where the scammer tries to trick someone into revealing personal information so that the scammer can commit identity theft. It's very common right now.

When you look at it in your browser or in an email client that renders the HTML (like Microsoft Outlook, which we use), you see a normal message. But when you look at the raw, unrendered text, you see terrible misspellings like this:

Drae Mrebme,We muts cehck thta yoru ID was regiseretd by real peolpe. So, to hlep prevetn automadetregtsirations, plsaee clikc on thsi lkni and cpmolete coed verifoitacin prsecos: (link removed by me)
Tahnk you.

The source code looks like this:



I searched Google and tinkered and experimented with this all afternoon, trying to figure out what it was. My mistake was not noticing that there were only two different codes in all that junk: 8238 and 8236. If I had noticed, I could have solved this a lot faster. But I didn't.

As it turns out, 8238 and 8236 work kind of like HTML open and close tags, the way a "B" and a "/B" with angle brackets < > around them start and end boldface. BUT these codes start and end reversing of the text: 8238 is the Unicode character RIGHT-TO-LEFT OVERRIDE (U+202E), and 8236 is POP DIRECTIONAL FORMATTING (U+202C), which ends the override. So the phrase "sremmaps diputs", if wrapped in those codes, would render as "stupid spammers". It's really very ingenious. The filter can't figure out what the message says because it is written to look for words that are in a predefined "badness" database. While it would certainly recognize a word spelled correctly, it would not recognize "sinep". It's amazing that this is part of standard browser rendering. I can only think of one reason that it would be in there: for rendering languages that read right to left, like Arabic or Hebrew. I have to find a way to use this to my advantage.
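
For the curious, here's a minimal Python sketch of the reversing idea. It is only an approximation: it assumes the codes show up either as raw characters or as HTML numeric entities, it ignores nested overrides, and the real Unicode bidirectional algorithm that browsers use is a lot more involved. The name `visual_text` is just something I made up for illustration.

```python
import html
import re

RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE, decimal 8238
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING, decimal 8236

# One override run: RLO, then text, up to the next PDF or the end of the string.
_OVERRIDE_RUN = re.compile(RLO + r"(.*?)(?:" + PDF + r"|\Z)", re.DOTALL)


def visual_text(source: str) -> str:
    """Roughly approximate what a reader sees (hypothetical helper).

    Decodes HTML entities such as &#8238; and then reverses each run of
    text wrapped in the override codes. Nested overrides are not handled.
    """
    decoded = html.unescape(source)
    return _OVERRIDE_RUN.sub(lambda m: m.group(1)[::-1], decoded)


print(visual_text("&#8238;sremmaps diputs&#8236;"))  # prints: stupid spammers
```

A filter could run its "badness" word list against the output of something like this instead of against the raw source.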

I'm so thankful for this web page: http://www.pandasoftware.com/virus_info/spam/. If they hadn't had info on this, I would have never figured it out. My problem, mainly, was not knowing what to search for in Google to find my answer. But there it is. I've written a script to capture those kinds of emails now, but we'll see if it works...
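
This isn't the actual script I'm running at work, just a rough Python sketch of the same idea under some assumptions: flag any message body that contains the 8238 override code, whether it arrives as a raw character or as an HTML entity. The function name `looks_like_reversal_trick` and the `threshold` parameter are made up for this example.

```python
import html

RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE, decimal 8238


def looks_like_reversal_trick(body: str, threshold: int = 1) -> bool:
    """Return True if the body contains at least `threshold` override codes.

    html.unescape turns entities like &#8238; or &#x202E; into the actual
    character, so both encoded and raw forms are counted.
    """
    decoded = html.unescape(body)
    return decoded.count(RLO) >= threshold


print(looks_like_reversal_trick("plsaee &#8238;kcilc&#8236; on thsi"))  # prints: True
print(looks_like_reversal_trick("Dear Member, please click here"))      # prints: False
```

Raising `threshold` would be one way to tolerate the occasional legitimate use of the character, at the cost of letting through messages that use the trick only once or twice.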

Okay, Nerd-fest is over. Hehe.

Update: Incredible! My filter script works! I sent that text above from my Hotmail account to my work account, and it was trapped. When I released it and let it be delivered, it looked perfect. When I forwarded it, it looked messed up again. This is just wacky.

1 comment:

Rutherford said...

Nice work! I was enthralled as well, and I don't even know a thing about computers. Your explanation was very good. Now I'm wondering what the bad phrase(s) were that would have been caught if the text had not been written in reverse. What does the filter look for in that message?