 Welcome to Toolz4Schoolz
 

    The search for a future-proofed anti-spam spider encoding solution

    Have you ever put your email address up on your website, seen the spam suddenly flood in and wondered what happened? Plain text email addresses on websites were rated as the main attractor of spam in a recent study. The reason is that spammers employ programs called "bots" or "spiders", which they point at any random web page and which then follow all the links, eventually traversing the whole web and sucking up any email addresses they find as they go along.

    Many disgruntled folks have tried to find a solution. The bad news is it's not easy; no solution is perfect. Search for "obfuscate email address" or "email address spider" on your favourite search engine and you will get a wealth of information and ideas for spider proofing, or "obfuscating", your email address on your website.

    ...or ...well there's so much material and so much of it delves depressingly into the pros as well as the many cons of each method that you'll most likely want to just close all those windows and go into your HTML editor and simply replace all your email links with "me at mydomain dot com" and then get on with your life!!! ;)

    The problem is that all the "solutions" presented appear to be focused on one method. What is required is a one-stop resource that clearly outlines the pros and cons of each method and then presents a hybrid solution.

    As far as I can see there's no spot on the web that proposes or outlines such a solution. So here goes:

    Use javascript

    I'll start off with the oldest solution out there. The good old var1="mydomain"; var2="me" ,... address = var1 + var2 + var...

    Pros
    • Easy
    • Offers a degree of protection from close to 100% of the spiders out there.

    Cons
    • Relies on javascript.
    • If the browser has javascript turned off nothing will be displayed.
    • Spam spiders under development will end up being able to read this kind of javascript at some point in the future.

    I don't like this solution because the same principle that prevents spiders from being able to see your address will prevent a certain portion of your visitors from being able to see it too. The space where your email address should be will just be a blank.
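    A minimal sketch of the split-and-join trick described above. The function names (buildAddress, writeMailLink) and the "contact" element id are hypothetical placeholders, not taken from any real page. The point is only that the complete address never appears in the page source; it exists once the script has run.

```javascript
// Join the pieces at run time; the full address never appears in the source.
function buildAddress(user, domain) {
  return user + "@" + domain;
}

// Hypothetical helper: appends a mailto link to the element with the given id.
// Only meaningful in a browser; with scripting off the element stays empty,
// which is exactly the drawback listed under Cons.
function writeMailLink(doc, elementId, user, domain) {
  var address = buildAddress(user, domain);
  var link = doc.createElement("a");
  link.href = "mailto:" + address;
  link.appendChild(doc.createTextNode(address));
  doc.getElementById(elementId).appendChild(link);
}
```

    In a page this could be called from the onload event, for instance writeMailLink(document, "contact", "me", "mydomain.com"), assuming an empty element with id "contact" exists.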

    Use an image

    The second most common solution is to use an image of your address in place of the text.

    Pros
    • Easy
    • Surely spam spiders will never start looking for images and reading them?

    Cons
    • Can mess up your formatting. If the visitor changes text size then the "text" in the image will of course not change.
    • Not much good if visitors have images turned off.
    • How about the visually impaired and those using text only browsers?

    Remember you've still got the issue of how to encode the mailto: link.

    Using unicode to represent common ASCII text

    Now we're getting into the realms of the more effective solutions... :)

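    Simply replace "me@mydomain.com" with "&#109;&#101;&#64;&#109;&#121;&#100;&#111;&#109;&#97;&#105;&#110;&#46;&#99;&#111;&#109;". (Strictly speaking these are HTML numeric character references, one decimal character code per character, but "unicode" is what everybody calls it.) Your browser can interpret this, your visitors can read it, spiders can't. We're all happy campers...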

    Pros
    • A simple solution with no dependencies.
    • Text and links should work as intended with all browsers no matter how they're configured.
    • Simple to implement, just use one of the web based tools out there and copy and paste.

    Cons
    • In technical terms this is a relatively low tech solution.

    You can even encode the "mailto:" part of the link in your HTML. Since the spiders simply load the body of your page and scan for "mailto" in clear text, surely this is going to can the problem once and for all? Well not really, this is an arms race, and unicode is no exclusive formula. All those spiders have to do is perform two scans; one for "mailto" and one for "&#109;&#97;&#105;&#108;&#116;&#111;" and they're back in business again...
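    To make the arms-race point concrete, here is a small sketch of both sides. The names encodeEntities and decodeEntities are hypothetical. It assumes decimal references only and plain ASCII input, which is all an email address needs. Note how little code the "spider" side takes.

```javascript
// Encode every character as a decimal numeric character reference (&#NNN;).
function encodeEntities(text) {
  var out = "";
  for (var i = 0; i < text.length; i++) {
    out += "&#" + text.charCodeAt(i) + ";";
  }
  return out;
}

// What a spider would need to do: turn each &#NNN; back into its character.
function decodeEntities(html) {
  return html.replace(/&#(\d+);/g, function (match, code) {
    return String.fromCharCode(parseInt(code, 10));
  });
}
```

    Running encodeEntities("mailto:") gives the same string of references used for "mailto:" in this article, and decodeEntities undoes it in one pass.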

    The Solution...

    The third solution is clearly the best. If you use this solution you can rest assured that you're protected against 99% of the spam spiders. Well today anyway (March 2004) ...and for the next few months. (Remember if you're exposed to just one spambag you're exposed to them all because they sell their address lists on ...so it's truly a game of chance really.)

    We really need a more future-proofed solution. The solution I will outline below will be based on the unicode model but will incorporate elements of the javascript and image methods ...while addressing the downsides of each.

    Full hybrid spam spider solution

    1 All links will be written in unicode.

    To show how it's done, here's a standard HTML mailto link:

    <a href="mailto:me@mydomain.com">me@mydomain.com</a>

    Now the equivalent of "mailto:" in unicode is: &#109;&#97;&#105;&#108;&#116;&#111;&#58;

    "me@mydomain.com" is: &#109;&#101;&#64;&#109;&#121;&#100;&#111;&#109;&#97;&#105;&#110;&#46;&#99;&#111;&#109;

    Replace both and you get:

    <a href="&#109;&#97;&#105;&#108;&#116;&#111;&#58;&#109;&#101;&#64;&#109;&#121;&#100;&#111;&#109;&#97;&#105;&#110;&#46;&#99;&#111;&#109;">&#109;&#101;&#64;&#109;&#121;&#100;&#111;&#109;&#97;&#105;&#110;&#46;&#99;&#111;&#109;</a>

    Any modern browser will understand all this and display it perfectly legibly. Try pasting the above in any HTML page.

    With current technologies that is obfuscation. Done and dusted. Log off and go for a beer. But wait! In 2005 terms that is clear text! So let's mess it up and make it unreadable again...

    2 We'll change the ascii text to me@nospam.mydomain.com

    ...just like folks used to do on the Usenet!

    (I pity whoever owns mydomain.com; he/she must be getting an awful lot of junk mail...)

    Here's the HTML:

    <a href="&#109;&#97;&#105;&#108;&#116;&#111;&#58;&#109;&#101;&#64;&#110;&#111;&#115;&#112;&#97;&#109;&#46;&#109;&#121;&#100;&#111;&#109;&#97;&#105;&#110;&#46;&#99;&#111;&#109;">&#109;&#101;&#64;&#110;&#111;&#115;&#112;&#97;&#109;&#46;&#109;&#121;&#100;&#111;&#109;&#97;&#105;&#110;&#46;&#99;&#111;&#109;</a>

    Now it's an unfortunate fact of life that most folks (I'm talking about the average "netizen" here, i.e. customers!, not the net-heads that come to read this...) will click on that, glance at the address, not see that anything is awry and innocently click the send button. Even if we go further and munge it to me@nospam.mydomain.com.nowhere ...

    The solution is to have some javascript that fires on page load that searches for all mailto links and removes occurrences of "nospam" and "nowhere". (unless your domain name has "nowhere" in it; then you'll have to substitute something else...) The key assumption here being that spiders don't actually "run" the page, they just read the raw text.

    To accomplish this you just need to add your javascript routine to the <body> tag's onload event, something like:

    <body onLoad="decode_emails()">
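    A sketch of what such a decode_emails routine might look like. Only the name decode_emails comes from the onLoad example above; the demunge helper and the optional doc parameter are assumptions added for illustration. It strips "nospam." and ".nowhere" together with their dots so that what is left is still a well-formed address, and it never touches links that aren't mailto links.

```javascript
// Strip the decoy pieces from a munged address or mailto URL.
function demunge(text) {
  return text.replace(/nospam\./g, "").replace(/\.nowhere/g, "");
}

// Runs on page load: clean up every mailto link on the page.
// doc defaults to the browser's document; passing it in is only for testing.
function decode_emails(doc) {
  doc = doc || document;
  var links = doc.getElementsByTagName("a");
  for (var i = 0; i < links.length; i++) {
    var link = links[i];
    var href = link.getAttribute("href") || "";
    if (href.toLowerCase().indexOf("mailto:") !== 0) continue;
    link.setAttribute("href", demunge(href));
    // Only rewrite the visible text when the link holds a single text node
    // (nodeType 3); an image inside the link is left alone.
    if (link.childNodes.length === 1 && link.firstChild.nodeType === 3) {
      link.firstChild.nodeValue = demunge(link.firstChild.nodeValue);
    }
  }
}
```

    The browser has already turned the character references back into ordinary characters by the time the script runs, so the routine can work on plain strings. If your domain really does contain "nospam." or ".nowhere", pick different decoy strings.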

    "Ah but you said some users have javascript switched off! It's not going to work!" ...Yes but if javascript is switched off at least there'll still be something there. And for this solution I'm working on the assumption that the people who have javascript turned off will be members of the "clued in" crowd; the exact same subset of the online population who, when they come across an address like "me@nospam.mydomain.com", think "ah, a clever anti-spam tactic!".

    Anyway I'm even going to try to help those who have their javascript turned off. And at the same time remove one element of plain text from the prying eyes of a super intelligent spider...

    3 Let's use an image!

    Instead of the plain text I'll insert an image and put the mailto hyperlink around it.

    Here's the image.

    Here's the HTML

    <a href="&#109;&#97;&#105;&#108;&#116;&#111;&#58;&#109;&#101;&#64;&#109;&#121;&#100;&#111;&#109;&#97;&#105;&#110;&#46;&#99;&#111;&#109;">
    <img border="0" src="images/memydomain.png" width="136" height="17"></a>

    Remember the constraints here. It's probably best to set this image apart from the rest of the text, possibly by using a different font or by putting it on a separate line, so that if the visitor changes the text size or, heaven forbid, uses their own cascading style sheet, the image won't look so misplaced.

    "But some users have images turned off!!!" ...Then they'll be quite used to seeing pages full of missing-image placeholders and I doubt they'll hold it against you. But no fear, we'll help them by placing an alternative text attribute in the HTML*.

    Alternative text is accomplished by adding an "alt='alt text goes here'" to your image definition.

    So they'll see the address as text in place of the image - almost as good as the real thing isn't it? This will also be interpreted properly by browsers for the visually impaired and by older browsers such as Lynx. Be sure to encode it!

    [You might want to modify your javascript decoder routine so you can munge this part as well, but then we'd be blurring that tenuous connection between the requirements and the solution ever so slightly - remember the purpose of the image is to show your (de-munged) email address for people with scripting turned off; the purpose of the alt text is to show your email address for people with images (and presumably scripting as well?) turned off. Lost? It's your decision... Being a paranoid show-off I like to munge the whole lot!...]

    Your image HTML will end up looking something like this:

    <img border="0" src="images/memydomain.png" width="136" height="17" alt="&#109;&#101;&#64;&#110;&#111;&#115;&#112;&#97;&#109;&#46;&#109;&#121;&#100;&#111;&#109;&#97;&#105;&#110;&#46;&#99;&#111;&#109;&#46;&#110;&#111;&#119;&#104;&#101;&#114;&#101;">

    * Yes we're back-stepping here; we're adding one element of "plain" text back into the HTML. We'll try not to lose too much sleep over it...
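    Putting the pieces together by hand is fiddly, so here is a rough sketch of a generator for the whole hybrid link. All three function names are hypothetical. It assumes the href is munged as in step 2 and the alt text is munged and encoded as shown above, and it takes the image path and size as given; it does not create the image itself.

```javascript
// Encode every character as a decimal numeric character reference (&#NNN;).
function encodeEntities(text) {
  var out = "";
  for (var i = 0; i < text.length; i++) {
    out += "&#" + text.charCodeAt(i) + ";";
  }
  return out;
}

// me@mydomain.com -> me@nospam.mydomain.com.nowhere
function munge(address) {
  var at = address.indexOf("@");
  return address.slice(0, at + 1) + "nospam." + address.slice(at + 1) + ".nowhere";
}

// Build the image link: encoded, munged href plus encoded, munged alt text.
function hybridLink(address, imgSrc, width, height) {
  var munged = munge(address);
  return '<a href="' + encodeEntities("mailto:" + munged) + '">' +
    '<img border="0" src="' + imgSrc + '" width="' + width +
    '" height="' + height + '" alt="' + encodeEntities(munged) + '"></a>';
}
```

    For example, hybridLink("me@mydomain.com", "images/memydomain.png", 136, 17) produces markup in which neither "mailto" nor an "@" sign appears in clear text. The page still needs a script that runs on load to strip the decoy parts from the href for visitors with scripting turned on.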

    Conclusion

    And that's it. We've got ourselves a fairly future proofed anti spider solution. The 90% of users who have both javascript and images turned on will see absolutely nothing amiss. The 10% or so who have javascript turned off will see the right representation thanks to the image ...and when they click the link they'll most likely be able to deduce that the address has been munged and backspace over a few characters, won't they? They've got javascript and images turned off? Well they'll definitely be used to hardship, but the address will still be clearly visible for them. The main thing is we're combining safety with catering to the lowest common denominator. And nobody's left out of the party.

    The quest doesn't end here; pretty soon the spam spiders are going to start reading and interpreting javascript. It would be simple to throw a few lines of code around the Mozilla ActiveX control to produce just such a tool. The next step is going to have to employ a mixture of server side and client side code, possibly involving a challenge / response type of setup. At some stage we might have to throw out the javascript and keep the munged address form. Be sure to watch this space to see how things develop.

    And if you're still reading... Be sure to visit the toolz page to help you put all this together ...or if you're reading this one year hence and the whole world plus dog has thrown away their email account due to all the associated problems and risks then there's this solution. Belt and braces being the watchword around here!

    And in the next article I'll write about honeypots and tarpits.

     

    Copyright (C) Toolz, 2004

     

     
