The search for a future-proofed anti-spam spider encoding solution
Have you ever put your email address up on your website, seen the spam suddenly flood in and wondered what happened? In a recent study, plain text email addresses on websites were rated as the main attractor of spam. The reason is that spammers use programs called "bots" or "spiders", which they point at any random web page. The spider then follows all the links, eventually traversing huge swathes of the web and sucking up any email addresses it finds along the way.
Many disgruntled folks have tried to find a solution. The bad news is it's
not easy; no solution is perfect. Search for "obfuscate email address" or "email address spider"
on your favourite search engine and you will
get a wealth of information and ideas for spider proofing, or "obfuscating",
your email address on your website.
...or rather, there's so much material, and so much of it dwells depressingly on the cons as well as the pros of each method, that you'll most likely want to close all those windows, open your HTML editor, replace all your email links with "me at mydomain dot com" and get on with your life!!! ;)
The problem is that each of the "solutions" presented appears to focus on a single method. What is required is a one-stop resource that clearly outlines the pros and cons of each method and then presents a hybrid solution.
As far as I can see, there's no place on the web that proposes or outlines such a solution. So here goes:
Use javascript
I'll start off with the oldest solution out there: the good old var1="mydomain"; var2="me"; ... address = var2 + "@" + var1 + ... trick, which splits the address into pieces and glues them back together in the browser.
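Here is a minimal sketch of this trick. It assumes the address me@mydomain.com used throughout this article, and the function names are hypothetical. The word "mailto" is split as well, so it never appears whole in the page source:

```javascript
// Hypothetical helper: build the address from fragments at run time so the
// raw page source never contains "me@mydomain.com" as one string.
function buildAddress() {
  const user = "me";
  const domain = "mydomain";
  const tld = "com";
  return user + "@" + domain + "." + tld;
}

// Build the full link; "mail" + "to:" keeps the scheme out of the raw text too.
function buildMailtoLink() {
  const address = buildAddress();
  return '<a href="' + "mail" + "to:" + address + '">' + address + "</a>";
}

// In a page, emit the link where it should appear, for example:
// document.write(buildMailtoLink());
```

With javascript switched off, document.write never runs and that spot on the page stays blank.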
Pros:
- Easy.
- Offers a degree of protection from close to 100% of the spiders out there.
Cons:
- Relies on javascript.
- If the browser has javascript turned off, nothing will be displayed.
- The spam spiders now under development will be able to read this kind of javascript at some point in the future.
I don't like this solution, because the same principle that prevents spiders from seeing your address will also prevent a certain portion of your visitors from seeing it. The space where your email address should be will just be blank.
Use an image
The second most common solution is to use an image of your email address instead of text.
Pros:
- Easy.
- Surely spam spiders will never start looking for images and reading them?
Cons:
- Can mess up your formatting. If the visitor changes the text size, the "text" in the image will of course not change.
- Not much good if visitors have images turned off.
- How about the visually impaired and those using text-only browsers?
Remember you've still got the issue of how to encode the mailto: link.
Using unicode to represent common ASCII text
Now we're getting into the realms of the more effective solutions... :)
Simply replace "me@mydomain.com" with "&#109;&#101;&#64;&#109;&#121;&#100;&#111;&#109;&#97;&#105;&#110;&#46;&#99;&#111;&#109;". Your browser can interpret this, your visitors can read it, and spiders can't. We're all happy campers...
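Encoding by hand is tedious. Here is a minimal sketch of an encoder that emits decimal numeric character references. The name toEntities is hypothetical:

```javascript
// Encode every character as a decimal HTML numeric character reference,
// e.g. "m" becomes "&#109;". Browsers decode these when rendering the page.
function toEntities(text) {
  let out = "";
  for (const ch of text) {
    // codePointAt(0) gives the Unicode code point of this character.
    out += "&#" + ch.codePointAt(0) + ";";
  }
  return out;
}

console.log(toEntities("me@mydomain.com"));
// &#109;&#101;&#64;&#109;&#121;&#100;&#111;&#109;&#97;&#105;&#110;&#46;&#99;&#111;&#109;
```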
Pros:
- A simple solution with no dependencies.
- Text and links should work as intended in all browsers, no matter how they're configured.
- Simple to implement: just use one of the web-based tools out there and copy and paste.
- In technical terms this is a relatively low-tech solution.
Cons:
- In technical terms this is a relatively low-tech solution.
You can even encode the "mailto:" part of the link in your HTML. Since the spiders simply load the body of your page and scan for "mailto" in clear text, surely this is going to can the problem once and for all? Well, not really. This is an arms race, and unicode is no magic formula: all those spiders have to do is perform two scans, one for "mailto" and one for "&#109;&#97;&#105;&#108;&#116;&#111;", and they're back in business again...
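To show why encoding is only a speed bump, here is a sketch of what a spider's "second scan" could amount to: decode the numeric character references, then look for mailto addresses as usual. The function names and the deliberately naive address pattern are assumptions made for illustration:

```javascript
// Decode decimal (&#109;) and hexadecimal (&#x6d;) numeric character references.
function decodeNumericEntities(html) {
  return html.replace(/&#(x[0-9a-f]+|\d+);/gi, (match, code) => {
    const n = code[0] === "x" || code[0] === "X"
      ? parseInt(code.slice(1), 16)
      : parseInt(code, 10);
    return String.fromCodePoint(n);
  });
}

// After decoding, a naive pattern finds the addresses again.
function harvestMailto(html) {
  const plain = decodeNumericEntities(html);
  const found = [];
  const pattern = /mailto:([^"'>\s]+)/gi;
  let m;
  while ((m = pattern.exec(plain)) !== null) {
    found.push(m[1]);
  }
  return found;
}
```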
The Solution...
The third solution is clearly the best. If you use it, you can rest assured that you're protected against 99% of the spam spiders. Well, today anyway (March 2004) ...and for the next few months. (Remember, if you're exposed to just one spambag you're exposed to them all, because they sell their address lists on ...so it really is a game of chance.)
We really need a more future-proofed solution. The solution outlined below is based on the unicode model but incorporates elements of the javascript and image methods ...while addressing the downsides of each.
Full hybrid spam spider solution
1 All links will be written in unicode.
To show how it's done, here's a standard HTML mailto link:
<a href="mailto:me@mydomain.com">me@mydomain.com</a>
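Now the equivalent of "mailto:" in unicode is:
&#109;&#97;&#105;&#108;&#116;&#111;&#58;
"me@mydomain.com" is:
&#109;&#101;&#64;&#109;&#121;&#100;&#111;&#109;&#97;&#105;&#110;&#46;&#99;&#111;&#109;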
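Replace both and you get:
<a href="&#109;&#97;&#105;&#108;&#116;&#111;&#58;&#109;&#101;&#64;&#109;&#121;&#100;&#111;&#109;&#97;&#105;&#110;&#46;&#99;&#111;&#109;">&#109;&#101;&#64;&#109;&#121;&#100;&#111;&#109;&#97;&#105;&#110;&#46;&#99;&#111;&#109;</a>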
Any modern browser will understand all this and display it perfectly legibly. Try pasting the above into any HTML page.
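If you'd rather not build these links by hand, a small script can generate them. Here is a minimal sketch that assumes decimal references. encodedMailtoLink is a hypothetical helper name:

```javascript
// Hypothetical helper: produce an <a> tag whose href and visible text are
// both written entirely as decimal numeric character references.
function encodedMailtoLink(address) {
  const encode = (text) =>
    Array.from(text, (ch) => "&#" + ch.codePointAt(0) + ";").join("");
  return '<a href="' + encode("mailto:" + address) + '">' + encode(address) + "</a>";
}

console.log(encodedMailtoLink("me@mydomain.com"));
```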
With current technologies that is obfuscation. Done and dusted. Log off and
go for a beer. But wait! In 2005 terms that is clear text! So let's mess it up
and make it unreadable again...
2 We'll change the ASCII text to
me@nospam.mydomain.com
...just like folks used to do on Usenet!
(I pity whoever owns mydomain.com; he/she must be getting an awful lot of junk
mail...)
Here's the HTML:
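<a href="&#109;&#97;&#105;&#108;&#116;&#111;&#58;&#109;&#101;&#64;&#110;&#111;&#115;&#112;&#97;&#109;&#46;&#109;&#121;&#100;&#111;&#109;&#97;&#105;&#110;&#46;&#99;&#111;&#109;">&#109;&#101;&#64;&#110;&#111;&#115;&#112;&#97;&#109;&#46;&#109;&#121;&#100;&#111;&#109;&#97;&#105;&#110;&#46;&#99;&#111;&#109;</a>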
Now, it's an unfortunate fact of life that most folks (I'm talking about the average "netizen" here, i.e. customers!, not the net-heads who come to read this...) will click on that, glance at the address without noticing that anything is awry, and innocently click the send button. The same goes even if we munge it further, to me@nospam.mydomain.com.nowhere ...
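Munging is just string surgery on the address. Here is a minimal sketch that produces the me@nospam.mydomain.com.nowhere form. It assumes a hypothetical helper named munge and the "nospam." and ".nowhere" markers used in this article:

```javascript
// Insert "nospam." after the @ and append ".nowhere" to the domain.
function munge(address) {
  const at = address.lastIndexOf("@");
  if (at < 0) {
    throw new Error("not an email address: " + address);
  }
  const user = address.slice(0, at);
  const domain = address.slice(at + 1);
  return user + "@nospam." + domain + ".nowhere";
}

console.log(munge("me@mydomain.com")); // me@nospam.mydomain.com.nowhere
```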
The solution is some javascript that fires on page load, searches for all mailto links and removes the "nospam." and ".nowhere" markers (unless your domain name has "nowhere" in it, in which case you'll have to substitute something else...).
The key assumption here is that spiders don't actually "run" the page; they just read the raw text.
To accomplish this you just need to add your javascript routine to the <body> tag's onload event,
something like:
<body onLoad="decode_emails()">
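Here is one way decode_emails() might look. This is a sketch, not a tested drop-in. It assumes the "nospam." and ".nowhere" markers described above. It also fixes the visible link text and, optionally, the alt text of an image inside the link (see the note on alt text further down). The string-only demunge helper is hypothetical and kept separate so it can be checked without a browser:

```javascript
// Strip the anti-spam markers ("nospam." and ".nowhere") from a string.
function demunge(text) {
  return text.replace(/nospam\./g, "").replace(/\.nowhere/g, "");
}

// Intended to run from <body onLoad="decode_emails()">.
// By now the browser has already decoded any numeric character references,
// so getAttribute("href") returns plain text such as "mailto:me@nospam...".
function decode_emails() {
  const links = document.getElementsByTagName("a");
  for (let i = 0; i < links.length; i++) {
    const link = links[i];
    const href = link.getAttribute("href");
    if (!href || href.toLowerCase().indexOf("mailto:") !== 0) {
      continue; // not a mailto link
    }
    link.setAttribute("href", demunge(href));
    // Fix the visible text, touching text nodes only so that an <img>
    // inside the link is left in place.
    for (const node of link.childNodes) {
      if (node.nodeType === Node.TEXT_NODE) {
        node.nodeValue = demunge(node.nodeValue);
      }
    }
    // Optional: also de-munge the alt text of any image inside the link.
    const images = link.getElementsByTagName("img");
    for (let j = 0; j < images.length; j++) {
      if (images[j].alt) {
        images[j].alt = demunge(images[j].alt);
      }
    }
  }
}
```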
"Ah, but you said some users have javascript switched off! It's not going to work!" ...Yes, but if javascript is switched off there'll at least still be something there. And for this solution I'm working on the assumption that the people who have javascript turned off will be members of the "clued-in" crowd: the exact same subset of the online population who, when they come across an address like "me@nospam.mydomain.com", think "ah, a clever anti-spam tactic!".
Anyway, I'm going to try to help those who have javascript turned off too, and at the same time remove one element of plain text from the prying eyes of a super-intelligent spider...
3 Let's use an image!
Instead of the plain text I'll insert an image and put the mailto hyperlink
around it.
Here's the image: [image: images/memydomain.png, your email address rendered as a picture]
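Here's the HTML:
<a href="&#109;&#97;&#105;&#108;&#116;&#111;&#58;&#109;&#101;&#64;&#110;&#111;&#115;&#112;&#97;&#109;&#46;&#109;&#121;&#100;&#111;&#109;&#97;&#105;&#110;&#46;&#99;&#111;&#109;">
<img border="0"
src="images/memydomain.png"
width="136" height="17"></a>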
Remember the constraints here. It's probably best to set this image apart from the rest of the text, perhaps by using a different font, perhaps by putting it on a separate line, so that if the visitor changes the text size or, heaven forbid, uses their own cascading style sheet, the image won't look so out of place.
"But some users have images turned off!!!" ...Then they'll be quite used to seeing pages full of broken-image placeholders, and I doubt they'll hold it against you. But never fear, we'll help them by adding alternative text to the image in the HTML*.
Alternative text is added with an alt="alt text goes here" attribute on your image tag.
So they'll see the alt text in place of the image. Almost as good as the real thing, isn't it? It will also be rendered properly by browsers for the visually impaired and by older text-only browsers such as Lynx. Be sure to encode it!
[You might want to modify your javascript decoder routine so that you can munge this part as well, but then we'd be blurring that tenuous connection between the requirements and the solution ever so slightly. Remember: the purpose of the image is to show your (de-munged) email address to people with scripting turned off; the purpose of the alt text is to show your email address to people with images (and presumably scripting as well?) turned off. Lost? It's your decision... Being a paranoid show-off, I like to munge the whole lot!...]
Your image HTML will end up looking something like this:
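<img border="0"
src="images/memydomain.png"
width="136" height="17"
alt="&#109;&#101;&#64;&#110;&#111;&#115;&#112;&#97;&#109;&#46;&#109;&#121;&#100;&#111;&#109;&#97;&#105;&#110;&#46;&#99;&#111;&#109;&#46;&#110;&#111;&#119;&#104;&#101;&#114;&#101;">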
* Yes, we're back-pedalling here: we're adding one element of "plain" text back into the HTML. We'll try not to lose too much sleep over it...
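Steps 1 to 3 can be combined in a small generator that emits the whole snippet. Here is a minimal sketch under the same assumptions: decimal references, plus the "nospam." and ".nowhere" markers. hybridLink and its parameters are hypothetical. The image file and dimensions are the ones used above:

```javascript
// Encode every character as a decimal numeric character reference.
function toEntities(text) {
  return Array.from(text, (ch) => "&#" + ch.codePointAt(0) + ";").join("");
}

// Produce the munged, encoded link wrapped around an image of the real
// address, with the munged address (also encoded) as alt text.
function hybridLink(address, imgSrc, width, height) {
  const at = address.lastIndexOf("@");
  if (at < 0) {
    throw new Error("not an email address: " + address);
  }
  const munged = address.slice(0, at) + "@nospam." + address.slice(at + 1) + ".nowhere";
  return (
    '<a href="' + toEntities("mailto:" + munged) + '">' +
    '<img border="0" src="' + imgSrc + '" width="' + width +
    '" height="' + height + '" alt="' + toEntities(munged) + '"></a>'
  );
}

console.log(hybridLink("me@mydomain.com", "images/memydomain.png", 136, 17));
```

With the decoder routine above running on page load, visitors with javascript and images on get a clean link. Everyone else gets the image or the munged alt text.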
Conclusion
And that's it. We've got ourselves a fairly future-proofed anti-spider solution. The 90% or so of users who have both javascript and images turned on will see absolutely nothing amiss. The 10% or so who have javascript turned off will see the correct address thanks to the image ...and they'll most likely be able to deduce that the link has been munged and backspace over a few characters, won't they? Both javascript and images turned off? Well, they'll definitely be used to hardship, but the address will still be clearly visible to them in the alt text. The main thing is that we're combining safety with catering to the lowest common denominator.
And nobody's left out of the party.
The quest doesn't end here: pretty soon the spam spiders are going to start reading and interpreting javascript. It would be simple to wrap a few lines of code around the Mozilla ActiveX control to produce just such a tool. The next step will have to employ a mixture of server-side and client-side code, possibly involving some kind of challenge/response setup. At some stage we might have to throw out the javascript and keep the munged address form. Be sure to watch this space to see how things develop.
And if you're still reading... be sure to visit the toolz page for help putting all this together ...or, if you're reading this a year from now and the whole world and his dog has thrown away their email accounts because of all the associated problems and risks, then there's this solution. Belt and braces being the watchword around here!
And in the next article I'll write about honeypots and tarpits.
Copyright (C) Toolz, 2004