In my previous two posts, I wrote about a web comment spam application that has been hitting one of my personal web sites. I set up a bit of an agility test for the bots to figure out what they were, and were not, capable of doing. Eventually I was able to put together a fairly good operational profile of that spam application's functionality. In this post, I want to review the operation of another spam application that I happened to encounter during my test...one that seems far more capable than the one I originally set out to report on.

To briefly recap my previous two articles, there was one particular application that kept visiting my site. It would submit a bunch of bogus links in several different formats as a comment in a web form; since the links were entirely random, I hypothesized that these submissions were mere probes serving some future purpose and not the end goal in themselves. Anyway, the operational profile that I constructed for that application looks like this:

It recognized a few form field names and attempted to submit appropriately formatted data to them, but all other fields were filled with random garbage (unless an existing form field already held a value); specifically:
- It put an email address in the 'email' field, but not the 'eml' field
- It put a URL in the 'url' field, but not the 'link' field
- It did nothing special with the name, address, and phone fields

It supported cookies during the submission process

It did not support Javascript

Hidden form fields were properly submitted/included

It could support multi-step submissions

The User-Agent seems configurable, but is mostly left as the same value

Many uses of this application against my site have originated from the same Class C public Internet network
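
For anyone wanting to check the same-network observation against their own logs, here is a minimal sketch. It assumes "Class C" can be treated as a /24 block, and the function name and sample addresses (taken from the reserved documentation ranges) are purely hypothetical, not data from my site.

```python
import ipaddress
from collections import Counter

def group_by_slash24(addresses):
    """Count how many source addresses fall into each /24 network."""
    counts = Counter()
    for addr in addresses:
        # strict=False accepts a host address and returns its enclosing network
        network = ipaddress.ip_network(f"{addr}/24", strict=False)
        counts[str(network)] += 1
    return counts

# Hypothetical sample addresses (documentation ranges, not real log data)
sample = ["192.0.2.10", "192.0.2.77", "198.51.100.5", "192.0.2.200"]
hits = group_by_slash24(sample)

for network, count in hits.most_common():
    print(network, count)
```

A network that dominates the counts is a hint, not proof, that the submissions share an origin.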

During my experiment, another spam application wandered onto my site and partook in my little agility test. This particular application stood out as markedly different from the previously profiled application. Here’s an example of what the application actually sent (which can be compared against the raw data contained in my previous post). I’ve modified the domain names; I assure you the originals used live/real URLs.

eml: [email protected]

email: [email protected]

name: DoorieHoohona

phone: 123456

address: http://xxx.yyy.com/Avalide/map.html

url: http://xxx.yyy.com/Bust-Enhancer/map.html

link: http://xxx.yyy.com/L-Glutamine/map.html

comment: Wellbutrin XL what is celebrex clonidine medicine bupropion and weight loss <a href=http://xxx.yyy.com/Female-Libido-Patch/new-scientist.html>new scientist</a> <a href=http://xxx.yyy.com/Evegen/evegen-reviews.html>evegen reviews</a>... [truncated for brevity]

Just looking at the values submitted in these fields, there is a night-and-day difference when compared to the previously mentioned spam application I was tracking. This new application was significantly more successful at putting contextually correct information in the right fields: email addresses were submitted for both the 'email' and 'eml' fields; something that resembled an actual human name was submitted in the 'name' field; the 'phone' field was numeric; both the 'link' and 'url' fields held URLs. The value of the 'address' field is debatable...perhaps the application is coded to believe 'address' is akin to a web site address, i.e. a URL. Or, maybe this particular app shoves URLs into fields it does not recognize (and thus the URL values in the 'address', 'url', and 'link' fields were actually just dumb luck). The links submitted within the comment only used one format (a proper HTML <A> tag), so it's not as robust as the other application at abusing web applications that allow popular forum code markup (e.g. the [url] and [link] pseudo-tags). But overall, the level of contextual awareness of this application is far more interesting than that of the previously profiled spam application.
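
The kind of per-field comparison described above can be roughly automated. The following is only a sketch: the regular expressions are deliberately loose, the function names are my own invention for illustration, and the email value is a hypothetical placeholder since the real one is obscured in the sample above.

```python
import re

# Deliberately loose patterns; good enough for bucketing, not for validation
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
URL_RE = re.compile(r"^https?://\S+$", re.IGNORECASE)

def classify_value(value):
    """Bucket a submitted value as 'email', 'url', 'numeric', or 'text'."""
    value = value.strip()
    if EMAIL_RE.match(value):
        return "email"
    if URL_RE.match(value):
        return "url"
    if value.isdigit():
        return "numeric"
    return "text"

def profile_submission(fields):
    """Map each form field name to the kind of value the bot put in it."""
    return {name: classify_value(value) for name, value in fields.items()}

submission = {
    "eml": "someone@example.com",  # hypothetical placeholder address
    "name": "DoorieHoohona",
    "phone": "123456",
    "address": "http://xxx.yyy.com/Avalide/map.html",
    "link": "http://xxx.yyy.com/L-Glutamine/map.html",
}
profile = profile_submission(submission)

for field, kind in profile.items():
    print(f"{field}: {kind}")
```

Comparing the resulting kinds against what each decoy field was meant to attract gives a quick read on how context-aware a given bot is.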

So the current operational profile of this spam application is:

It has proven to be very successful at putting contextually appropriate, correctly formatted information into a variety of different form fields; specifically:
- It put email addresses into the 'email' and 'eml' fields
- It put a human name into the 'name' field
- It put a numeric value into the 'phone' field
- It put URLs into the 'address', 'url', and 'link' fields

It supported cookies during the submission process

It did not support Javascript

Hidden form fields were properly submitted/included

The application does not appear to support multi-step submissions; or at least, it didn't care about verifying that the submission worked

The User-Agent string submitted is extremely easy to spot: "Mozilla/4.0 (compatible; MSIE 6.0; Update a; AOL 6.0; Windows 98)"
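
Since that string is so distinctive, flagging it is straightforward. Here is a minimal sketch using an exact-match lookup; the set and function names are hypothetical, and an exact match is trivially evaded by any tool with a configurable User-Agent, so treat it as one signal among several.

```python
# User-Agent strings observed from spam submissions (exact-match only)
KNOWN_SPAM_USER_AGENTS = {
    "Mozilla/4.0 (compatible; MSIE 6.0; Update a; AOL 6.0; Windows 98)",
}

def is_known_spam_agent(user_agent):
    """Return True if the User-Agent header exactly matches a known spam tool."""
    return user_agent.strip() in KNOWN_SPAM_USER_AGENTS

seen = "Mozilla/4.0 (compatible; MSIE 6.0; Update a; AOL 6.0; Windows 98)"
print(is_known_spam_agent(seen))
```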

Unfortunately, this particular application only visited my site once, so I don't have multiple submissions on hand to aggregate into a more comprehensive profile. I'll certainly be on the lookout for the next time it comes around.

Until then,
- Jeff