bryant | Amazon Troll Busting

The original post which prompted this one is now ~~inaccessible~~ accessible again. C'est la vie.

Hopefully you weren't thinking of this as a news post, but just in case:

http://blog.seattlepi.com/amazon/archives/166329.asp

"This is an embarrassing and ham-fisted cataloging error for a company that prides itself on offering complete selection," says the Amazon spokesperson. It's not as complete an explanation as I wish we could have, but it is not any sort of attempt to save face.

Oooh, new update!

There is a "report this content as inappropriate" box at the bottom of at least some Amazon pages. It doesn't go to the URL claimed, though. It goes to http://www.amazon.com/gp/product/features/fiona-feedback-email.html.

It is absolutely fascinating that after I screwed up and completely missed that link, nobody double-checked me. See the philosophical pontification lower down in this post.

(And nah, I still don't buy the troll -- the URL inconsistency is still wacky, and it's too easy to say "sure, it's not there because the functionality was removed." Especially since it wasn't. I do think he was making a point about trust on the Internet, and I do still think it's a good one.)

Oh, OK, one pair of links:

http://mikedaisey.com/
http://www.lilithsaintcrow.com/journal/2009/04/idosyncratic-code-amazonfail/

Mike Daisey used to work at Amazon, so it's not impossible that he has contacts inside the company who'd tell him the truth. I don't know of any motivation he has to make shit up, although he does make his living from people buying tickets to his work. He is a friend of a friend who I trust, but I don't know him personally. The story is plausible but also lacks detail.

The above paragraph is the sort of analysis to which one ought to subject any of these random assertions.

There's a guy claiming he abused Amazon's reporting system in order to get GLBT books removed from the sales rank listings. Since I have some pretension to technical ability, I figured I'd give his claims a test run. (Edit: post now protected and unreadable, unless you're a community member.)

Summation: nope, you didn't do that, you liar you. Nice meta-troll, though.

Details:

a) The code is buggy; I can't get it to run as written. In particular, he uses the -dump parameter to links. That causes links to dump a formatted version of the document, which does not contain any URLs at all. (Edit: yeah, he gave Valleywag an explanation for this which does make sense.)
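To make the formatted-versus-raw distinction concrete, here is a minimal Python sketch, not his code. The page fragment is made up and the class name is mine. It only shows that the rendered text of a link drops the href, which is the part you need. Whether a particular links build appends a list of references to its dump is not something this sketch models.

```python
from html.parser import HTMLParser


class TextAndLinks(HTMLParser):
    """Collects the visible text and the href targets of a page separately."""

    def __init__(self):
        super().__init__()
        self.text_parts = []  # roughly what a formatted text dump shows
        self.hrefs = []       # what you only get from the raw markup

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


# Made-up page fragment (an assumption, not a real Amazon page).
SAMPLE_HTML = '<p>See <a href="/dp/0830823794">this book</a> for details.</p>'

parser = TextAndLinks()
parser.feed(SAMPLE_HTML)
rendered = " ".join(parser.text_parts)
```

Here `rendered` holds only the words a reader would see, while `parser.hrefs` holds the link target. Grepping the former for product IDs finds nothing.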

I went ahead and got a non-formatted version of the page he's grabbing for the sake of completeness, and ran his grep and sed statements on it. You don't actually get a pretty listing of product IDs from that. You could get one if you wrote better regexps, but the ones he's providing just don't work.

So let's say he was just, I don't know, obfuscating because he's lazy. It is entirely possible to get a list of Amazon product IDs by methods similar to the ones he posted. Onward!

Thought that was clear, but I don't mind making it crystal-clear: I'm convinced that you can get a list of Amazon product IDs using his code.
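For the curious, a rough sketch of what "better regexps" might look like, in Python rather than grep and sed. The assumptions are mine: that product IDs are ten characters of digits or capital letters, and that they appear in paths like /dp/<id> or /gp/product/<id>. The function name is hypothetical and this is not the code from his post.

```python
import re

# Assumption: IDs are 10 characters of digits/capital letters and show up
# in link paths such as /dp/<id> or /gp/product/<id>.
PRODUCT_ID_RE = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?![A-Z0-9])")


def extract_product_ids(raw_html):
    """Return the unique product IDs found in raw HTML, in first-seen order."""
    seen = {}
    for match in PRODUCT_ID_RE.finditer(raw_html):
        # dict keys keep insertion order, so duplicates are dropped in place
        seen.setdefault(match.group(1), None)
    return list(seen)
```

Feed it the raw HTML of a listing page, not a formatted dump, and you get a deduplicated list of IDs.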

b) He says that URLs of the form http://www.amazon.com/ri/product-listing/ generate a complaint. However, if you go to a URL with that format, you get a 404 page. It's possible that Amazon just pulled that functionality this morning, but there's no sign of that URL in their help system that I can see after a quick once-over.

Edit: he is saying that the functionality was just pulled. So I dunno. Google caches of Amazon book pages don't show the link he's claiming was there, but they wouldn't if that functionality is dependent on being logged in. Anyone have an old screenshot of an Amazon page showing that?
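If you want to repeat the 404 check yourself instead of trusting me, something like this would do it. It's a sketch using Python's standard urllib. The function name and the injectable opener parameter are my own additions so the logic can be exercised without touching the network.

```python
import urllib.error
import urllib.request


def fetch_status(url, opener=urllib.request.urlopen):
    """Return the HTTP status code for url, including error codes like 404."""
    try:
        with opener(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx responses; the code rides on the exception.
        return err.code
```

Calling fetch_status on a URL of the form he describes makes a live request, so the answer only tells you what the site does now, not what it did last week. That is exactly the gap an old screenshot would fill.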

Conclusion: troll! Reserving the right to change my mind if we get some real proof, but see lengthy philosophy note that follows.

Edit: it's an interesting bit of trolling, actually. He's piggybacking on some of the same tendencies that led the original story to turn into a Cause with a capital C. If you're not a geek, the Internet is just this weird magical place where stuff /happens/ and anything's likely if it's expressed with authority.

His post is even better because there's nothing inherently implausible about the idea of hiring a bunch of third-world sweatshop people to screw up a user-generated tagging/complaint system. Amazon doesn't appear to have made that mistake, but you need to check before you can be sure.

To make it even better, Twitter was the main vector of communication about the Amazon stuff. Twitter is lousy for any communication which takes more than 140 characters; it strips logic leaving us only with reputation capital. The #amazonfail tag got a lot of reputation capital, initially from upset people and later from sheer volume...

But you can't tell from a Twitter post whether or not something's authentic. You gotta do your own research and thinking. Some people do; lots of people don't. No matter what Amazon did or didn't do, intentionally or not, there is absolutely not enough evidence right now to draw any conclusions other than "it's bad that this happened." Our troll used the same transmission technique, because who's gonna take the time to read his post and think about his claims?

(Addition to that: seriously. Why do you believe him? Why do you believe me, for that matter?)

Good times. At some point we're going to have to figure out how to overcome a thousand years of conditioning: for a very long time, saying something loudly required a great deal of effort, so at least you knew someone really believed what they were saying. These days, no effort at all, but we still have that kneejerk reaction. (This is not an Internet problem. I blame LaserWriters.) Man, doesn't it just seem antiquated that Speakers' Corner used to be a huge deal and a symbol of free speech?

The really interesting thing about the troll is that he's right even if he didn't do it. The vulnerability he describes exists anywhere you make automated decisions based on third-party input.
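A toy illustration of that vulnerability, in Python. Everything here is hypothetical: the function names, the threshold, and the (reporter, item) data shape are mine, and nothing is known about how Amazon's systems actually work. The first rule acts on raw complaint volume. The second at least asks how many different people complained.

```python
from collections import defaultdict


def naive_flagged(reports, threshold):
    """Flag any item whose raw complaint count reaches the threshold."""
    counts = defaultdict(int)
    for _reporter, item in reports:
        counts[item] += 1
    return {item for item, n in counts.items() if n >= threshold}


def distinct_reporter_flagged(reports, threshold):
    """Flag an item only when enough different reporters complained."""
    reporters = defaultdict(set)
    for reporter, item in reports:
        reporters[item].add(reporter)
    return {item for item, who in reporters.items() if len(who) >= threshold}
```

One person hammering the first rule with a script flags whatever they like. The second rule only raises the cost, since a crowd of throwaway accounts beats it too. Automated decisions on third-party input need a check somewhere that the input can't reach.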

Hah. See what happens when I get links? I pontificate.


From: (Anonymous)

I rather thought so... thanks for checking, though: It's always better to know rather than assume.

From a common sense standpoint, his story would not explain the two Amazon reps' response template about adult content.

Thank you.

From: (Anonymous)

Thanks for clearing this up. Even if this guy's actions had some kind of effect, there's no way one person would be responsible for it all.

From: (Anonymous)

works for me.

merk@locke:~ [31/83]$ links --version
ELinks 0.10.6 (built on Sep 25 2007 18:50:54)

Features:
Standard, Fastmem, IPv6, gzip, bzip2, Cascading Style Sheets, Protocol (File, FTP, HTTP, NNTP, SMB, URI rewrite, User protocols),
SSL (GnuTLS), MIME (Option system, Mailcap, Mimetypes files), LED indicators,
Bookmarks, Cookies, Form History, Global History, Scripting (Lua, Perl)

From:

peitso.livejournal.com

The 'adult' content policy messages could have been a PR move because they really had no idea what was going on. No company ever tells the public that it's dealing with a massive crisis that hasn't surfaced yet--there's no need if it's not public. If that was the case, it was still a bad move.

From:

troubleinchina.livejournal.com

Also, it doesn't account for all of the books that were taken down. Among other things, straight erotica also went down, and that wouldn't go down with an "I'm targeting GLBT books" code.

From:

head58.livejournal.com

I'm trying to think of the last time I got an email response from customer service from anybody (never mind on a holiday weekend) that had any idea what it was talking about.

From:

elialshadowpine

He mentioned getting other people in on it for a focused attack. If they changed the parameters, that might explain it.

That the code doesn't work at all, though......

From: (Anonymous)

Re 4):

He does insert the product IDs after http://www.amazon.com/ri/product-listing/ like so:

http://www.amazon.com/ri/product-listing/0830823794

But still results in 404.

From:

jonthegm.livejournal.com

The code he POSTED didn't work... but you can immediately see that given a slightly more powerful perl or python script that you could do it without links.

From:

bryant

Yeah, I tried that and got the 404.

If someone can come up with evidence that it worked prior to today I'll recant, because turning off a Web page is exactly how I'd turn off that capability without having to do a code push. But I wanna see some evidence first.

From:

gerrib.livejournal.com

Why the canned email/phone response?

Simple. Y'all think you're talking to amazon.com. You're not. You're talking to an outsourcing company in either central U.S. or Indonesia depending on the time of day. They have a list of responses that they are programmed to give out depending on the problem. Book or other material listed as offensive material flagged? Punt out the adult clause. Note the complaint, get you off the phone or be done with the email, move on to the next thing. They have 4.5 minutes to get you in, deal with you, and get you out.

On a weekend, esp. a religious holiday? I'm betting no one was in the home office to deal with this mess.

I'm inclined to believe this is an outside job (not speculating whom, not really interested), but amazon.com left themselves highly vulnerable to this kind of situation by automating and outsourcing far too much.

From: (Anonymous)

"because they really had no idea what was going on"

Well, this seems unlikely since the first one dates back to February and it apparently took the author several emails back and forth as well as involving his agent to get an adult content response:

http://craigspoplife.blogspot.com/2009/04/is-amazon-homophobic.html

So the old 'adult content' excuse was not a panicked answer while Amazon tried to figure out the problem. It is most likely a correct answer reflecting corporate policy.

Amazon's current backtracking and smoke-mirroring with the 'just a glitch' stuff is, however, totally understandable considering the amount of outrage expressed all over Twitter etc.

From:

bryant

I do! Heck, I wrote that code in about five minutes while I was fooling around with this. I'm willing to believe that he screwed with it for some reason.

But the automated complaint page doesn't exist, which is a bigger problem.

From:

bryant

Well, sure. links is a real program that really does go out and fetch pages from the Web.

But if you run links as:

links -dump

You get a formatted version of the page; you don't get the raw HTML, which is what you need if you want to extract the links.

I mean, it's a side point, because it's obviously possible to write a script that'll extract every product ID on Amazon, but the code's undeniably buggy as posted.

From:

stardragonca.livejournal.com

Of course, young Archimedes just made himself liable to a whole lot of grief.
Especially if Amazon decides they need a good financial scapegoat, and lookee who just volunteered!

From:

emeraldsedai.livejournal.com

Appreciate the knowledgeable counter-perspective. I did, in fact, read the post on brutal_honesty, along with everything else that's been linked a zillion times on Twitter. (Honestly, the internet was FUN last night!) Though I can't vet the guy's code, my tentative conclusion matches yours: nothing I've read so far makes total sense of the facts.

From:

bryant

I really /want/ to know what happened, because running big Internet stuff happens to be my job. I am sad that I probably won't ever really know.

It's damned well a good lesson in the vulnerabilities of centralization, no matter what. Same applies to Google. Same applies to UPS and FedEx. Single points of failure, right?

From:

bryant

I tend to believe that there's some automated filtering going on vis a vis adult content, which is not the most horrendous thing I can think of. For one thing, if you filter adult content out of sales rank lists, you avoid a whole different set of trolls.

(Fun for everyone: do a small-scale organized "everyone on 4chan spends 10 bucks" trick which pumps something hardcore pornographic to the top of the sales ranks for a day.)

The real problem here is whatever mechanism got inappropriate books tagged as adult. That's probably where the glitch is.

From:

jonthegm.livejournal.com

What about the tagging system? Couldn't that be the culprit as well? (This hinges on Amazon having a heuristic that says "adult" tag combined with some other tags == highly likely to need blacklisting... therefore blacklist and send for review)

It seems a bit easier to blame a Bantown style exploit the more you look at it. Even if this guy is just a fake.

From:

sunfell.livejournal.com

Could the troll have used Amazon's "Mechanical Turk" to propagate his scheme? We'll have to keep an eye on that to see if there will be any policy changes there. Talk about being blown up with one's own petard...

I like Amazon, warts and all; it's my fallback when local places can't or won't stock books I need. I'd hate to see it messed up by something like this. They're out to make money, so it doesn't make sense that an internal policy could become so toxic.

From: (Anonymous)

Same applies to the A3 Network. Hellooo Twitter. ;)

From: (Anonymous)

Excellent point about the advisability of some adult filtering, and yes, you are right: fine-tuning the parameters is obviously the underlying problem here. The fact that the filtering is based on the publishers' metadata, which explains the inconsistencies between editions of the same books, makes that clear.

But Amazon management seems to have forgotten or never known how fast an internet minute is. By now, amazonFail is centuries old for most of us. And damage control is nearly more important than the technical fix, IMO.

Personally I'd rather they had added the word 'Twilight' to their filters, because the Twilight-related in-your-face advertising all over Amazon.com is seriously getting on my nerves... :D

From:

tacky-tramp.livejournal.com

I was hoping a coder would go over that! I continue to suspend judgment until Amazon comes forward with more information. I assume they're planning to explain what went down and how they'll prevent it from going down in the future.

From:

bryant

Oh, I will not at all be surprised if it turns out to be a third party responsible. Users can absolutely tag items, and if there's something turning those tags into decisions about which books should have sales rank? That'd explain it.

The tag in question would have to be something fairly innocuous, like "lesbian," given that Heather Has Two Mommies was de-ranked. It didn't have any adult tags, etc. -- lesbian was the one tag I could see which could be interpreted by someone dumb as adult content. So I dunno. Maybe.

But at this point there's nothing conclusive -- the post in question has nothing that would prove that the author was responsible. This puts it back in the pure speculation category. I'm holding off on pretending I know what happened until there's something real to bet on. Maybe Amazon screwed up; maybe Amazon was malicious; maybe someone else was malicious. Lord knows.

Edited Date: 2009-04-13 05:16 pm (UTC)