[personal profile] bryant
The original post which prompted this one was inaccessible for a while but is now accessible again. C'est la vie.




Hopefully you weren't thinking of this as a news post, but just in case:

http://blog.seattlepi.com/amazon/archives/166329.asp

"This is an embarrassing and ham-fisted cataloging error for a company that prides itself on offering complete selection," says the Amazon spokesperson. It's not as complete an explanation as I wish we could have, but it is not any sort of attempt to save face.




Oooh, new update!

There is a "report this content as inappropriate" box at the bottom of at least some Amazon pages. It doesn't go to the URL claimed, though. It goes to http://www.amazon.com/gp/product/features/fiona-feedback-email.html.

It is absolutely fascinating that after I screwed up and completely missed that link, nobody double-checked me. See the philosophical pontification lower down in this post.

(And nah, I still don't buy the troll -- the URL inconsistency is still wacky, and it's too easy to say "sure, it's not there because the functionality was removed." Especially since it wasn't. I do think he was making a point about trust on the Internet, and I do still think it's a good one.)




Oh, OK, one pair of links:

http://mikedaisey.com/
http://www.lilithsaintcrow.com/journal/2009/04/idosyncratic-code-amazonfail/

Mike Daisey used to work at Amazon, so it's not impossible that he has contacts inside the company who'd tell him the truth. I don't know of any motivation he has to make shit up, although he does make his living from people buying tickets to his work. He is a friend of a friend who I trust, but I don't know him personally. The story is plausible but also lacks detail.

The above paragraph is the sort of analysis to which one ought to subject any of these random assertions.




There's a guy claiming he abused Amazon's reporting system in order to get GLBT books removed from the sales rank listings. Since I have some pretension to technical ability, I figured I'd give his claims a test run. (Edit: post now protected and unreadable, unless you're a community member.)

Summation: nope, you didn't do that, you liar you. Nice meta-troll, though.

Details:

a) The code is buggy; I can't get it to run as written. In particular, he uses the -dump parameter to links. That causes links to dump a formatted version of the document, which does not contain any URLs at all. (Edit: yeah, he gave Valleywag an explanation for this which does make sense.)

I went ahead and got a non-formatted version of the page he's grabbing for the sake of completeness, and ran his grep and sed statements on it. You don't actually get a pretty listing of product IDs from that. You could get one if you wrote better regexps, but the ones he's providing just don't work.

So let's say he was just, I don't know, obfuscating because he's lazy. It is entirely possible to get a list of Amazon product IDs by methods similar to the ones he posted. Onward!


Thought that was clear, but I don't mind making it crystal-clear: I'm convinced that you can get a list of Amazon product IDs using his code.
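
For the curious, here's a minimal sketch of what "better regexps" might look like, in Python rather than links, grep, and sed. This is not his code. It assumes product IDs are ten uppercase alphanumeric characters that show up in links as /dp/<ID> or /gp/product/<ID>; the page fragment and the IDs in it are made up, and nothing here touches the network.

```python
import re

# Assumption: a product ID is 10 uppercase alphanumeric characters and appears
# in a link as /dp/<ID> or /gp/product/<ID>. The lookahead rejects longer runs.
PRODUCT_ID = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?![A-Z0-9])")


def extract_product_ids(html):
    """Return the distinct product IDs in html, in order of first appearance."""
    seen = []
    for match in PRODUCT_ID.finditer(html):
        pid = match.group(1)
        if pid not in seen:
            seen.append(pid)
    return seen


# Made-up page fragment; the IDs are placeholders, not real products.
SAMPLE_HTML = """
<a href="/dp/0000000001/ref=sr_1_1">First book</a>
<a href="http://www.amazon.com/gp/product/B00000TEST">Second book</a>
<a href="/dp/0000000001">First book again</a>
<a href="/gp/help/customer/display.html">Help</a>
"""

if __name__ == "__main__":
    for pid in extract_product_ids(SAMPLE_HTML):
        print(pid)
```

The point is only that pulling IDs out of raw HTML is a few lines of work once you skip the formatted dump and match on the links themselves.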

b) He says that URLs of the form http://www.amazon.com/ri/product-listing/ generate a complaint. However, if you go to a URL with that format, you get a 404 page. It's possible that Amazon just pulled that functionality this morning, but there's no sign of that URL in their help system that I can see after a quick once-over.

Edit: he is saying that the functionality was just pulled. So I dunno. Google caches of Amazon book pages don't show the link he's claiming was there, but they wouldn't if that functionality is dependent on being logged in. Anyone have an old screenshot of an Amazon page showing that?

Conclusion: troll! Reserving the right to change my mind if we get some real proof, but see lengthy philosophy note that follows.

Edit: it's an interesting bit of trolling, actually. He's piggybacking on some of the same tendencies that led the original story to turn into a Cause with a capital C. If you're not a geek, the Internet is just this weird magical place where stuff /happens/ and anything's likely if it's expressed with authority.

His post is even better because there's nothing inherently implausible about the idea of hiring a bunch of third-world sweatshop people to screw up a user-generated tagging/complaint system. Amazon doesn't appear to have made that mistake, but you need to check before you can be sure.

To make it even better, Twitter was the main vector of communication about the Amazon stuff. Twitter is lousy for any communication which takes more than 140 characters; it strips out the logic, leaving us with only reputation capital. The #amazonfail tag got a lot of reputation capital, initially from upset people and later from sheer volume...

But you can't tell from a Twitter post whether or not something's authentic. You gotta do your own research and thinking. Some people do; lots of people don't. No matter what Amazon did or didn't do, intentionally or not, there is absolutely not enough evidence right now to draw any conclusions other than "it's bad that this happened." Our troll used the same transmission technique, because who's gonna take the time to read his post and think about his claims?

(Addition to that: seriously. Why do you believe him? Why do you believe me, for that matter?)

Good times. At some point we're going to have to figure out how to overcome a thousand years of conditioning: for a very long time, saying something loudly required a great deal of effort, so at least you knew someone really believed what they were saying. These days it takes no effort at all, but we still have that kneejerk reaction. (This is not an Internet problem. I blame LaserWriters.) Man, doesn't it just seem antiquated that Speakers' Corner used to be a huge deal and a symbol of free speech?

The really interesting thing about the troll is that he's right even if he didn't do it. The vulnerability he describes exists anywhere you make automated decisions based on third-party input.
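
Here's a toy sketch of that vulnerability in Python. It is purely hypothetical: the threshold, the names, and both rules are invented for illustration, and none of it describes how Amazon actually handles complaints.

```python
from collections import defaultdict

THRESHOLD = 3  # hypothetical number of complaints that triggers delisting


def flagged_naive(complaints, threshold=THRESHOLD):
    """Flag any item with at least `threshold` complaints, whoever sent them."""
    counts = defaultdict(int)
    for _reporter, item in complaints:
        counts[item] += 1
    return {item for item, n in counts.items() if n >= threshold}


def flagged_by_distinct_reporters(complaints, threshold=THRESHOLD):
    """Flag an item only when `threshold` different reporters complained."""
    reporters = defaultdict(set)
    for reporter, item in complaints:
        reporters[item].add(reporter)
    return {item for item, who in reporters.items() if len(who) >= threshold}


# One scripted account hammering book-a; three separate people on book-b.
complaints = [("troll", "book-a")] * 50 + [
    ("ann", "book-b"),
    ("bob", "book-b"),
    ("cy", "book-b"),
]

if __name__ == "__main__":
    print(sorted(flagged_naive(complaints)))
    print(sorted(flagged_by_distinct_reporters(complaints)))
```

The naive rule lets a single script take down book-a. Counting distinct reporters only raises the price of the attack, since enough throwaway accounts would still get through, which is why a human probably needs to look before anything gets pulled.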

Hah. See what happens when I get links? I pontificate.

Date: 2009-04-13 04:52 pm (UTC)
From: [identity profile] jonthegm.livejournal.com
The code he POSTED didn't work... but you can immediately see that, given a slightly more powerful perl or python script, you could do it without links.

Date: 2009-04-13 05:07 pm (UTC)
From: [identity profile] jonthegm.livejournal.com
What about the tagging system? Couldn't that be the culprit as well? (This hinges on Amazon having a heuristic that says "adult" tag combined with some other tags == highly likely to need blacklisting... therefore blacklist and send for review)

It seems a bit easier to blame a Bantown style exploit the more you look at it. Even if this guy is just a fake.

Date: 2009-04-13 05:28 pm (UTC)
From: [identity profile] lynxreign.livejournal.com
I've read that part of the problem is that publishers can add tags and Amazon adds tags. The publisher's tags for many of these contained words that those that got through did not.

Date: 2009-04-13 05:38 pm (UTC)
From: [identity profile] lynxreign.livejournal.com
If I were a publisher and I didn't know that certain tags are being filtered out and especially if I don't know much about how computers work, I'd tag books with every tag I could think of that remotely has something to do with the book so it'd show up in more searches. And if I thought "Adult" meant "Not a kids or YA book" then I'd throw it around with wild abandon.

Date: 2009-04-13 05:46 pm (UTC)
From: [identity profile] lynxreign.livejournal.com
Yeah, I was thinking they currently have two problems, and I'm really happy I'm not trying to hunt down either one. There's the tech problem: how was this done, and how do we prevent it from happening again? And there's the human problem: was this done by someone outside, someone inside, or our own policies, and how do we prevent that from happening again? If it turns out it was an internal person acting technically within policy, that's a royal mess to clean up, far harder than any code fixes.

Date: 2009-04-13 11:08 pm (UTC)
From: [identity profile] sylvanstargazer.livejournal.com
What I would do if I were slightly more bored is mine the tags, categories, publishers and so on from the stripped books and do a correlation to see where it came from. My initial suspicion was that categories are based on hidden tags, which is why they are the most visible shared features, and that it's something like, "if a book has at least two of the following tags it is flagged."

Then something like "sexuality" got added to the list, but it is used in ambiguous contexts by publishers to mean "sexual orientation" rather than "sexual nature", and it rapidly spiraled out of control. Of course, the real answer is to not censor book results, and certainly not by the "remove their Amazon ranking" hack. Really people, add the extra boolean and scatter if statements liberally. It may be ugly, but if they had done so it might never have been noticed.
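
A rough sketch of that guess, in Python. The tag list and the two-tag minimum are invented for illustration; nothing here is known about Amazon's actual system.

```python
# Hypothetical list of suspect tags; "sexuality" is the ambiguous late addition.
SUSPECT_TAGS = {"adult", "erotica", "sexuality"}


def is_flagged(tags, suspect_tags=SUSPECT_TAGS, minimum=2):
    """Flag a book when at least `minimum` of its tags are on the suspect list."""
    normalized = {tag.strip().lower() for tag in tags}
    return len(normalized & suspect_tags) >= minimum


if __name__ == "__main__":
    # A memoir tagged by a publisher who meant "sexual orientation" and "not YA".
    print(is_flagged(["Sexuality", "Adult", "Memoir"]))
    # One suspect tag alone stays under the minimum.
    print(is_flagged(["sexuality", "history"]))
```

Under a rule like this, one ambiguous word added to the list is enough to sweep in every book whose publisher also used "adult" to mean "not for kids".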

Date: 2009-04-13 05:48 pm (UTC)
From: [identity profile] head58.livejournal.com
I'm guessing this is not a happy day at Amazon HQ for pretty much anybody.

Date: 2009-04-13 06:37 pm (UTC)
From: [identity profile] lynxreign.livejournal.com
Now that looks like it makes sense. From a "why were certain things filtered" sense anyway.

Date: 2009-04-13 08:00 pm (UTC)
From: [identity profile] nnaylime.livejournal.com
Could it be that Amazon has already adjusted things on their end to prevent this sort of attack in the future?

Date: 2009-04-13 11:08 pm (UTC)
From: [identity profile] sylvanstargazer.livejournal.com
Not only that, but I had the initial thought of "you know, if such capability did exist, I would abuse it like so..." and it's pretty similar to what he threw together. I assume he had the same thought, tossed in some homophobia to play off the panic and figured he'd see how much attention he'd get.

Personally I just went and tagged Ann Coulter's books as "sexuality", "bestiality", "hard core porn", and "adult content". Apparently other people had the same idea, since today the sixth most popular tag for her latest book is "gay porn". If my beliefs are going to be tagged as adult content, I certainly want everyone I disagree with to come with me.
