I guess I feel more comfortable commenting on this versus such posts about Google (my employer), because I don't have to worry about leaking anything important — but this is absolutely not a piece of cake.
Anyone who has worked on planet-scale anti-abuse systems knows this is a very tough and never-ending problem.
You think there are obvious signals for good or bad — then the 0.01% of cases where this does not hold up turns into 200,000 daily mistakes if your product has 2B DAUs. You think you've built something that works; spammers only need to find one loophole to game the system. Sometimes they don't even have to look for a technical loophole: if the economics work out, they can pay normal users small sums of money, and the bad activity is now masked by troves of genuine user activity.
It is a little painful to see how HN loves to vilify people working on these issues; it's a bit like saying we have police but crime is not zero yet. And if the argument is that it happens too often, that is most likely not true: a 99.999% accurate system will still make a mistake 600k times every month for 2B DAUs, assuming one user interaction per day for the product — which is often a big underestimate for many large products.
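To make that arithmetic concrete, here's a quick back-of-the-envelope sketch (the numbers are the ones from this comment, not real figures from any product):

```python
def monthly_mistakes(dau, interactions_per_user_per_day, error_rate, days=30):
    """Absolute number of mistakes per month for a system that errs on
    a fixed fraction of user interactions."""
    return dau * interactions_per_user_per_day * error_rate * days

# 99.999% accuracy (a 0.001% error rate) on 2B daily users,
# one interaction per user per day:
print(monthly_mistakes(2_000_000_000, 1, 0.00001))  # ~600,000 mistakes/month
```

The point of the exercise: even a near-perfect relative rate multiplies into a large absolute count at this scale.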
I do think, like in any other area, there is a lot we can do better, and we are already working on it. But man, it sucks to have spent 6 months working 12-hour days just to see someone make grandiose statements about how something is an easy issue to fix, when you know it's absolutely false.
I guess it's fine though — FANG pays well, and I enjoy my work and think it's a net positive for society.
This is a bit like arguing that Medicare for all is impossible in the United States because it is such a big country.
This is a straw man argument, because only relative terms matter. Nobody cares about the absolute numbers except your boss. The customers care about their percentage chance of encountering spam and fake reviews.
You're saying it's too hard to achieve 100% success (0% failure rate), and that we should settle for 99.999% success (0.001% failure rate). Multiplying these by big numbers is irrelevant. It could be a trillion. Who cares? The implication is that the system is nearly perfect.
Meanwhile the experience most online shopping users have is more like an 80% success and a 20% failure rate at best, and often more like a 99% failure rate and a 1% success rate for regular shoppers. I have personally long since given up on ever buying anything from eBay or Amazon because of the rampant fakery. Literally everything has a thousand AAAA++++++ reviews that are all obviously generated from a template.
The same argument applies for Medicare: The citizens don't care about the absolute budget, they just care about their individual tax increase or decrease. Only a handful of people in the treasury care about the absolute numbers.
You've fallen into a common statistical trap. If we assume the volume of auto-generated spam is significantly (several orders of magnitude) higher than the volume of legitimate reviews (which seems to be the case, and would make sense), having a 99.999% success rate doesn't mean that users will observe 99.999% legitimate reviews.
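A toy model of that base-rate effect (the spam:legit ratios and the assumption that the filter never blocks legitimate reviews are illustrative, not claims about any real system):

```python
def observed_fake_fraction(spam_ratio, catch_rate):
    """Fraction of *visible* reviews that are fake, given that spam
    volume is spam_ratio times the legitimate volume and the filter
    catches catch_rate of spam (and, for simplicity, never blocks a
    legitimate review)."""
    legit = 1.0
    fake_passing = spam_ratio * (1.0 - catch_rate)
    return fake_passing / (legit + fake_passing)

for ratio in (10, 1_000, 100_000):
    print(ratio, observed_fake_fraction(ratio, 0.99999))
# At 100,000:1 spam, even a 99.999% catch rate leaves roughly half
# of the visible reviews fake.
```

In other words, the classifier's accuracy and what the user actually sees are two different numbers once the input is dominated by spam.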
The equivalent of "Medicare for all is impossible" would be an approach saying "anti-spam is impossible, so we won't do anything". That is not the case. While I do not know about Amazon reviews, I would bet it's fairly likely that they do have teams trying to fight this spam.
The equivalent in your example would be saying we have Medicare for all, but for some people the system does not work. That's the state of the world we are in: we are making efforts, but they will never be 100% perfect.
>> Meanwhile the experience most online shopping users have is more like an 80% success and a 20% failure rate at best, and often more like a 99% failure rate and a 1% success rate for regular shoppers. I have personally long since given up on ever buying anything from eBay or Amazon because of the rampant fakery. Literally everything has a thousand AAAA++++++ reviews that are all obviously generated from a template.
See, that's the thing: the only people who know this for certain are the ones with access to Amazon's data. You and I don't know the experience of "most online shoppers". If anything, looking at data has repeatedly made me realize that we in tech have a very bad understanding of "generalized overall population" cohorts.
But this is a total straw man argument. Who cares if you have a few false positives? What is the downside of marking a review as fake when it's not? Next to nothing. You don't even have to tell the person who submitted the review that it was treated as fake.
I think it would be very difficult to detect these:
a) with 100% accuracy, and
b) without involving humans at all
but I noticed the trend of fake Amazon reviews some time ago, and some patterns were very evident. I'd click on the profile of a five-star reviewer and, more often than not, every review they had left was five stars. And every product I'd click through to also had a deluge of suspicious five-star reviews. I even started scraping data into a graph database, but then discovered that Amazon tries very hard to resist scraping attempts, and gave up.
I think it would be very possible to set up some kind of basic pattern analysis along these lines. Clearly, merchants are buying positive reviews in bulk, so detecting sudden spikes in positive reviews and working out correlations with other product review dumps wouldn't be difficult. Once you've got a sense of it, you hand the data over to human investigators who take it the rest of the way. But despite its insane wealth, Amazon is clearly not interested in spending the money required to do that.
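A minimal sketch of the spike detection described above — the thresholds and window size are made-up illustrative values, not anything Amazon is known to run:

```python
from collections import Counter
from datetime import date, timedelta

def five_star_spike_days(reviews, window=7, factor=5.0, min_count=20):
    """Flag days whose five-star review count is far above the trailing
    average -- a crude signal that reviews may have been bought in bulk.

    `reviews` is a list of (day: date, stars: int) tuples. A day is
    flagged when its five-star count is at least `min_count` and exceeds
    `factor` times the mean of the previous `window` days.
    """
    counts = Counter(day for day, stars in reviews if stars == 5)
    if not counts:
        return []
    start, end = min(counts), max(counts)
    flagged = []
    day = start + timedelta(days=window)
    while day <= end:
        baseline = sum(counts.get(day - timedelta(days=i), 0)
                       for i in range(1, window + 1)) / window
        today = counts.get(day, 0)
        if today >= min_count and today > factor * max(baseline, 1.0):
            flagged.append(day)
        day += timedelta(days=1)
    return flagged
```

Flagged days would then go to human investigators, per the comment above; this only surfaces candidates, it doesn't prove anything was bought.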
We will never have anything if we decide only perfect things are allowed to exist.
It's not the best example, but even our national/state/local justice systems have cases where they fail. In addition to judges actually reaching incorrect verdicts, there are many ills that cause this — e.g. police not following procedures, poor people not having the money to right wrongs by appealing decisions, people being bullied or scared into pleas, etc.
That does not mean we should not have a justice system at all. Just like us, the systems we build are not perfect. That does not mean they should not exist. There is immense value in the valley between nonexistent and perfect.
I suspect anti-abuse at scale is impossible. The mistake (albeit a hugely profitable one) was to turn themselves into a marketplace and platform rather than a curated seller. The implication is clear: any marketplace is untrustworthy and caveat emptor.
"At scale" is a euphemism for an ideology whose goal is world domination; there's no reason that ambition deserves protection. Having anti-abuse systems in place matters more than avoiding constraints on growth.
I mean seriously, in the past several years we've learned that "at scale" is actually a public hazard.
I understand false positives are a problem — but why can't they simply only show/count reviews they are 99.9% certain AREN'T fake?
You might only show 80% of all user reviews — but if 10% of your reviews are fake, then at worst you're only hiding about 11% of real reviews (10 of the 90 real ones out of every 100).
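Worked out, under the worst-case assumption that everything the filter does show is real (an assumption for the arithmetic, not a claim about any real filter):

```python
def hidden_real_fraction(shown_fraction, fake_fraction):
    """Worst-case share of *real* reviews hidden when a confidence
    threshold shows only `shown_fraction` of all reviews and
    `fake_fraction` of all reviews are fake, assuming everything shown
    is real."""
    real = 1.0 - fake_fraction
    hidden_real = max(real - shown_fraction, 0.0)
    return hidden_real / real

print(hidden_real_fraction(0.80, 0.10))  # ~0.11: about 11% of real reviews hidden
```

So an aggressive threshold costs a modest slice of legitimate reviews — which, as the parent notes, Amazon has no shortage of.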
Who cares? Amazon definitely doesn't have a problem of not having enough reviews. It does have a problem of having too many products with almost entirely fake reviews.
I agree — in a lot of these Amazon discussions on HN, many people seem to believe Amazon does nothing.
But if you look at, e.g., Amazon's seller forums, there are a lot of large threads about the opposite problem — Amazon taking down lots of reviews the sellers consider legitimate.
So clearly they are doing something, and it might even be that only a small minority of fake reviews get through, but due to Amazon's scale there are still a lot of them.
Amazon: send requests to your customers at random to review products they purchased (perhaps reimburse them for their time with a gift card). Post only those reviews.
How will that help? The seller can still influence and finance the buyers. It may be more expensive but I'm certain the seller will do what is necessary for those high rankings. The stakes are that high.
If Amazon sells 10,000 Acme Widgets and sends an email to just 1% of those customers asking for a review, how is Acme going to game that? Pay all 10,000 purchasers of their product so that the 1% who get the Golden Ticket to write a review give them 5 stars?
The problem seems to me to be that the reviewer gets to decide to post a review. Turn it around and have Amazon select reviewers at random and you've made it much harder to game.
If it's an "expensive" product, you ask people to buy your product and have them send it back to you for a reward. If it's cheap, the reviewer can keep the item in exchange for a desirable review. The sample rate is irrelevant — you just need enough reviews to outrank your competitors.
I'm not in the industry, but I'm willing to bet there are flourishing communities that participate in this type of trade. And it's as sophisticated as it needs to be to get around Amazon's countermeasures.
How else are we seeing so many corrupt reviews? It's not like Amazon is just passively watching this happen. This is a huge issue for them, since it's a bad user experience and Jeff Bezos is obsessed with satisfying customers.
Obviously sellers would only need to inform all 10k customers up front that if Amazon ever requests a review from them, there is a $20 reward waiting for writing a 5-star review. (Obviously sellers would word things differently.) This system would also be much cheaper to game, because you only end up paying the 1% of your customers who get sampled.
In my opinion it is a huge net positive. It's not the popular opinion on HN, but I do think it is.
It's personally been instrumental for me. Coming from a solidly lower-middle-class background in a developing country, Google was the only reason I could get better at my work and eventually do well financially. It gave me access to information that I could not afford otherwise; books were too expensive for my family's income.
Even now I see how much services like Search and YouTube help people learn. YouTube created a vibrant community of content in a local language that has helped many of my friends learn things. I recently spoke to a teenager who learnt to repair household electronics by watching tutorials on YouTube — in our local language.
Google has a lot of problems. I mean a lot. But it's certainly a net positive in my opinion. While it's fun to participate in (legitimate) first-world discussions about web standards being killed by Google, it has undoubtedly improved the lives of millions of people from my home country.
Just like people, companies have the ability to do good and bad at the same time. For me the scale for Google is pretty heavily towards good.