Hacker News | Quenhus's comments

I'm sad to learn this project is shutting down. The maintainer (xvello) contributed a lot to my uBlock dev filter [0]. We tried to reduce the time lost on deceptive and low-quality content for search engine users. Generative ML and aggressive SEO techniques hit hard.

Bye letsblockit and I wish you well @xvello.

[0] https://github.com/quenhus/uBlock-Origin-dev-filter


I'm not OP, but here is my own uBlock filter with hundreds of GitHub/StackOverflow copycats: https://github.com/quenhus/uBlock-Origin-dev-filter


thank you!


thanks


For those on Google or DDG, here is my uBlock filter with hundreds of GitHub/StackOverflow copycats: https://github.com/quenhus/uBlock-Origin-dev-filter

It blocks copycats and hides them from multiple search engines. You may also use the list with uBlacklist.


These filters are light enough that they work well in Manifest V3 adblockers. The "Google - Global" filter only adds 1376 dynamic rules for example.


For those still on Google or DDG, here is my uBlock filter with hundreds of GitHub/StackOverflow copycats: https://github.com/quenhus/uBlock-Origin-dev-filter

It blocks copycats and hides them from multiple search engines. You may also use the list with uBlacklist.


Here is my uBlock filter with hundreds of GitHub/StackOverflow copycats: https://github.com/quenhus/uBlock-Origin-dev-filter

It blocks copycats and hides them from multiple search engines. You may also use the list with uBlacklist.


With these two pieces of data:

* the identical text copied from stack overflow should be easily identifiable

* volunteers put together a list of these sites themselves

it should be obvious to Google apologists that Google is either negligent or intentionally allowing these sites in their search. I'm sick of hearing about how "the world is different" and it's an "arms race" between spam sites and Google. Bullshit.


> the identical text copied from stack overflow should be easily identifiable

Google starts matching content from SO => spammers start tweaking the text slightly => Google implements some expensive similarity score to down-rank copycat sites => spammers use more complex scrambling => ...
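
To make the "similarity score" step concrete, here is a minimal sketch of one classic near-duplicate measure: Jaccard similarity over word shingles. Nothing here is claimed to be what Google does; the function names, the shingle size `k=3`, and the sample strings are assumptions chosen for illustration.

```python
def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        # Short texts collapse to a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the two shingle sets: |A & B| / |A | B|."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)


# Hypothetical sample texts: an answer, a lightly tweaked copy, unrelated text.
original = "you can reverse a list in python with the reversed builtin or slicing"
tweaked = "you can reverse a list in python using the reversed builtin or slicing"
unrelated = "pinterest results keep showing up in my image search"

print(jaccard(original, original))
print(jaccard(original, tweaked))
print(jaccard(original, unrelated))
```

A single swapped word only removes the few shingles that contain it, so a lightly tweaked copy still scores well above unrelated text. Heavier scrambling lowers the score, which is the escalation this chain describes.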

> volunteers put together a list of these sites themselves

These lists only work because they're used by a tiny minority of people. If Google were to do this the spammers would start switching domains more quickly (or find some other workaround).

I'm no Google apologist but I think you're underestimating how hard search ranking is when spammers are actively trying to game the system.


> tweaking the text slightly

That's what ML is perfect at detecting, which is Google's forte.

Some of these sites have been returned as top results for a while, so are you suggesting that Google just gave up because spammers would be able to evade them with an update?


Yes, it is an arms race. Google has far more resources than spammers do, so they should easily be ahead.

You underestimate the resources Google has at its disposal.

They simply don't care because there is no real competition to worry about. Even with this spam you are still likely to use Google, so why would a profit-motivated company bother?


SO seems to have Yahoo ads, so I guess it is a no-brainer for Google to rank sites they profit from over the content the lusers want.


This is the real answer.


The problem with these theories is that they lack any sensible explanation of motive. Google intentionally degrading its search results because they "earn more if the user has to search again and again" just doesn't feel right: even if it were true in some short-term experiment, it would compromise the way people at Google think of themselves and their work to a degree that would be devastating to the company. There is no way they would throw away that sort of value without being under intense pressure, which they definitely are not.


Another comment stated that SO uses ads from someone other than Google, while the copy-paste sites use Google for ads. If true, that is a clear monetary incentive to not go after this too hard.


They've also demonstrated that they can derank the Wikipedia clones. Funny how that ability is lost when the site in question makes money for a competitor.


These large tech companies have a long and varied history of stupid short-term decision making for profit and bad products due to local individual failures. Until there is a clear and detailed explanation of how the spam sites are avoiding google's wrath, the explanation of stupidity or short-term thinking on Google's part seems just as plausible.


Well come up with an explanation of how these entirely mechanically generated SO clone sites, with no obfuscation, are allowed to exist by Google, when identifying them and removing them should be fairly trivial?

At the very least they're being deliberately neglectful because they don't feel the bad experience harms their revenue because there's no other substantial competitor so they can abuse their monopoly status.

I guess they may just not care enough about software developers and figure we're mostly using ad blockers, so it's wasted effort and we'll develop blocklists ourselves. With no monetary value that they can assign to the ill will that it engenders, they figure it must not matter, so they don't bother. Pissing off a large chunk of the entire IT community via obvious neglect seems like a poor move to me, but then I've never felt that I'm cut out for management.


Maybe the problem is just genuinely hard and beyond their capabilities.


Detecting identical snippets of text is beyond virtually no one's abilities.


Yeah, I subbed to the blocklist that someone else published that they're maintaining manually. Google certainly has the resources to beat that bar.

It feels like, economy-wide, decision makers in corporations and governments have just arrived at the conclusion that there's no money / no point in trying to stop scammers (and there might be an actual cost to revenue of doing so). It won't goose their quarterly numbers and might hurt them, so it's better to allow it.


This even works on Firefox Nightly on Android. Thanks a lot!


This is fantastic! This is exactly what I needed, thanks!


You rock. Thank you.


That's actually really cool, thanks! However, the link doesn't work from HN for me. For the link to work, users need to click from a trusted domain listed here https://github.com/gorhill/uBlock/blob/bba4732c6b47134c3f54e...


From 1.41.5b2 and above, it works on non-"trusted" sites by right-clicking the link: there will be an entry in the contextual menu to import the list.

For older versions of uBO, you can already use the old way:

    abp:subscribe?location=[...]


Thanks for your incredibly fast response and help. (Do you have a notification when your username is invoked in HN?) And of course, thanks for uBO!

In my case, the issue is that GitHub doesn't allow the abp|ubo protocol in links. However, no problem, I can use the method with "subscribe.adblockplus.org". Still, subscribing from the contextual menu is a great feature.
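
For anyone wanting to generate such links for their own list, here is a small sketch that builds both link styles discussed in this exchange. The `location` parameter comes from the `abp:subscribe?location=[...]` form quoted above; the optional `title` parameter, the function name, and the example list URL are assumptions for illustration, so check them against the blocker's documentation before relying on them.

```python
from urllib.parse import quote


def subscribe_links(list_url: str, title: str = "") -> dict:
    """Build one-click subscription links for a filter list.

    Returns the custom-protocol form and the https form that can be
    used where custom protocols are not allowed in links.
    """
    # Percent-encode everything so the list URL survives as a query value.
    query = "location=" + quote(list_url, safe="")
    if title:
        # Assumption: an optional human-readable title parameter.
        query += "&title=" + quote(title, safe="")
    return {
        "abp": "abp:subscribe?" + query,
        "https": "https://subscribe.adblockplus.org/?" + query,
    }


# Hypothetical list URL, not a real filter list.
links = subscribe_links("https://example.com/filters/list.txt", "My list")
print(links["abp"])
print(links["https"])
```

The https form is the one that fits in a README when the host strips custom-protocol links.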


Hey, it’s the creator! Thanks for uBO.


As the author of the filter, I strongly agree with you. However, I believe it would be too tedious for most people to update the filter "by hand". I think I'm going to add this important security information in the README.


Here is my uBlock filter with hundreds of GitHub/StackOverflow copycats: https://github.com/quenhus/uBlock-Origin-dev-filter

It blocks copycats and hide them from multiple search engines. You may also use the list with uBlacklist.


If you can do this, so can Google. This just shows they refuse to.


> If you can do this, so can Google. This just shows they refuse to.

If they immediately blocked these sites then Google would get a lot of flak for censoring the web.

I dislike these sites as much as anyone. A while back I even tweeted about[0] having a dream where I wrote a browser extension to intercept and redirect these copycat sites to the real site.

In my mind this falls into the same category as phone spam. The phone networks could block these but how would you feel if you knew your phone company was auto-filtering incoming calls without you having any control over that? It's a very thin line.

Hopefully one day algorithms will be smart enough to auto de-rank copycat sites or blatant plagiarism so they don't show up on the first page.

[0]: https://twitter.com/nickjanetakis/status/1473671136928018434


They already de-rank plenty of sites for countless abuses, especially for gaming search. They have been doing this for a long time, and no one has ever called it censorship. This is the first time I've heard of anyone even suggesting this.

Also, their ranking algorithm is extremely complex. To suggest one complex algorithm is censorship and another is unbiased search results is to have a very naive understanding of how search works.


>algorithms will be smart enough to auto de-rank copycat sites or blatant plagiarism

So... if google creates an algorithm to detect copycatting/plagiarism it's okay for them to deploy it, but it's not okay if they do it by hand?


> So... if google creates an algorithm to detect copycatting/plagiarism it's okay for them to deploy it, but it's not okay if they do it by hand?

No, I thought more about my comment a day later. I don't know what a fair answer is. Being ranked on page 216 by an algorithm or de-listed manually is basically the same outcome.


For developers, you can remove some spam websites from Google and other search engines, with these uBlock filters: https://github.com/quenhus/uBlock-Origin-dev-filter


As others have said, it can also be done with a uBlock filter.

For example, here are my filters that block and remove terrible copycats/translations of StackOverflow https://gist.github.com/quenhus/6bd2c47e5780f726f0c96c0a2ee7... . I hate these copies that somehow manage to be better referenced than StackOverflow.


Would you submit this to the awesome list I’ve made for uBlacklist?

I love uBlacklist, but I felt the community aspect of it was missing, and vital to its success. I felt bootstrapping lists and GitHub repos was the most instantaneous way to enable a community.

I comment each time it hits HN, but beyond that haven’t seen much uptake / contribution towards building blacklists.

https://github.com/rjaus/awesome-ublacklist


An awesome list is a great idea! I have a huge blocklist of those fake machine-translated e-commerce sites that just redirect to AliExpress. I'll try to remember to submit it later today.


Thanks a lot. Pinterest and the Stack Overflow translations are a great start. I'm using that + *://*.quora.com/* now and feel like my web experience is already a lot better :D
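
As a rough illustration of what a rule like `*://*.quora.com/*` covers, here is a sketch that turns such a pattern into a regular expression. It assumes the rule follows WebExtension-style match-pattern semantics (a `*` scheme means http or https, and a leading `*.` in the host also matches the bare domain); the function name is hypothetical and this is not uBlacklist's actual implementation.

```python
import re


def match_pattern_to_regex(pattern: str) -> "re.Pattern":
    """Convert a scheme://host/path match pattern into a compiled regex."""
    m = re.fullmatch(r"(\*|https?)://(\*|(?:\*\.)?[^/*]+)(/.*)", pattern)
    if m is None:
        raise ValueError(f"unsupported match pattern: {pattern!r}")
    scheme, host, path = m.groups()

    # "*" as a scheme stands for http or https only.
    scheme_re = "https?" if scheme == "*" else re.escape(scheme)

    if host == "*":
        host_re = "[^/]+"  # any host
    elif host.startswith("*."):
        # The domain itself or any subdomain of it.
        host_re = r"(?:[^/]+\.)?" + re.escape(host[2:])
    else:
        host_re = re.escape(host)

    # "*" in the path matches any run of characters.
    path_re = ".*".join(re.escape(part) for part in path.split("*"))
    return re.compile(f"{scheme_re}://{host_re}{path_re}")


rule = match_pattern_to_regex("*://*.quora.com/*")
for url in (
    "https://www.quora.com/What-is-X",
    "http://quora.com/",
    "https://notquora.com/",
):
    print(url, bool(rule.fullmatch(url)))
```

Under these assumptions, the leading `*.` covers both `quora.com` and `www.quora.com`, while the escaped dot keeps lookalike hosts such as `notquora.com` out.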


Isn't this equally possible with a uBlock blocklist?

I already trust gorhill, and a bunch of Firefox maintainers, with my life and my passwords.


I'm guessing you haven't tried it. uBlacklist is purpose-built for search filtering, so the UX is far better. It pretty much feels like an official part of Google Search. Next to each result is a single-click block button and the results can be shown/hidden with a button in the top options bar. uBlock gives you all the building blocks, but you have to assemble and manage them yourself.

As for trust, that's a more general issue. Personally, I can't imagine someone would go through this much trouble just to ship a trojan and I find the probability of their GitHub and AMO account being compromised quite unlikely since there are far more lucrative targets for such an attacker - including uBlock Origin. I guess there's always the possibility of auditing the code and installing a self-signed version with no automatic updates.


> Personally, I can't imagine someone would go through this much trouble just to ship a trojan

I agree it’s not very risky, but I can absolutely imagine this.


This is amazing. I tried extending this to remove all pinterest results too (the real cancer of the internet) but the filter doesn't seem to be working.. Any tips?

    google.*##.g:has(a[href*="pinterest.*"])


And here’s the same thing for uBlacklist:

https://github.com/rjaus/ublacklist-pinterest/blob/main/ubla...


I am using these:

    google.*##.g:has(a[href*=".pinterest.*"])
    google.*##a[href*=".pinterest."]:upward(1)


Thanks, this seems to be the closest. On the search results page the Pinterest results still appear, but without a link to the site, so you see floating paragraphs. On the image results page the Pinterest results do seem to disappear.

I tried setting the upward() to 2, but that got rid of all the image results.


Thank you so much! It's been a disease on my browser for too long.


Excellent, thanks. I am so glad to have that pinterest shit out of my search results. Companies that try to take over your life, even in small ways, should all fall apart.


It doesn't work because *= is looking strictly for a substring; it doesn't do globbing. Something like [href*="pinterest."] probably gets closer to what you're expecting.
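
The substring-versus-globbing point can be checked outside the browser. Below is a minimal Python sketch that imitates the literal substring test of a CSS `[href*="..."]` attribute selector and contrasts it with shell-style globbing via `fnmatch`. The helper name and the sample URL are hypothetical, and this only models the attribute test, not uBlock Origin's filter engine.

```python
from fnmatch import fnmatchcase


def attr_contains(attr_value: str, needle: str) -> bool:
    """Imitate CSS [attr*="needle"]: a literal substring test.

    An empty needle never matches, and "*" has no special meaning.
    """
    return bool(needle) and needle in attr_value


href = "https://www.pinterest.com/pin/123/"  # hypothetical result link

# The trailing "*" is searched for literally, and the URL has no "*" in it.
print(attr_contains(href, "pinterest.*"))

# Dropping the "*" leaves a plain substring that the URL does contain.
print(attr_contains(href, "pinterest."))

# Globbing, by contrast, would treat "*" as a wildcard.
print(fnmatchcase(href, "*pinterest.*"))
```

Because the selector is a plain substring test, the needle has to be text that literally appears in the link, such as `pinterest.` or `.pinterest.`.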


(use \* to escape the *)


Ok, genuine question - Why do people hate Pinterest so much? I have no feelings towards it either way (I don't have an account on it but I have viewed some things over time)


Imagine the following scenario: you need an image for an internal presentation, so you don't care about copyright. You find an image in the Google search results that looks promising. You open it, but end up on the dreaded Pinterest site.

You'll know for sure that there is no chance of above-average quality or size. Hovering over the image will overlay a "join us" message. If you click the image you get a "join us" modal. When you right-click, you get a custom menu that does have a "Save image" option. But clicking that will just get you the same "join us" again. You'll need to inspector hack the image out, which can quickly become annoying if you need more than one.

Beneath it will be a whole bunch of also promising looking images, but scrolling down a bit will quickly get you, you guessed it, a "join us" modal. Clicking on any image there will get you to another page just like it, but this one often opening with the "join us" modal already open.

If you use Ctrl-click, to save your position in the overview that you inspector hacked the scroll block modal out of, tough luck: that behavior is modified as well. It will just open the new page like a regular click. Go back and you're at the top again.

The site feels like a collection of dark patterns hijacking your image search results.


The irony is that the Pinterest users who are putting this nonsense on their profiles are usually uploading copyrighted images to begin with.


Exactly this! If pinterest had a moral leg to stand on I might have some sympathy, but they have no implicit rights to the media they're spamming search results with.

Imagine if Instagram or Facebook tried the seo garbage pinterest is doing to completely take over image search results... they'd get shut down hard, and there would be screaming matches between c class movers and shakers.

I honestly think there's gotta be a kickback or individual level corruption involved, no other site would be allowed to break Google's image search functionality and reputation. It's not like they can't simply downrank and spread out the results. The pinterest situation is fishy af, and it's been years. Google image search used to be useful. Now it's annoying.


You're right, it's been a while since I went there. I forgot about all that shit.

I'm sold.

Death to Pinterest


My hatred is due to the fact that if you image search for something, e.g. a product, and click on what you think is the product’s site, you end up on someone’s Pinterest board. From there, there’s no way to get back to the original site. And the biggest annoyance is that the Pinterest results always seem to rank better than the originals.


Nuking Pinterest off of the Internet is what immediately came to my mind as well!


Seriously, how does something so hated continues to exist?

Is some evil billionaire secretly bankrolling Pinterest as a cruel joke?


This is a classic example of the HN bubble. None of my "tech" friends use Pinterest; everyone who's not in that group uses it heavily for finding furniture, clothes or recipes. It's usually the app they use instead of googling for something.


My former manager worked there..


Anecdotally, when I've mentioned my hate for Pinterest appearing in search results, several of my coworkers have reacted with surprise. They use it regularly, and mentioned something about pinning interesting results. Our individual minds boggled at each other.


You know, maybe it's really good! I never considered this possibility because of its hostile UX.

Any service that pops up a login and won't let you access any content without logging in I just nope out of and have for many years. Especially user-hostile on mobile (Twitter and Reddit websites work really really hard to force you into using their apps and/or logging in on mobile, much more than on desktop). But maybe we're missing something and Pinterest is super awesome. Maybe I've been using this anti-user UX pattern as a signal for "crapware" but it's not accurate. Maybe fantastic services are hiding behind this pattern.

I'm not gonna sign up to find out but it's interesting to think about.

Or maybe I'll set up a VM for this and finally get FB/Insta/TikTok/Pinterest/Twitter, check em all out, and find out what the rest of humanity has been up to.


Google will have all those signals that people like using Pinterest and will keep it on the first page.


As much as I hate walled garden sites like Pinterest, Quora, Instagram, it is huge for looking up clothes, recipes, jewelry.

So it serves a function for non-tech people. Its format works for them.


Using it daily for inspiration for radio controlled cars and trucks I scratch-build from styrene. I also use it for interior design ideas, fashion, and if the odd picture of a VW T4 van build pops up I tend to save it to a collection for when I start my own conversion in the spring.

I love it.


The only people who hate Pinterest are computer nerds (who are an extremely tiny minority of the population, and not Pinterest's target users anyway). Everyone else either likes it or doesn't have a strong opinion on it.

Pinterest is very popular in my friend group (which contains zero computer nerds outside of myself).


> Seriously, how does something so hated continues to exist?

Something hated by billions of nonusers (but not to the point of outlawing it) and liked by 25 paying consumers can happily survive as a business.

And Pinterest is actually liked by many people who have user accounts there.


Pinterest is a hate/love relationship. Sometimes it's like a kind of archive of things which have disappeared on other sites (imagine certain clothes you cannot buy anymore). I actually like that they really make a copy of the content. But of course this is totally non tech related.


I wouldn't call it an archive, it's more of a fragment. There's usually no context or link back to the source so it merely exists as evidence that you're not insane.


haha, I like this reply "... evidence that you're not insane" :D


would also add IG and FB to the list.


Sooooo so many hobbies have moved from forums to first Facebook, and the last few years IG. One of my girlfriend's workflows is research on IG, buy on Etsy.


PLEASE MAKE THIS WORK.

caps intended.


A uBlock filter list is the correct solution to this problem. I want to keep browser extensions which have full access to all website contents at an absolute minimum. I only use extensions which are available in the official Arch repos (like firefox-ublock-origin).


I do that to remove Medium websites from my Google search. It's a disgrace to the free internet and the amount of garbage on it is staggering.


Sometimes there are good articles on it, scribe.rip is pretty handy for reading those few.


Thank you. Came here to say the same thing. It's a blight on the Web.


Is there a way to filter out sites that use their own domains but are Medium websites?


also 'towards data science';


Yes, that too.


I didn't even think to try and do such with uBlock, and this precompiled list gives me all the more reason to do so. Thank you!


Thanks! These garbage sites have been sneaking towards the top search results over the last few months.


That's wonderful, thanks for that list. I hope someone else has put together one for GitHub/GitLab issue clones.


I just added a filter list for GitHub copycats in the Gist https://gist.github.com/quenhus/6bd2c47e5780f726f0c96c0a2ee7... . List of domains taken from https://github.com/arosh/ublacklist-github-translation


Created a repository for it which also includes the Pinterest list of uBlacklist (https://github.com/stroobants-dev/ublock-origin-shitty-copie...).

Easier to keep it up to date than a gist.


1. How do I subscribe to these lists in uBlock?

2. Often, the original website (GitHub, Stack Overflow) is not present in my search results; in such situations, I prefer seeing the copycats to seeing nothing at all. Wouldn't using these blacklists be counterproductive for me then?


Thank you for this filter. Much appreciated!


That is specifically the only thing I've ever used this for. The worst offender hides the "solution" behind a paywall.


Thank you!


github-wiki-see.page is a new one I came across today.


I don't know what to do with it. You can't find GitHub Wiki results on search engines because of GitHub's robots.txt. The project is thus quite legit. Source: https://github-wiki-see.page/

