
Quenhus · on March 31, 2024

I'm sad to learn this project is shutting down. The maintainer (xvello) contributed a lot to my uBlock dev filter [0]. We tried to reduce the time search engine users lose on deceptive and low-quality content. Generative ML and aggressive SEO techniques hit hard.

Bye, letsblockit, and I wish you well, @xvello.

[0] https://github.com/quenhus/uBlock-Origin-dev-filter

Quenhus · on July 3, 2023

I'm not OP, but here is my own uBlock filter with hundreds of GitHub/StackOverflow copycats: https://github.com/quenhus/uBlock-Origin-dev-filter

redsaber · on July 3, 2023

thank you!

redsaber · on July 3, 2023

thanks

Quenhus · on July 3, 2023

For those on Google or DDG, here is my uBlock filter with hundreds of GitHub/StackOverflow copycats: https://github.com/quenhus/uBlock-Origin-dev-filter

It blocks copycats and hides them from multiple search engines. You may also use the list with uBlacklist.

chronogram · on July 3, 2023

These filters are light enough that they work well in Manifest V3 adblockers. For example, the "Google - Global" filter only adds 1376 dynamic rules.

Quenhus · on Sept 1, 2022

For those still on Google or DDG, here is my uBlock filter with hundreds of GitHub/StackOverflow copycats: https://github.com/quenhus/uBlock-Origin-dev-filter

It blocks copycats and hides them from multiple search engines. You may also use the list with uBlacklist.

Quenhus · on July 23, 2022

Here is my uBlock filter with hundreds of GitHub/StackOverflow copycats: https://github.com/quenhus/uBlock-Origin-dev-filter

It blocks copycats and hides them from multiple search engines. You may also use the list with uBlacklist.

colordrops · on July 23, 2022

With these two pieces of data:

* the identical text copied from stack overflow should be easily identifiable

* volunteers put together a list of these sites themselves

it should be obvious to Google apologists that Google is either negligent or intentionally allowing these sites in its search results. I'm sick of hearing about how "the world is different" and it's an "arms race" between spam sites and Google. Bullshit.

remus · on July 23, 2022

> the identical text copied from stack overflow should be easily identifiable

Google starts matching content from SO => spammers start tweaking the text slightly => Google implements some expensive similarity score to downrank copycat sites => spammers use more complex scrambling => ...

> volunteers put together a list of these sites themselves

These lists only work because they're used by a tiny minority of people. If Google were to do this the spammers would start switching domains more quickly (or find some other workaround).

I'm no Google apologist but I think you're underestimating how hard search ranking is when spammers are actively trying to game the system.
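As a rough illustration of the "similarity score" step in that escalation (not a claim about what Google actually runs), here is a minimal Python sketch of w-shingling with Jaccard similarity, a classic near-duplicate detection technique. The sample sentences are made up, and the window size of 3 words is an arbitrary assumption.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical sample texts: an "original", a lightly reworded copy, and noise.
original = "To reverse a list in Python you can call list.reverse() or use slicing with [::-1]."
tweaked = "To reverse a list in Python, you could call list.reverse() or use slicing with [::-1]."
unrelated = "Use git rebase to rewrite history before you push a feature branch."

print(round(jaccard(shingles(original), shingles(tweaked)), 2))
print(round(jaccard(shingles(original), shingles(unrelated)), 2))
```

A one-word swap still leaves the copy sharing most of its shingles with the original (11 of 17 here), so trivial rewording alone does not defeat this kind of check; heavier scrambling is what pushes the arms race to the next round.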

colordrops · on July 23, 2022

> tweaking the text slightly

That's what ML is perfect at detecting, which is Google's forte.

Some of these sites have been returned as top results for a while, so are you suggesting that Google just gave up because spammers would be able to evade them with an update?

manquer · on July 23, 2022

Yes, it is an arms race, but Google has far more resources than spammers do, so they should easily stay ahead.

You underestimate the resources google has at its disposal.

They simply don’t care because there is no real competition to worry about. Even with this spam, you are still likely to use Google, so why would a profit-motivated company bother?

rightbyte · on July 23, 2022

SO seems to have Yahoo ads, so I guess it is a no-brainer for Google to rank sites they profit from over the content the lusers want.

jiggawatts · on July 23, 2022

This is the real answer.

IfOnlyYouKnew · on July 23, 2022

The problem with these theories is that they lack any sensible explanation of motive. Google intentionally degrading its search results because they "earn more if the user has to search again and again" just doesn't feel right: even if it were true in some short-term experiment, it would compromise the way people at Google think of themselves and their work to a degree that would be devastating to the company. There is no way they would throw away that sort of value without being under intense pressure, which they definitely are not.

Beldin · on July 23, 2022

Another comment stated that SO uses ads from someone other than Google, while the copy-paste sites use Google for ads. If true, that is a clear monetary incentive not to go after this too hard.

kevin_thibedeau · on July 23, 2022

They've also demonstrated that they can derank the Wikipedia clones. Funny how that ability is lost when the site in question makes money for a competitor.

colordrops · on July 23, 2022

These large tech companies have a long and varied history of stupid short-term decision making for profit and bad products due to local individual failures. Until there is a clear and detailed explanation of how the spam sites are avoiding google's wrath, the explanation of stupidity or short-term thinking on Google's part seems just as plausible.

lamontcg · on July 23, 2022

Well, come up with an explanation of how these entirely mechanically generated SO clone sites, with no obfuscation, are allowed to exist by Google, when identifying and removing them should be fairly trivial.

At the very least they're being deliberately neglectful: they don't feel the bad experience harms their revenue, because there's no other substantial competitor, so they can abuse their monopoly status.

I guess they may just not care enough about software developers and figure we're mostly using ad blockers, so it's wasted effort and we'll develop blocklists ourselves. Since they can't assign a monetary value to the ill will it engenders, they figure it must not matter, so they don't bother. Pissing off a large chunk of the entire IT community through obvious neglect seems like a poor move to me, but then I've never felt that I'm cut out for management.

burnished · on July 23, 2022

Maybe the problem is just genuinely hard and beyond their capabilities.

colordrops · on July 23, 2022

Detecting identical snippets of text is beyond virtually no one's abilities.
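For the truly identical case, a toy sketch of the idea: hash each paragraph after light normalization (lowercasing, collapsing whitespace) and check a clone page's paragraphs against the set of hashes from the source. The function names and sample text are hypothetical, and a search engine would do this at a vastly larger scale.

```python
import hashlib
import re


def fingerprint(paragraph: str) -> str:
    """SHA-256 of a paragraph after lowercasing and collapsing whitespace."""
    normalized = re.sub(r"\s+", " ", paragraph.strip().lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def copied_paragraphs(source: list, candidate: list) -> list:
    """Return the paragraphs of `candidate` whose fingerprint also appears in `source`."""
    known = {fingerprint(p) for p in source}
    return [p for p in candidate if fingerprint(p) in known]


# Hypothetical "answer" and a scraped clone page with extra whitespace and an ad.
so_answer = ["Use git stash to save your changes.", "Then run git pull."]
clone_page = ["Ad banner", "use git stash   to save your changes.", "Then run git pull."]
print(copied_paragraphs(so_answer, clone_page))
```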

lamontcg · on July 23, 2022

Yeah, I subbed to the blocklist that someone else published that they're maintaining manually. Google certainly has the resources to beat that bar.

It feels like, economy-wide, decision makers in corporations and governments have just concluded that there's no money / no point in trying to stop scammers (and there might be an actual cost to revenue in doing so). It won't goose their quarterly numbers and might hurt them, so it's better to allow it.

Phlogi · on July 23, 2022

This even works on Firefox Nightly on Android. Thanks a lot!

thejosh · on July 23, 2022

This is fantastic! This is exactly what I needed, thanks!

SmellTheGlove · on July 23, 2022

You rock. Thank you.

Quenhus · on Feb 17, 2022

That's actually really cool, thanks! However, the link doesn't work from HN for me. For the link to work, users need to click it from a trusted domain listed here https://github.com/gorhill/uBlock/blob/bba4732c6b47134c3f54e...

gorhill · on Feb 17, 2022

From 1.41.5b2 onward, it also works on non-"trusted" sites: right-click the link and the context menu will have an entry to import the list.

For older versions of uBO, you can already use the old way:

    abp:subscribe?location=[...]

Quenhus · on Feb 18, 2022

Thanks for your incredibly fast response and help. (Do you have a notification when your username is invoked in HN?) And of course, thanks for uBO!

In my case, the issue is that GitHub doesn't allow the abp/ubo protocols in links. No problem, though: I can use the method with "subscribe.adblockplus.org". Still, subscribing from the context menu is a great feature.
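For reference, a minimal Python sketch that builds both link styles discussed here from a list URL and a title. The URL and title are placeholders, not the real address of the dev filter, and the sketch assumes both forms take the same location and title query parameters as abp:subscribe links.

```python
from urllib.parse import quote, urlencode


def subscribe_links(list_url: str, title: str) -> dict:
    """Build one-click subscription links for a filter list (location/title params)."""
    # quote_via=quote with the default safe="" percent-encodes ":" and "/" in the URL.
    params = urlencode({"location": list_url, "title": title}, quote_via=quote)
    return {
        # Legacy scheme; some sites (GitHub, per the comment above) won't link it.
        "abp": f"abp:subscribe?{params}",
        # HTTPS form, usable where custom schemes are stripped.
        "https": f"https://subscribe.adblockplus.org/?{params}",
    }


# Placeholder list URL and title, for illustration only.
links = subscribe_links("https://example.com/lists/dev-filter.txt", "Dev filter")
for kind, url in links.items():
    print(kind, url)
```

As noted above, uBO only acts on the HTTPS form when it is clicked from a trusted site (or, in newer versions, via the right-click menu).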

aspenmayer · on Feb 18, 2022

Hey, it’s the creator! Thanks for uBO.

Quenhus · on Feb 17, 2022

As the author of the filter, I strongly agree with you. However, I believe it would be too tedious for most people to update the filter "by hand". I think I'm going to add this important security information in the README.

Quenhus · on Feb 15, 2022

Here is my uBlock filter with hundreds of GitHub/StackOverflow copycats: https://github.com/quenhus/uBlock-Origin-dev-filter

It blocks copycats and hides them from multiple search engines. You may also use the list with uBlacklist.

colordrops · on Feb 15, 2022

If you can do this, so can Google. This just shows they refuse to.

nickjj · on Feb 16, 2022

> If you can do this, so can Google. This just shows they refuse to.

If they immediately blocked these sites, Google would get a lot of flak for censoring the web.

I dislike these sites as much as anyone. A while back I even tweeted [0] about having a dream where I wrote a browser extension to intercept and redirect these copycat sites to the real site.

In my mind this falls into the same category as phone spam. The phone networks could block these but how would you feel if you knew your phone company was auto-filtering incoming calls without you having any control over that? It's a very thin line.

Hopefully one day algorithms will be smart enough to auto de-rank copycat sites or blatant plagiarism so they don't show up on the first page.

[0]: https://twitter.com/nickjanetakis/status/1473671136928018434

colordrops · on Feb 16, 2022

They already de-rank plenty of sites for countless abuses, especially for gaming search. They have been doing this for a long time, and no one has ever called it censorship. This is the first time I've heard of anyone even suggesting this.

Also, their ranking algorithm is extremely complex. To suggest one complex algorithm is censorship and another is unbiased search results is to have a very naive understanding of how search works.

Lascaille · on Feb 16, 2022

>algorithms will be smart enough to auto de-rank copycat sites or blatant plagiarism

So... if google creates an algorithm to detect copycatting/plagiarism it's okay for them to deploy it, but it's not okay if they do it by hand?

nickjj · on Feb 16, 2022

> So... if google creates an algorithm to detect copycatting/plagiarism it's okay for them to deploy it, but it's not okay if they do it by hand?

No, I thought more about my comment a day later. I don't know what a fair answer is. Being ranked on page 216 by an algorithm or de-listed manually is basically the same outcome.

Quenhus · on Jan 3, 2022

For developers, you can remove some spam websites from Google and other search engines, with these uBlock filters: https://github.com/quenhus/uBlock-Origin-dev-filter

Quenhus · on Dec 14, 2021

As others have said, it can also be done with a uBlock filter.

For example, here are my filters that block and remove terrible copycats/translations of StackOverflow https://gist.github.com/quenhus/6bd2c47e5780f726f0c96c0a2ee7... . I hate these copies that somehow manage to be better referenced than StackOverflow.

RileyJames · on Dec 14, 2021

Would you submit this to the awesome list I’ve made for uBlacklist?

I love uBlacklist, but I felt the community aspect of it was missing, and vital to its success. Bootstrapping lists and GitHub repos seemed like the quickest way to enable a community.

I comment each time it hits HN, but beyond that haven’t seen much uptake / contribution towards building blacklists.

https://github.com/rjaus/awesome-ublacklist

franga2000 · on Dec 14, 2021

An awesome list is a great idea! I have a huge blocklist of those fake machine-translated e-commerce sites that just redirect to AliExpress. I'll try to remember to submit it later today.

kriro · on Dec 14, 2021

Thanks a lot. Pinterest and the Stack Overflow translations are a great start. I'm using that + *://*.quora.com/* now and feel like my web experience is already a lot better :D
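Rules like *://*.quora.com/* are WebExtension-style match patterns, which uBlacklist accepts. Below is a simplified Python sketch of how such a pattern matches URLs. It assumes the usual semantics: a * scheme means http or https, a leading *. covers the domain and its subdomains, and * in the path is a wildcard. It is an approximation, not uBlacklist's own matcher.

```python
import re


def match_pattern_to_regex(pattern: str) -> re.Pattern:
    """Compile a simplified match pattern such as '*://*.quora.com/*' into a regex."""
    scheme, rest = pattern.split("://", 1)
    host, _, path = rest.partition("/")
    # A '*' scheme matches only http and https.
    scheme_re = "https?" if scheme == "*" else re.escape(scheme)
    if host == "*":
        host_re = r"[^/]+"
    elif host.startswith("*."):
        # '*.' matches the bare domain and any subdomain of it.
        host_re = r"(?:[^/]+\.)?" + re.escape(host[2:])
    else:
        host_re = re.escape(host)
    # Each '*' in the path is a wildcard; everything else is literal.
    path_re = ".*".join(re.escape(part) for part in ("/" + path).split("*"))
    return re.compile(f"^{scheme_re}://{host_re}{path_re}$")


quora = match_pattern_to_regex("*://*.quora.com/*")
for url in ["https://www.quora.com/What-is-Rust", "http://quora.com/",
            "https://notquora.com/x", "ftp://quora.com/x"]:
    print(url, bool(quora.match(url)))
```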

xorcist · on Dec 14, 2021

Isn't this equally possible with a uBlock blocklist?

I already trust gorhill, and a bunch of Firefox maintainers, with my life and my passwords.

franga2000 · on Dec 14, 2021

I'm guessing you haven't tried it. uBlacklist is purpose-built for search filtering, so the UX is far better; it pretty much feels like an official part of Google Search. Next to each result is a single-click block button, and the results can be shown/hidden with a button in the top options bar. uBlock gives you all the building blocks, but you have to assemble and manage them yourself.

As for trust, that's a more general issue. Personally, I can't imagine someone would go through this much trouble just to ship a trojan, and I find it quite unlikely that their GitHub and AMO accounts would be compromised, since there are far more lucrative targets for such an attacker, including uBlock Origin. I guess there's always the option of auditing the code and installing a self-signed version with no automatic updates.

pbronez · on Dec 14, 2021

> Personally, I can't imagine someone would go through this much trouble just to ship a trojan

I agree it’s not very risky, but I can absolutely imagine this.

ghoomketu · on Dec 14, 2021

This is amazing. I tried extending this to remove all Pinterest results too (the real cancer of the internet), but the filter doesn't seem to be working. Any tips?

    google.*##.g:has(a[href*="pinterest.*"])

RileyJames · on Dec 14, 2021

And here’s the same thing for uBlacklist:

https://github.com/rjaus/ublacklist-pinterest/blob/main/ubla...

alfu · on Dec 14, 2021

I am using these:

    google.*##.g:has(a[href*=".pinterest.*"])
    google.*##a[href*=".pinterest."]:upward(1)

politelemon · on Dec 14, 2021

Thanks this seems to be the closest. On the search results page the Pinterest results still appear, but without a link to the site, so you see floating paragraphs. On the image results page the pinterest results do seem to disappear.

I tried setting the upward() to 2, but that got rid of all the image results.

ashleysmithgpu · on Dec 14, 2021

Thank you so much! It's been a disease on my browser for too long.

jb1991 · on Dec 14, 2021

Excellent, thanks. I am so glad to have that pinterest shit out of my search results. Companies that try to take over your life, even in small ways, should all fall apart.

naniwaduni · on Dec 14, 2021

It doesn't work because *= is looking strictly for a substring; it doesn't do globbing. Something like [href*="pinterest."] probably gets closer to what you're expecting.

detaro · on Dec 14, 2021

(use \* to escape the *)
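To make the point concrete: in CSS, [href*="value"] is a plain substring test, so the * inside "pinterest.*" is taken literally and never occurs in a real URL. (The * in google.* at the start of the filter is different: that is uBO's hostname wildcard, not CSS.) A tiny Python emulation of the attribute test, using a made-up Pinterest URL:

```python
def attr_contains(href: str, value: str) -> bool:
    """Emulate the CSS [href*="value"] test: a plain substring check, no wildcards."""
    return value in href


href = "https://www.pinterest.com/pin/123/"  # hypothetical result link
print(attr_contains(href, "pinterest.*"))  # '*' is taken literally, so no match
print(attr_contains(href, "pinterest."))
```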

ItsBob · on Dec 14, 2021

OK, genuine question: why do people hate Pinterest so much? I have no feelings towards it either way (I don't have an account on it, but I have viewed some things over time).

DrSiemer · on Dec 14, 2021

Imagine the following scenario: you need an image for an internal presentation, so you don't care about copyright. You find an image in the Google search results that looks promising. You open it, but end up on the dreaded Pinterest site.

You'll know for sure that there is no chance of above-average quality or size. Hovering over the image will overlay a "join us" message. If you click the image, you get a "join us" modal. When you right-click, you get a custom menu that does have a "Save image" option, but clicking it just gets you the same "join us" again. You'll need to inspector-hack the image out, which quickly becomes annoying if you need more than one.

Beneath it will be a whole bunch of other promising-looking images, but scrolling down a bit will quickly get you, you guessed it, a "join us" modal. Clicking on any image there will get you to another page just like it, this one often opening with the "join us" modal already open.

If you use Ctrl-click to save your position in the overview (the one you inspector-hacked the scroll-blocking modal out of), tough luck: that behavior is modified as well. It will just open the new page like a regular click. Go back and you're at the top again.

The site feels like a collection of dark patterns hijacking your image search results.

jb1991 · on Dec 14, 2021

The irony is that the Pinterest users who are putting this nonsense on their profiles are usually uploading copyrighted images to begin with.

robbedpeter · on Dec 14, 2021

Exactly this! If pinterest had a moral leg to stand on I might have some sympathy, but they have no implicit rights to the media they're spamming search results with.

Imagine if Instagram or Facebook tried the SEO garbage Pinterest is doing to completely take over image search results... they'd get shut down hard, and there would be screaming matches between C-class movers and shakers.

I honestly think there's gotta be a kickback or individual level corruption involved, no other site would be allowed to break Google's image search functionality and reputation. It's not like they can't simply downrank and spread out the results. The pinterest situation is fishy af, and it's been years. Google image search used to be useful. Now it's annoying.

ItsBob · on Dec 14, 2021

You're right, it's been a while since I went there. I forgot about all that shit.

I'm sold.

Death to Pinterest

iamphilrae · on Dec 14, 2021

My hatred is due to the fact that if you image-search for something, e.g. a product, and click on what you think is the product’s site, you end up on someone’s Pinterest board instead. From there, there’s no way to get back to the original site. And the biggest annoyance is that these results always seem to rank better than the originals.

post_from_work · on Dec 14, 2021

Nuking Pinterest off of the Internet is what immediately came to my mind as well!

zibzab · on Dec 14, 2021

Seriously, how does something so hated continue to exist?

Is some evil billionaire secretly bankrolling Pinterest as a cruel joke?

dewey · on Dec 14, 2021

This is a classic example of the HN bubble. None of my "tech" friends use Pinterest, while everyone outside that group uses it heavily for finding furniture, clothes or recipes. It's usually the app they use instead of googling for something.

hdjrudni · on Dec 14, 2021

My former manager worked there.

politelemon · on Dec 14, 2021

Anecdotally, when I've mentioned my hate for Pinterest appearing in search results, several of my coworkers have reacted with surprise. They use it regularly, and mentioned something about pinning interesting results. Our individual minds boggled at each other.

4ec0755f5522 · on Dec 14, 2021

You know, maybe it's really good! I never considered this possibility because of its hostile UX.

Any service that pops up a login and won't let you access any content without logging in I just nope out of and have for many years. Especially user-hostile on mobile (Twitter and Reddit websites work really really hard to force you into using their apps and/or logging in on mobile, much more than on desktop). But maybe we're missing something and Pinterest is super awesome. Maybe I've been using this anti-user UX pattern as a signal for "crapware" but it's not accurate. Maybe fantastic services are hiding behind this pattern.

I'm not gonna sign up to find out but it's interesting to think about.

Or maybe I'll setup a VM for this and finally get FB/Insta/TikTok/Pinterest/Twitter, check em all out, and find out what the rest of humanity has been up to.

aembleton · on Dec 14, 2021

Google will have all those signals that people like using Pinterest and will keep it on the first page.

aerique · on Dec 14, 2021

As much as I hate walled-garden sites like Pinterest, Quora and Instagram, Pinterest is huge for looking up clothes, recipes and jewelry.

So it serves a function for non-tech people. Its format works for them.

FourthProtocol · on Dec 14, 2021

I use it daily for inspiration for the radio-controlled cars and trucks I scratch-build from styrene. I also use it for interior design ideas and fashion, and if the odd picture of a VW T4 van build pops up, I tend to save it to a collection for when I start my own conversion in the spring.

I love it.

astura · on Dec 14, 2021

The only people who hate Pinterest are computer nerds (who are an extremely tiny minority of the population, and not Pinterest's target users anyway). Everyone else either likes it or doesn't have a strong opinion on it.

Pinterest is very popular in my friend group (which contains zero computer nerds outside of myself).

matkoniecz · on Dec 18, 2021

> Seriously, how does something so hated continue to exist?

Something hated by billions of non-users (but not to the point of outlawing it) and liked by 25 paying consumers can happily survive as a business.

And Pinterest is actually liked by many people who have user accounts there.

therealmarv · on Dec 14, 2021

Pinterest is a love/hate relationship. Sometimes it's a kind of archive of things that have disappeared from other sites (imagine certain clothes you can't buy anymore). I actually like that they really make a copy of the content. But of course this is totally non-tech related.

cptskippy · on Dec 14, 2021

I wouldn't call it an archive, it's more of a fragment. There's usually no context or link back to the source so it merely exists as evidence that you're not insane.

therealmarv · on Dec 20, 2021

haha, I like this reply "... evidence that you're not insane" :D

deepstack · on Dec 14, 2021

would also add IG and FB to the list.

FourthProtocol · on Dec 14, 2021

Sooooo so many hobbies have moved from forums, first to Facebook and, in the last few years, to IG. One of my girlfriend's workflows is research on IG, buy on Etsy.

bozhark · on Dec 14, 2021

PLEASE MAKE THIS WORK.

caps intended.

zaik · on Dec 14, 2021

A uBlock filter list is the correct solution to this problem. I want to keep browser extensions that have full access to all website contents to an absolute minimum. I only use extensions that are available in the official Arch repos (like firefox-ublock-origin).

behnamoh · on Dec 14, 2021

I do that to remove Medium websites from my Google search. It's a disgrace to the free internet and the amount of garbage on it is staggering.

allisfalafel · on Dec 14, 2021

Sometimes there are good articles on it, scribe.rip is pretty handy for reading those few.

bencollier49 · on Dec 14, 2021

Thank you. Came here to say the same thing. It's a blight on the Web.

bubblethink · on Dec 15, 2021

Is there a way to filter out sites that use their own domains but are Medium websites?

mellavora · on Dec 14, 2021

Also 'Towards Data Science'.

behnamoh · on Dec 14, 2021

Yes, that too.

bmlzootown · on Dec 14, 2021

I didn't even think to try and do such with uBlock, and this precompiled list gives me all the more reason to do so. Thank you!

dtech · on Dec 14, 2021

Thanks! These garbage sites have been creeping toward the top of search results over the last few months.

bluish29 · on Dec 14, 2021

That's wonderful, thanks for that list. I hope someone has also put one together for GitHub/GitLab issue clones.

Quenhus · on Dec 14, 2021

I just added a filter list for GitHub copycats in the Gist https://gist.github.com/quenhus/6bd2c47e5780f726f0c96c0a2ee7... . The list of domains is taken from https://github.com/arosh/ublacklist-github-translation
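Turning a plain domain list into filters is mechanical. A minimal Python sketch, assuming the google.*##.g:has(a[href*="..."]) shape quoted earlier in the thread for uBlock and *://*.domain/* match patterns for uBlacklist. The domains are hypothetical, and the .g result class depends on Google's current markup.

```python
def ublock_filters(domains: list) -> list:
    """Turn domains into uBO cosmetic filters that hide matching Google results."""
    # Caveat: *= is a substring test, so "example.com" would also hit "notexample.com".
    return [f'google.*##.g:has(a[href*="{d}"])' for d in domains]


def ublacklist_rules(domains: list) -> list:
    """Turn domains into uBlacklist match-pattern rules (domain plus subdomains)."""
    return [f"*://*.{d}/*" for d in domains]


# Hypothetical copycat domains, for illustration only.
copycats = ["copycat-example.com", "mirror-example.net"]
print("\n".join(ublock_filters(copycats)))
print("\n".join(ublacklist_rules(copycats)))
```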

878654Tom · on Dec 14, 2021

Created a repository for it which also includes the Pinterest list of uBlacklist (https://github.com/stroobants-dev/ublock-origin-shitty-copie...).

Easier to keep it up to date than a gist.

Siira · on Dec 14, 2021

1. How do I subscribe to these lists in uBlock?

2. Often, the original website (GitHub, Stack Overflow) is not present in my search results; in such situations, I prefer seeing the copycats to seeing nothing at all. Wouldn't using these blacklists be counterproductive for me then?

dt3ft · on Dec 14, 2021

Thank you for this filter. Much appreciated!

zelon88 · on Dec 14, 2021

That is specifically the only thing I've ever used this for. The worst offender hides the "solution" behind a paywall.

nxpnsv · on Dec 14, 2021

Thank you!

sixothree · on Dec 14, 2021

github-wiki-see.page is a new one I came across today.

Quenhus · on Dec 14, 2021

I don't know what to do with it. You can't find GitHub Wiki results on search engines because of GitHub's robots.txt, so the project is quite legit. Source: https://github-wiki-see.page/
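For context, robots.txt is advisory: a well-behaved crawler checks it before fetching a URL. A minimal sketch with Python's standard urllib.robotparser, using made-up rules on example.org (this is not GitHub's actual robots.txt). Note that this parser does simple path-prefix matching.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; NOT GitHub's real robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /someuser/somerepo/wiki
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in (
    "https://example.org/someuser/somerepo/wiki/Home",
    "https://example.org/someuser/somerepo/issues",
):
    # A compliant crawler skips any URL for which can_fetch() is False.
    print(url, parser.can_fetch("Googlebot", url))
```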