Hacker News | peter_d_sherman's comments

Very interesting!

Yes, in this day and age, I could definitely see web pages being harder to crawl by search engines (and SEO companies and other users of automated web crawling technologies (AI agents?)) than they were in the early days of the Internet due to many possible causes -- many of which you've excellently described!

In other words, there's more to be aware of for anyone writing a search engine (or search-engine-like piece of software -- SEO, AI Agent, etc., etc.) than there was in the early days of the Internet, where everything was straight unencrypted http and most URLs were easily accessible without having to jump through additional hoops...

Which leads me to wonder... on the one hand, a website owner may not want bots and other automated software agents spidering their site (we have robots.txt for this), but on the flip side, most business owners DO want publicity and easy accessibility for sales and marketing purposes, so they'd never want to issue a 403 (or other error code) for any public-facing product webpage...
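A minimal sketch of the robots.txt side of that trade-off, using Python's standard urllib.robotparser. The robots.txt content, the bot name "ExampleBot", and the shop.example URLs are all made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example shop: keep bots out of /admin/,
# but leave product pages open to everyone.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Allow: /products/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/products/widget", "/admin/login"):
        url = "https://shop.example" + path
        print(path, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

Note that robots.txt is only a request to well-behaved crawlers; it does not by itself produce a 403 or any other error code.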

Thus there may be a market for testing public facing business/product websites against faulty "I can't give you that web page for whatever reason" error codes from a wide variety of clients, from a wide variety of locations around the world.

That market is related to the market for testing if a website is up and functioning properly (the "uptime market"), again, from a wide variety of locations around the world, using a wide variety of browsers...
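A rough sketch of what the "wide variety of clients" part of such a test could look like, using only Python's standard library. The User-Agent strings and client names are hypothetical, and the "wide variety of locations" part is not covered here (that would mean running the same check from machines in different regions):

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List

# Hypothetical User-Agent strings; real clients send longer ones.
USER_AGENTS: Dict[str, str] = {
    "desktop-browser": "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    "crawler": "ExampleCrawler/1.0 (+https://crawler.example/info)",
}


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Fetch url with the given User-Agent and return the HTTP status code."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; we only want the code.
        return err.code


def find_blocked_clients(
    url: str, fetch: Callable[[str, str], int] = fetch_status
) -> List[str]:
    """Return the names of the clients that received a 4xx/5xx status."""
    blocked = []
    for name, agent in USER_AGENTS.items():
        if fetch(url, agent) >= 400:
            blocked.append(name)
    return blocked
```

The fetch parameter is there so the check can be exercised with a fake fetcher, without touching the network.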

So, a very interesting post!

Also (for future historians!) compare all of the restrictive factors which may prevent access to a public-facing web page today vs. Tim Berners-Lee's original vision for the web, which was basically to let scientists (and other academic types!) SHARE their data PUBLICLY with one another!

(Things have changed... a bit! :-) )

Anyway, a very interesting post, and a very interesting article -- for both present and future Search Engine programmers!


>"But an approach that’s immediately and broadly applicable today—and for which we’re releasing several new products—is based on what we call computation-augmented generation, or CAG.

The key idea of CAG is to inject in real time capabilities from our foundation tool into the stream of content that LLMs generate. In traditional retrieval-augmented generation, or RAG, one is injecting content that has been retrieved from existing documents. CAG is like an infinite extension of RAG, in which an infinite amount of content can be generated on the fly—using computation—to feed to an LLM."

We welcome CAG -- to the list of LLM-related technologies!
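To make the quoted idea a bit more concrete, here is a toy sketch of "injecting computed content into generated text". It is not the article's product or API: the [[compute:...]] marker syntax is invented for this example, and the "computation" is just a small, safe arithmetic evaluator built on Python's ast module:

```python
import ast
import operator
import re

# Binary operators the toy evaluator understands.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}


def evaluate(expression: str):
    """Evaluate basic arithmetic (numbers with + - * / **) without eval()."""

    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"unsupported expression: {expression!r}")

    return walk(ast.parse(expression, mode="eval").body)


# Hypothetical marker a model could emit to request a computation.
_MARKER = re.compile(r"\[\[compute:(.+?)\]\]")


def inject_computations(generated_text: str) -> str:
    """Replace each [[compute:...]] marker with its computed value."""
    return _MARKER.sub(
        lambda match: str(evaluate(match.group(1).strip())), generated_text
    )
```

The point of the sketch is the contrast with RAG: nothing is looked up in a document store; the injected text is produced by running a computation at generation time.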


My favorite quote from the article:

"Don't shoot the messenger. The person reporting the bug is not your enemy. The bug is."


This is a highly interesting comment from user "thoughtfullyd4c9a86b93" on the above site:

>"My two cents worth — Logic is fundamental. Most of mathematics does not treat infinities nor singularities as first class citizens. Yet, there are a lot of problem classes in which you can actually reason with a set that includes those limits. My preference is a strict axiomatic hierarchy where you can not blend “levels”. Each level is a gatekeeper for the next tier.

The idea that mathematics is a language of its own does not work until you completely disambiguate mathematics in your language of choice — and logic is a language that facilitates complete understanding.

* ⟨T⟩0: ZFC (The Material). The box that contains the idea of a box. If you aren’t starting here, you aren’t even playing the game.

* ⟨T⟩1: Topology (The Stage). This defines “nearness” before you own a ruler. It’s the rubber floor where a donut is a coffee cup—and where singularities (like the zero-point or the absolute) are perfectly admissible inhabitants.

* ⟨T⟩2: Geometry (The Ruler). Adding distance and angles. It’s just Topology after it’s been forced to commit to a specific measurement.

* ⟨T⟩3: Algebra (The Syntax). The ledger for people who trust the ruler more than the stage. It tracks the symmetries the geometry allows.

* ⟨T⟩4: Analysis (The Measure). Measuring the vibration of a string to prove the violin is real. Great for change, but “blind” to the structural admissibility of the stage itself."


Reminds me of this quote by Oscar Wilde:

"I am so clever that sometimes I don't understand a single word of what I am saying".


Here is what Grok says about the above (I asked it to explain it better):

Grok:

"⟨T⟩0: ZFC (The Material) — Zermelo–Fraenkel set theory with the axiom of choice (the standard foundation for most modern mathematics). Called "The Material" and metaphorically "the box that contains the idea of a box," highlighting how ZFC provides the basic "stuff" (sets) out of which everything else is built. Without this, "you aren’t even playing the game."

⟨T⟩1: Topology (The Stage) — Introduces the primitive notion of "nearness" or continuity without any rigid measurement (no distances or angles yet). Famously, topology is "rubber-sheet geometry," where continuous deformations are allowed, so a donut and a coffee mug are equivalent (both have one hole/handle). Singularities/infinities (e.g., zero-point in physics or the point at infinity in projective geometry) can exist naturally here without causing foundational issues.

⟨T⟩2: Geometry (The Ruler) — Builds on topology by adding concrete measurements (distances, angles, metrics). It's topology "forced to commit" to specifics.

⟨T⟩3: Algebra (The Syntax) — Focuses on symmetries and structures (groups, rings, fields, etc.) that geometry permits. It's more abstract and rule-based ("the ledger" tracking allowed operations).

⟨T⟩4: Analysis (The Measure) — Deals with limits, continuity, change, integration/differentiation, etc. ("measuring the vibration of a string"). It's powerful for dynamics but "blind" to deeper structural issues in the underlying topology or sets.

(Or, phrased another way, it's one set of possibilities for a "Math/Mathematics Stack" (AKA "Abstraction Hierarchy", "Math Abstraction Hierarchy") built level by level, on top of the foundation of Logic...)


This looks like a reasonably good page (there possibly are better ones) for general AI chatbots, rate limits and sign-in requirements:

https://fmhy.net/ai


(Comedy writing mode ON: )

"We need to flatten the curve..."

(Comedy writing mode OFF: )

You know, to re-quote the powers-that-be and the mainstream news media...

What, no takers?

You know, "flatten the curve... of population increase?" -- what, still not funny?

Hey, I'm just re-quoting what other people said... (a whole lot of people, incidentally!) but in the context of the article, above!

What, still no takers?

You people have no sense of (dark, very dark, let's be completely honest about that!) humor!

:-)


>"The abstraction tower

Here’s the part that makes me laugh, darkly.

I saw someone on LinkedIn recently — early twenties, a few years into their career — lamenting that with AI they “didn’t really know what was going on anymore.” And I thought: mate, you were already so far up the abstraction chain you didn’t even realise you were teetering on top of a wobbly Jenga tower.

They’re writing TypeScript that compiles to JavaScript that runs in a V8 engine written in C++ that’s making system calls to an OS kernel that’s scheduling threads across cores they’ve never thought about, hitting RAM through a memory controller with caching layers they couldn’t diagram, all while npm pulls in 400 packages they’ve never read a line of.

But sure. AI is the moment they lost track of what’s happening.

The abstraction ship sailed decades ago. We just didn’t notice because each layer arrived gradually enough that we could pretend we still understood the whole stack.

AI is just the layer that made the pretence impossible to maintain."

Absolutely brilliant writing!

Heck -- absolutely brilliant communicating! (Which is really what great writing is all about!)

You definitely get it!

Some other people here on HN do too, yours truly included in that bunch...

Anyway, stellar writing!

Related:

https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-a...

https://en.wikipedia.org/wiki/Tower_of_Babel

https://en.wikipedia.org/wiki/Abstraction_(computer_science)

https://en.wikipedia.org/wiki/Abstraction

https://ecommons.cornell.edu/entities/publication/3e2850f6-c...


>"But NixOS isn't the only declarative distro out there. In fact GNU forked Nix fairly early and made their own spin called Guix, whose big innovation is that, instead of using the unwieldy Nix-language, it uses Scheme. Specifically Guile Scheme..."

I'd be curious if a list exists of all declarative Linux distros out there, along with the configuration language (Nix, Scheme, etc.)

I'd also be curious as to how easy it would be to convert Scheme to the Nix language or vice-versa. In other words, it seems to me that there might be a "parent language" (for lack of a better term) out there for all lisplike and functional programming languages (a subset of Haskell, F#, or some other functional programming language perhaps) that might act as an intermediary conversion step (again, for lack of a better term!) between one functional or lisplike programming language and another...

Probably unrelated (but maybe somewhat related!) -- consider Pandoc... Pandoc is a Haskell program that basically uses a document tree structure to convert between one type of document format and another... maybe in terms of programming languages you'd call that an AST, an Abstract Syntax Tree... so maybe there's some kind of simplified AST (or something like that) out there that works as the base tree for all functional and lisp-like programming languages (yes, lisp/lisplikes sort of preserve their own tree, their own AST, via their intrinsic data structure, and that would seem to be true about functional programming languages too)... so what is the base tree/AST of all of these, that all languages in this family can "map on to" (for lack of better terminology), that could be used (with AI / LLMs) as an "Intermediary Language" or "Intermediary Data Structure" (choose your terminology) to allow easily converting between one and the other?

Anyway, if we had that or something like that, then Nix configurations could (in theory) be easily converted to Guix, and vice-versa, automatically, as could the configuration of any other Linux distro configured by a functional and/or lisplike language...
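A toy sketch of the "intermediary data structure" idea, in the spirit of Pandoc's document tree: one neutral tree, two renderers. Everything here is hypothetical. The Config shape, the key names, and the output layouts are illustrative only and are not the real NixOS or Guix configuration schemas:

```python
from typing import Dict, List, Union

# A deliberately tiny "intermediate tree": each key maps to a string
# or to a list of strings.
Config = Dict[str, Union[str, List[str]]]


def to_nix_like(config: Config) -> str:
    """Render the tree as a Nix-style attribute set (illustrative only)."""
    lines = ["{"]
    for key, value in config.items():
        if isinstance(value, list):
            items = " ".join(f'"{item}"' for item in value)
            lines.append(f"  {key} = [ {items} ];")
        else:
            lines.append(f'  {key} = "{value}";')
    lines.append("}")
    return "\n".join(lines)


def to_sexpr(config: Config) -> str:
    """Render the same tree as a Scheme-style S-expression (illustrative only)."""
    parts = []
    for key, value in config.items():
        if isinstance(value, list):
            items = " ".join(f'"{item}"' for item in value)
            parts.append(f"({key} (list {items}))")
        else:
            parts.append(f'({key} "{value}")')
    return "(config " + " ".join(parts) + ")"


if __name__ == "__main__":
    example: Config = {"hostname": "demo", "packages": ["git", "vim"]}
    print(to_nix_like(example))
    print(to_sexpr(example))
```

This only works because the example is pure data. Real Nix and Guix configurations also contain functions and other code, which is where a shared base tree would get much harder.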

That, and I found the article very interesting!

I may have to try Guix in the future!


I was thinking the same thing. Since Scheme is in the Lisp family, it should be straightforward to modernize it to something like Clojure, which is similar to Haskell as you mentioned: functional, but from a Java/Lisp ecosystem that might be more viable in the typical modern software environment.


Wouldn't that just make it harder to bootstrap an OS, needing to start with JVM and all...


Not necessarily harder; just add 'jdk25' to home packages. If you really don't want to use the JVM, you can use Babashka to start Clojure and use it like you would bash.


Well, it makes it much harder to build the system from a simple assembler.

Guix is AFAIK the only distro with a well-paved bootstrap path from a simple assembler to a fully-working distro[0]. Adding the JVM or even GraalVM (which is what Babashka is based on) makes the bootstrapping that much harder.

[0]: https://guix.gnu.org/en/blog/2023/the-full-source-bootstrap-...


Is there a need to "modernise" it?


With Lisp you already have an AST.


>I’m six months behind on rent, just managed to buy time after my first eviction notice, and I’m trying to get back into building without constantly worrying about when the next one shows up.

Why not create a broker website between people who are getting eviction notices and Lawyers who specifically help people who are getting eviction notices?

That is, use what you are...

Or rather (phrased another way),

use the set of circumstances you are in, to turn around the set of circumstances that you're in.

It may sound meta, but if you individually are having a problem -- then so are a ton of other people as well!

If solving that problem has value to you, then such a solution is worth money.

If many people are also having that problem, then solutions to that problem are also worth money to them!

Sure, a legal solution, for example, finding an appropriate Lawyer to extend the point in time before eviction for say, 1-2 months (or whatever can be done) may be a suboptimal partial solution (in an ideal world, you'd like your rent to be free forever, as would everyone else), but the thing is,

even suboptimal partial solutions are worth money, if only a little bit of money...

Phrased another way,

there's money to be made by acting as a broker between people with eviction problems and the subset of Lawyers who specialize in that field, who could potentially ameliorate that problem if only a tiny amount, if only a little bit...

There's also money to be made in books and online reports... "What to do if you get an eviction notice".

While such a book or report might not be worth the price of rent (obviously if someone had the rent money they'd pay it, problem solved), that information may be worth $19.95, or something in that ballpark...

Scaled across thousands of people with the same problem -- and we're looking at some decent money!

It probably won't make you rich or anything... but (and this is going to sound "evil" -- but it is not intentionally so!) it might make you enough money to pay your rent! :-)

use what you are

use the set of circumstances you are in, to turn around the set of circumstances that you're in.

Education, Knowledge, and Experience are everything...

Money, if it exists, if it exists at all, exists relative to, as an effect of these things...

Do you know everything possible about every single Landlord/Tenant Law and every single possible way to resolve an eviction notice?

If not... then I'd suggest that you are in one hell of an opportunity to learn everything you can about the matter!

Expertise (in any subject matter) translates to being sought, to being paid (sometimes very highly!) by others in return for advice, in return for knowledge, in return for communicated experience...

Money will naturally follow you with this learning once you have it, and once you monetize it -- although this will probably not happen today or tomorrow -- but it will in the future, if you can see the opportunity and capture it!

use what you are

use the set of circumstances you are in, to turn around the set of circumstances that you're in

Wishing you well in this... experience!

(You know, for lack of a better term! :-) )


>This started as a personal project because I wanted a clean, searchable dataset of startups across regions without jumping between multiple sources or dealing with noise I didn't want :).

I love this idea!

Something like that really needs to be done, and you've stepped up to the plate to begin that journey of putting all of that together!

A list of ALL startups, in one place, would really be great!

One question/caveat though -- how do you determine / how would you determine if a startup is no longer in startup mode?

That is, if the startup has become a big business, if the startup has been acquired, if the startup has failed, etc., etc.?

I guess (if the correct data was missing, unavailable, or hard to parse, etc.) you could simply take startups off the list after a fixed time period, like maybe 12 months, 24 months, <?> months, ?.

Or, maybe add a retrieval date and source...

Two extra fields for your database... the date when it was spidered/sourced/parsed/found/uploaded/etc, and the source URL (or URLs...).

Then you could keep all of the data for all time... just let your users sort/filter on that retrieval date, for "freshness" of data, relative to their needs...
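A minimal sketch of those two extra fields and the "freshness" filter. The names StartupRecord, retrieved_on, source_url, and fresh_records are made up for this example and are not the site's actual schema:

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class StartupRecord:
    name: str
    retrieved_on: date  # when the entry was spidered/sourced/parsed
    source_url: str     # where the entry came from


def fresh_records(
    records: List[StartupRecord], today: date, max_age_days: int
) -> List[StartupRecord]:
    """Keep only the records retrieved within the last max_age_days days."""
    cutoff = today - timedelta(days=max_age_days)
    return [record for record in records if record.retrieved_on >= cutoff]


if __name__ == "__main__":
    records = [
        StartupRecord("ExampleCo", date(2024, 6, 1), "https://source.example/a"),
        StartupRecord("OldCo", date(2022, 3, 1), "https://source.example/b"),
    ]
    for record in fresh_records(records, date(2025, 1, 1), 365):
        print(record.name, record.retrieved_on, record.source_url)
```

Nothing is ever deleted here; each user just picks their own max_age_days.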

Anyway, looks great so far!

Great work!


Hey Peter, Thanks! I really appreciate the thoughtful feedback and your time.

> how do you determine / how would you determine if a startup is no longer in startup mode?

It is a challenge as startups transition all the time in different ways - funding rounds, IPOs or the dreaded deadpool - and I'm trying to figure out the best way to represent it. At this time, I'm using a combination of manual vetting, soliciting public feedback through an "edit this profile" button, and showcasing the latest state. Rather than deleting entities that are no longer startups, I tag them with statuses like Public, Acquired, Shut Down, etc., and surface that on the profile page. Here is an example: https://startups.in/united-states/airbnb (you can find a badge under the logo, and if you scroll down you can see a card that shows the exit details).

As can be seen, Airbnb is marked as a "Public Company" with IPO metadata (ticker, exit date, exit value), and still remains in the database as part of the ecosystem rather than disappearing. The current idea is to treat this more like a longitudinal startup graph.

Long-term, I'd like this to behave more like a "historical record" of startups over time (dare say Wikipedia for startups but presented differently?), not just a snapshot of "current startups". That way acquisitions, failures, and IPOs become first-class signals instead of reasons to delete data. Thanks again.
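A small sketch of the "tag, don't delete" approach described above. The statuses Public, Acquired, and Shut Down come from the post; the ACTIVE default, the field names, and the ExampleCo/EXCO values are assumptions for illustration, not the site's real data model:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    ACTIVE = "Active"  # assumed default; the post does not name it
    PUBLIC = "Public"
    ACQUIRED = "Acquired"
    SHUT_DOWN = "Shut Down"


@dataclass
class Startup:
    name: str
    status: Status = Status.ACTIVE
    ticker: Optional[str] = None     # IPO metadata, filled in on exit
    exit_date: Optional[str] = None  # ISO date string, e.g. "2030-01-15"


def mark_public(startup: Startup, ticker: str, exit_date: str) -> Startup:
    """Tag the startup as public instead of deleting its record."""
    startup.status = Status.PUBLIC
    startup.ticker = ticker
    startup.exit_date = exit_date
    return startup


if __name__ == "__main__":
    company = mark_public(Startup("ExampleCo"), "EXCO", "2030-01-15")
    print(company.name, company.status.value, company.ticker, company.exit_date)
```

Keeping the record and changing only its status is what lets exits show up as data points in a historical view.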


>Hey Peter, Thanks! I really appreciate the thoughtful feedback and your time.

I stand by what I said -- it is a really good idea!

>It is a challenge as startups transition all the time in different ways - funding rounds, IPOs or the dreaded deadpool - and I'm trying to figure out the best way to represent it. At this time, I'm using a combination of manually vetting,

I think that it's noble that you'd take it upon yourself to do this task manually, but this task may turn out to be too time-consuming and unsustainable into the future -- you might wish to consider outsourcing it and/or automating it with AI and/or automating it via trusted users who are entrusted to perform those updates... or of course, you could just continue to do it yourself...

>to soliciting public feedback through "edit this profile" button and showcasing the latest state. Rather than deleting entities that are no longer startups, I tag them with statuses like Public, Acquired, Shut Down, etc., and surface that on the profile page.

I think that's a good idea! More information, more transparency, more historical auditability, more information in general!

>Long-term, I'd like this to behave more like a "historical record" of startups over time (dare say Wikipedia for startups but presented differently?), not just a snapshot of "current startups". That way acquisitions, failures, and IPOs become first-class signals instead of reasons to delete data.

It sounds like you are well on your way! I fully support that! "Wikipedia for Startups" sounds great, and if I needed to give a VC an elevator pitch of what you do in 60 seconds, or heck, 10 seconds or less, I'd phrase it exactly that way... "Wikipedia for Startups" (sounds great and communicates quickly!) or "Wikipedia for Startups but presented differently" (as you said!) or maybe "Wikipedia for Startups but presented with our own custom enhancements!" (sounds even better and would make the party on the other end ever more curious about it!).

But yes, looks great in general, I saw you're taking job listings (great idea, will help you monetize for the long haul!), and I think you're on a great track! (and of course, the world does very much need a "Wikipedia for Startups", however it is presented! :-) )

(Also, don't forget that Joel Spolsky made million$ when one of his interns added job search to his blog and Stack Overflow -- so the job search is a great way of monetizing and sustaining your vision, long term!)

So, wishing you a lot of luck!

It's brilliant, brilliant, brilliant, I say!

