You hate that I couldn't remember whether nix has array delimiters on the spot?
Jeez, tough crowd.
Though it also kinda represents this pride over triviality that some people latch on to in the AI world, which is odd since it's also only a mistake a human would make. Had I run my hand-written comment through a proofreading clanker, I would have spared you the negative emotional reaction.
In the pre-AI world, you probably would not have been so harsh, since it's understandable that someone would make that mistake in an idiosyncratic language like nix that almost nobody writes full-time. Yet in the AI world, you demand more from humans.
> [..] only a mistake a human would make. Had I run my hand-written comment through a proofreading clanker [..]
That's true, and I should've been less rude in my previous comment. Sorry.
(Though note `nix eval` would've sufficed, no need for the probabilistic kind of clanker)
> Yet in the AI world, you demand more from humans.
But this isn't true. In my opinion `learning >> depending on an LLM` and my gripe is that it seems like the former is being displaced by the latter. In the pre-AI world I would've known that the person making the mistake wasn't making it because they outsourced their skill.
So I'm not, in fact, demanding anything more than in the pre-AI world.
The other poster is being a jerk, but your point doesn’t really refute theirs: if you can’t even be bothered to check for an array delimiter, instead passing that on to an AI, how will you ever learn?
People are being more demanding of humans because humans are taking knowledge and learning for granted. All this abstraction has a cost, and the cost is you.
Things are falling out of your memory all the time as a function of how often or seldom you do something, how trivial or superficial the information is, and how minor the difference is from the other tools you use regularly.
Despite many years of experience, I still sometimes get this wrong in Python or its equivalent in the other languages I regularly switch between:
from math import sqrt as s
import sqrt as s from math
import sqrt from sqrt as s
The difference here is that, because I admitted to using AI, I don't get the grace of making the most trivial mistake in a forum comment.
And suddenly we pretend that had I just written it by hand a few more times, I would never err again in forum comments, despite making similar errors this week in languages I've used to write millions of lines of code across decades.
You found it! Yeah the 4th version of the browser is in Haskell and is only a couple hours in, so it’s nowhere near done. The Go version achieved Acid3 compliance in 7 hours, but I expect this one to take a lot longer since Haskell is a bit more difficult to work with and there’s probably less Haskell in the training dataset.
I’d been archiving/scrubbing each one so that the next assistant wouldn’t be able to use the previous branch as a guide, but since you asked, I pushed the archive of the Go one, feel free to rip it apart: https://github.com/chrisuehlinger/viberowser-go
So I called out Acid3 in the original comment (and mentioned why it’s not the holy grail) so people wouldn’t get the idea that I was building full-on modern browsers. I’m not sure what I need to say to make y’all happy. I’m just excited that these tools are capable of doing non-trivial work, and I’m having fun throwing tasks at them to see what comes out. I’m not going around telling people to download or use these things.
Your browser does not have the concept of breaking a line once it gets too long[1].
Your browser does not even shape text during layout and it renders text using a DrawString[2] function from a library that only applies kerning. No complex shaping to be seen in a light-year radius.
There is no trace of bidi-reordering either. I can't link to anything here since there's nothing to link to.
I will leave this[3] here too but I'm not going to draw conclusions without a deeper understanding of wtf the agent did here and how Acid3 works.
From now on, if you still don't understand why this does not deserve the title of a browser, I will assume you are trolling.
> I’m not going around telling people to download or use these things.
My problem is that you're telling people you built a browser. Some people have standards for what can be considered even a "toy" browser (this is not it).
It seemingly did, but after I saw it define VerticalAlign twice in different files[1][2][3], I concluded that it's probably not coherent enough to be worth the time to check for correctness.
Would be interesting if someone who has managed to run it tries it on some actually complicated text layout edge cases (like RTL breaking that splits a ligature necessitating re-shaping, also add some right-padding in there to spice things up).
I took a 5-minute look at the layout crate here and... it doesn't look great:
1. Line height calculation is suspicious; the structure of the implementation also suggests inline spans aren't handled remotely correctly
2. Uhm... where is the bidi? Directionality has far reaching implications on an inline layout engine's design. This is not it.
3. It doesn't even consider itself a real engine:
// Estimate text width (rough approximation: 0.6 * font_size * char_count)
// In a real implementation, this would use font metrics
let char_count = text.chars().count() as f32;
let avg_char_width = font_size * 0.5; // Approximate average character width
let text_width = char_count * avg_char_width;
I won't even begin talking about how this particular aspect that it "approximates" also has far reaching implications on your design...
I could probably go on in perpetuity about the things wrong with this, even test it myself or something. But that's a waste of time I'm not undertaking.
Making a "browser" that renders a few particular web pages "correctly" is an order of magnitude easier than a browser that also actually cares about standards.
If this is how "A Browser for the modern age." looks then I want a time machine.
I saw a "web browser" that was AI-generated in maybe 2k lines of Python, based on tkinter, that tried to support CSS and probably could render some test cases, but it didn't at all have the shape of a real web browser.
It reminds me of having AI write me an MUI component the other day that implemented the "sx" prop [1] with code handling each individual property used by the component in that particular application. It might have been correct, and the component overall was successful and well coded... but MUI provides a styled() function and a <Box> component, either of which could have made this component handle everything "sx" is supposed to handle in as little as one line of code. I asked the agent "how would I do this using the tools that MUI provides to support sx" and had a great conversation and got a complete, clear understanding of the right way to do it, but on the first try it wrote something crazily overcomplicated to handle the specific case, as opposed to a general-purpose solution that was radically simple. That "web browser" was all like that.
[1] you can write something like sx={width: 4} and MUI multiplies the 4 by the application scale and emits, say, a width: 20px style
Thank you for the detailed feedback, though we would prefer for you to comment on the announcement threads where you see it. We really appreciate the feedback.
You're referring to State of Utopia's[1] web browser, currently available here:
That livestream demonstration is side-by-side with Chrome, rendering very simple pages.
It compiles, renders simple web pages and is able to post.
The differences between cursor's browser and our browser:
- Cursor's long-running autonomously coded browser: over a million lines of code and a trillion tokens, which is computationally intensive and has a high cost.
- State of Utopia's browser: under 3000 lines of code.
- Cursor's browser: does not compile at present. There's no way to use it.
- State of Utopia's browser: compiles in every version. You can use it right away, and it includes a fun easter-egg game.
- Cursor's browser: can't make form submissions
- State of Utopia's browser: can make form submissions.
I'm submitting this using that browser. (I don't know if it will really post or not.)
We are taking feature requests!! Submit your requested feature here:
> Hundreds of other widely-used open source libraries don't.
Correct me if I'm wrong but I don't think versioned symbols are a thing on Windows (i.e. they are non-portable). This is not a problem for glibc but it is very much a problem for a lot of open source libraries (which instead tend to just provide a stable C ABI if they care).
There’re quite a few mechanics they use for that. The oldest one, call a special API function on startup like InitCommonControlsEx, and other API functions will then resolve DLLs differently or behave differently. A similar tactic: require an SDK-defined magic number as a parameter to some initialization function, with different magic numbers switching symbols from the same library; examples are WSAStartup and MFStartup.
Around Win2k they added side-by-side assemblies, or WinSxS. Include a special XML manifest as an embedded resource of your EXE, and you can request a specific version of a dependent API DLL. The OS keeps multiple versions installed internally.
Then there’re compatibility mechanics, both OS-builtin and user-controllable (right-click on an EXE or LNK, Compatibility tab). Compatibility mode is yet another way to control which versions of DLLs an application uses.
> There’re quite a few mechanics they use for that. The oldest one, call a special API function on startup [...]
Isn't the oldest one... to have the API/ABI version in the name of your DLL? Unlike on Linux, which by default uses a flat namespace, in Windows land imports are nearly always identified by a pair of the DLL name and the symbol name (or ordinal). You can even have multiple C runtimes (MSVCR71.DLL, MSVCR80.DLL, etc.) linked together but working independently in the same executable.
Linux can do this as well; the issue is that it just multiplies how many versions you need installed, and in the limit it's not that different from having a container anyway. Symbol versioning means you can have just the latest version of the library and it remains compatible with software built against old versions. (Especially because when you have multiple versions of a library linked into the same process, you can wind up with all kinds of tricky behaviour if they aren't kept strictly separated. There are a lot of footguns in Windows around this, especially in the way DLLs work to allow this kind of separation in the first place.)
I did forget to mention something important. Since about Vista, Microsoft has tended to replace or supplement the C WinAPI with IUnknown-based object-oriented APIs. Note IUnknown doesn’t necessarily imply COM; for example, Direct3D is not COM: no IDispatch, IPC, registration, or type libraries.
IUnknown-based ABIs expose methods of objects without any symbols exported from DLLs. Virtual method tables are internal implementation details, not public symbols. By testing SDK-defined magic numbers, like the SDKVersion argument of the D3D11CreateDevice factory function, the DLL implementing the factory may create very different objects for programs built against different versions of the Windows SDK.
There’s also API Sets, where DLLs like api-win-blah-1.dll act as a proxy for another DLL both literally, with forwarder exports, and figuratively, with a system-wide in-memory hashmap from API set to actual DLL.
IIRC this is both for versioning and so that some software can target Windows and Xbox OSes whilst “importing” the same API-set DLL? It caused me a lot of grief writing a PE dynamic linker once.
I'm having a lot of trouble understanding what you're trying to convey. You say there's a difference from previous "speculation" but also that it's still speculation. Then you go on to write "ALREADY going to" which is future tense (speculation), even clarifying what the speculation is.
So let me explain it more clearly. AI as it is now is already changing the game. Even if we hold technological progress fixed, it will eventually reduce the demand for SWEs across every company. There is no speculation here; this comes from on-the-ground evidence: what I see day to day, what I do, and my experience pair programming things from scratch with AI.
The speculation is this: if we follow the trendlines of AI improvement over the past decade and a half, the projection of past improvement indicates AI will only get better and better. It’s a reasonable speculation, but it is nonetheless speculative. I wouldn’t bet my life on continuous improvement of AI to the point of AGI, but more than ever before, it’s a speculation that is not unrealistic.
It's not about collisions of the hash function itself.
Every hashtable implementation will put the hash value through some sort of modulo because you generally don't want to waste memory storing the key 5726591 at index 5726591 in an array.
So if you know how the implementation works and the hash function is predictable you can keep providing the program with values that will consistently go into the same bucket resulting in linear lookups and insertions.
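As a toy illustration, here is a hand-rolled chained table in Python (the bucket count and the identity hash for small ints are assumptions for the sketch; real implementations like CPython's dict differ in detail):

```python
# Toy chained hash table to illustrate bucket flooding.
NUM_BUCKETS = 8  # assumed bucket count for illustration


class ToyTable:
    def __init__(self):
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def insert(self, key, value):
        # the modulo step that every hash table applies in some form
        self.buckets[hash(key) % NUM_BUCKETS].append((key, value))


t = ToyTable()
# An attacker who knows the hash function and the bucket count picks keys
# that all land in bucket 0, so operations there degrade to a linear scan.
for k in (i * NUM_BUCKETS for i in range(1000)):
    t.insert(k, None)

assert len(t.buckets[0]) == 1000   # every hostile key collided
assert all(len(b) == 0 for b in t.buckets[1:])
```

This is why many runtimes now seed their string hashes randomly per process: the attacker can no longer predict which bucket a key lands in.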
FWIW you cannot have Unicode-correct rendering by caching at the codepoint (what many people would call “character”) level. You can cache bitmaps for the individual “glyphs”—that is, items in the font’s `glyf` table. But your shaping engine still needs to choose the correct “glyphs” to assemble into the extended grapheme clusters dictated by your Unicode-aware layout engine.
Exactly why I referred to drawing glyphs instead of characters :)
There's even more depth one can go into here: subpixel positioning.
To correctly draw glyphs that may be on subpixel positions, you need to rasterize and cache glyphs separately for each subpixel position (with some limited amount of precision, to balance cache usefulness and accuracy).
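A minimal sketch of such a subpixel-bucketed cache key (the step count of 4 is an arbitrary assumption; real rasterizers pick their own precision):

```python
SUBPIXEL_STEPS = 4  # assumed precision: quarter-pixel buckets


def glyph_cache_key(glyph_id, x_pos):
    # Quantize the fractional x position so glyphs rasterized at
    # nearby subpixel offsets can share one cached bitmap, while
    # clearly different offsets get separate rasterizations.
    frac = x_pos - int(x_pos)
    bucket = int(frac * SUBPIXEL_STEPS) % SUBPIXEL_STEPS
    return (glyph_id, bucket)


# The same glyph at x=10.0 and x=10.8 needs two distinct rasterizations:
assert glyph_cache_key(42, 10.0) != glyph_cache_key(42, 10.8)
# ...but x=10.0 and x=11.1 fall in the same quarter-pixel bucket:
assert glyph_cache_key(42, 10.0) == glyph_cache_key(42, 11.1)
```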
However I have a feeling that describing an entire Unicode-aware text stack here may not be useful, especially if TFA seems to only care about simple-script monospace LTR.
Nowadays people expect their terminals to handle UTF-8, or at least the Latin-like subset of Unicode, without dealing with arcana such as codepages. For even the simplest fonts, rendering something like í likely requires drawing multiple glyphs: one for the dotless lowercase I stem, and one for the acute accent. It so happens that the dotless lowercase I also maps to a codepoint, but it is not generally true that each glyph making up an extended grapheme cluster corresponds to a codepoint of its own. So even “simple” console output is nowadays complicated by the details of Unicode-aware text rendering.
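Python's `unicodedata` makes the í example concrete (NFD splits the precomposed codepoint into the base letter plus a combining accent, roughly mirroring what the font does with glyphs):

```python
import unicodedata

# U+00ED (í) decomposes into base 'i' plus U+0301 COMBINING ACUTE ACCENT;
# a font may well render it as two glyphs along those lines.
assert unicodedata.normalize("NFD", "\u00ed") == "i\u0301"

# The reverse doesn't always exist: this grapheme cluster has no
# precomposed codepoint, yet it is one "character" to the user.
cluster = "x\u0301"
assert unicodedata.normalize("NFC", cluster) == cluster  # stays 2 codepoints
assert len(cluster) == 2
```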
There's a technique known as "NaN boxing" which exploits the fact that double-precision floats let you store almost 52 bits of extra data in what would otherwise be NaNs.
If you assume the top 16 bits of a pointer are unused[1], you can fit a pointer in there. This lets you store a pointer or a full double by-value (and still have tag bits left for other types!).
Last I checked LuaJIT and WebKit both still used this to represent their values.
[1] On amd64 they actually need to be sort of "sign extended" so require some fixup once extracted.
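A rough sketch of the bit-level idea in Python (the tag layout here is invented for illustration; real engines choose their own encodings):

```python
import math
import struct

QNAN = 0x7FF8_0000_0000_0000   # exponent all ones + quiet bit: NaN space
TAG_POINTER = 0x0001 << 48     # invented tag bit pattern, an assumption


def box_double(x):
    # an ordinary double is stored as its own bit pattern
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def unbox_double(bits):
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def box_pointer(p):
    # assumes the pointer fits in the low 48 bits
    return QNAN | TAG_POINTER | p


def unbox_pointer(bits):
    return bits & 0xFFFF_FFFF_FFFF  # keep the low 48 bits


val = box_pointer(0x7F12_3456)
assert unbox_pointer(val) == 0x7F12_3456
# Interpreted as a double, the boxed pointer is just some NaN:
assert math.isnan(unbox_double(val))
# Ordinary doubles round-trip untouched:
assert unbox_double(box_double(3.5)) == 3.5
```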
> On amd64 they actually need to be sort of "sign extended" so require some fixup once extracted.
Pointers need to be canonical if LAM/UAI is not enabled. The simplest way to do it is to shift left by 16, then shift arithmetic right by 16. (Or 7 if using 5-level paging). Alternatively, you can store the pointer shifted left by 16 bits, and have the tag in the lower 16 bits, then canonicalizing the pointer is just a single shift-arithmetic-right. If combining with NaN-boxing, then you rotate right to recover the double. (Demo: https://godbolt.org/z/MvvPcq9Ej). This is actually more efficient than messing with the high bits directly.
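Simulating the two-shift canonicalization on a 64-bit word in Python (a sketch of the 4-level-paging case with the tag in the high 16 bits; on hardware this is exactly the shift pair described above):

```python
MASK64 = (1 << 64) - 1


def canonicalize(tagged):
    # Drop the 16 tag bits and sign-extend bit 47 into bits 48..63:
    # shift left by 16, then arithmetic shift right by 16.
    x = (tagged << 16) & MASK64
    if x & (1 << 63):          # simulate 64-bit two's complement
        x -= 1 << 64
    return (x >> 16) & MASK64  # Python's >> is arithmetic on negatives


# A user-space pointer with a 0xABCD tag in the high bits: the tag is
# stripped and the top bits are zero-extended from bit 47 (clear here).
assert canonicalize(0xABCD_0000_1234_5678) == 0x0000_0000_1234_5678
# A kernel-space pointer (bit 47 set) gets ones extended into the top:
assert canonicalize(0x0000_8000_0000_0000) == 0xFFFF_8000_0000_0000
```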
With LAM/UAI, the requirement is that the 63rd bit matches the 47th (or 56th) bit, which gives 15-bits of tag space on LAM48 and 6-bits of tag space on LAM57.
With LAM enabled, care needs to be taken when doing any pointer comparison, as two pointers which point to the same address may not be equal. There have been multiple exploits with LAM, including speculative execution exploits.
If you restrict yourself to all variants of x86 and ARM, the number of high bits for which I could not find conflicting uses is 6 bits (bits 57-62). The other high bits are reserved in some hardware contexts and therefore may create conflicts.
Using 16 bits may be risky on recent x86. For example, IIRC Linux enables 5-level page tables on microarchitectures that support it, which can put valid address data in bits 48-56.
There is no guarantee that those 6 bits are safe either. They are just the only bits for which I could not find existing or roadmap usage across x86 and ARM sources when I last did a search.
> Using 16 bits may be risky on recent x86. For example, IIRC Linux enables 5-level page tables on microarchitectures that support it, which can put valid address data in bits 48-56.
Linux will not allocate past the 47-bit range, even with 5-level paging enabled, unless specifically requested, by providing a pointer hint to `mmap` with a higher address.
Fails to parse is what it does...
Are we really living in times where people can't write a single (syntactically) well-formed line of code in a programming language they use?
I understand this doesn't really matter when just using NixOS Slop Edition™ but man I hate it.