For those of you interested in browser-based search with SQLite WASM, I’ve written a small post on the Craft of Emacs search page. You can check out the actual search here:
https://craft-of-emacs.kebab-ca.se/search.html
Kudos to the SQLite team. It was a joy to implement.
Are there any gotchas around the SQL parameterization? It looks like you're passing an array into db.exec. I would have thought that would be enough, but if I search for org-mode then I see an error in the console complaining about no such column: mode.
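My guess (could be wrong) is that the bound string is reaching a layer that parses the hyphen as syntax — SQLite's full-text MATCH argument is itself a small query language, so a term like org-mode needs phrase quoting even when it's properly bound at the SQL level. Something like this would sidestep it (my own sketch, not the site's code; the `docs` table name is a guess):

```javascript
// Hypothetical helper (not from the site): quote user input as an FTS
// phrase so characters like '-' aren't treated as query operators.
// SQL-level binding (?) still guards the statement itself; this handles
// the separate full-text query syntax inside the bound string.
function ftsPhrase(term) {
  // A doubled quote escapes a quote inside an FTS phrase.
  return '"' + term.replace(/"/g, '""') + '"';
}

// Usage sketch, assuming an FTS table named `docs` (a guess):
// db.exec({
//   sql: 'SELECT * FROM docs WHERE docs MATCH ?',
//   bind: [ftsPhrase(userInput)],
//   rowMode: 'object',
//   callback: row => results.push(row),
// });
```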
Could you comment on which "flavor" of SQLite for WASM this is using, and how you built or pulled the glue code? I think sql.js and absurd-sql are the best-known solutions for this, but it's not clear from the blog whether you're using those or rolled your own. Information on how it was built, or which prebuilt you're using, would be fantastic.
I also see that load-db.js is loading the known search index values into localStorage ... do you have some tooling for creating this file from a known SQLite base file or was it just handrolled from a localStorage already holding a valid DB?
EDIT: Looking in the code I found this
const db = new sqlite3.oo1.DB({
  filename: 'local',
  // The 't' flag enables tracing.
  flags: 'r',
  vfs: 'kvvfs'
});
Googling led me to the "official" SQLite WASM pages, which otherwise don't appear too prominently in search results for whatever reason. This (https://sqlite.org/wasm/doc/trunk/about.md) seems like a good starting point and notes that both sql.js and absurd-sql are in fact inspirations.
Of course! As you’ve found through digging into the js, it uses the official SQLite WASM build. The site currently uses a slightly old vendored version built from the SQLite codebase, but the SQLite team have recently released a binary at https://sqlite.org/download.html (under the WebAssembly section) that anyone can try out. You can find the docs at https://sqlite.org/wasm/doc/trunk/index.md, which have been rapidly improving over the past few months.
The process for building the database is a bit complex. I want to support all browsers, so unfortunately I need to use local storage as the backing store. Firefox has a while to go before it supports the Origin Private File System, but once it does, the build will be a lot smoother.
I build the index as part of the site’s CI (using nix) by running SQLite WASM in Deno to pre-load local storage. I then extract the keys from local storage and populate them at site load using the hand-rolled load-db.js file.
SQLite WASM does have better support for importing / exporting databases on OPFS, so this process should be simpler as soon as I can move to it.
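Roughly, the extract-and-replay step looks like this (a simplified sketch, not the exact build code):

```javascript
// Rough sketch of the extract-and-replay step (simplified, not the
// site's exact code). In the real build, `store` is localStorage inside
// Deno after SQLite WASM's kvvfs has written the database into it; the
// resulting dump is baked into load-db.js and replayed in the browser
// before the database is opened.
function dumpStorage(store) {
  const out = {};
  for (let i = 0; i < store.length; i++) {
    const key = store.key(i);
    out[key] = store.getItem(key);
  }
  return out;
}

function restoreStorage(store, dump) {
  for (const [key, value] of Object.entries(dump)) {
    store.setItem(key, value);
  }
}
```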
I’ll write a follow up post at some point on the implementation details.
Thanks for the info that's really helpful. I was immediately curious whether SQLite-WASM would run in Deno and so I'm glad to hear you say that's something you've tried. It felt immediately like it "should" since Deno mirrors web APIs so well and supports a WASM runtime but I wasn't sure.
I have a current project that is starting out by building a client-side SQLite DB (currently in Tauri, but WASM SQLite would be an option as well) that would benefit from an option to offload building really large DBs to a shared server process. Knowing the same or very similar code could run in Deno opens up some really interesting possibilities.
All this to rebuild the wonderful WebSQL that Microsoft and Mozilla conspired to kill and replace with the grossly inferior IndexedDB (just look at what grotesque contortions are required to do something as simple as SELECT COUNT(*) FROM some_table; to see what I mean).
WASM is to the CPU as the "Origin Private File System" (OPFS) is to the disk, and as WebGPU is to the GPU: each makes a lower-level hardware primitive available in sandboxed form on the web platform. There are many database engines, and even extensions for SQLite, that are domain-specific. WebSQL was the wrong abstraction and would have failed to fulfil all use cases.
The correct abstraction is WASM plus the now-no-longer-missing piece, the OPFS, which together let you build any ACID-compliant database engine you want.
Take DuckDB, for example: without WASM and the OPFS it wouldn't be possible. We are going to see an explosion in "local first" development as a result of these new primitives, built on by many different database engines.
WebSQL was a proposed web standard that all standards-compliant browsers would have to implement. The standard was tightly coupled to the SQLite implementation, which was a problem for any browser that could not just bundle SQLite. Ultimately the standard was rejected because it was too coupled to a single implementation.
Maybe as a standard it was lacking... but iirc Firefox, Safari and Chrome all had support, and IE/MS was the one holdout. I get that it was too "loosely" defined as a Standard.
I also think the UX for WebSQL isn't great, and a modern async/promise-based interface would have been better (even if the queries themselves were non-blocking and serialized on another thread). Combined with the File System Access API, this could be really useful though.
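Something like that wished-for interface can be sketched in a few lines; `runQuery` here is a hypothetical stand-in for whatever actually executes SQL off the main thread:

```javascript
// Sketch of a promise-based DB facade (illustrative only). Queries are
// serialized in submission order but never block the main thread;
// runQuery(sql, params) is a hypothetical async function that does the
// real work, e.g. via postMessage to a worker running SQLite WASM.
function makeAsyncDb(runQuery) {
  let chain = Promise.resolve();
  return {
    exec(sql, params = []) {
      const result = chain.then(() => runQuery(sql, params));
      chain = result.catch(() => {}); // a failed query shouldn't wedge the queue
      return result;
    },
  };
}
```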
Currently toying with a Rust/Tauri project, and debating whether to use the SQLite plugin to do the data access in the UI, or to do it on the Rust side and serialize the requests across more manually. Since I'm dealing with other services, I'll have to do a lot of that anyway.
Also using SQLite in Tauri for a project. I only briefly looked at the provided SQLite plugin and quickly decided it didn't support everything I needed (custom scalar functions, for example, but there were others). But as far as I know, all "official" Tauri plugins use the same event/command RPC mechanism available to you in userland for calling into Rust, so I don't actually think the Tauri SQLite plugin "does the access in the UI" in the truest sense -- otherwise it wouldn't be a Tauri plugin, it would just be a vanilla JS lib.
If you've looked more closely and know that not to be the case, I'd like to hear what you've seen, but my understanding is that anything that leaves the WebView sandbox is using RPC to make calls into Rust.
I'd be surprised if it was anything other than what I can do myself via the existing event/command system... I just meant controlling the queries themselves on the JS side vs. the Rust side... kind of torn between something similar to a regular web API, abstracted over events, etc. Honestly, I should just start doing something at this point... I started over this past weekend, got some hello-world bits pretty quickly, and have mostly been toying since.
> The standard was tightly coupled to the SQLite implementation…
I don't understand why this is a problem. It's in literally everything but my browsers, is 1/18th the size of Apple's home page (500 KB, while Chrome is a 1GB binary on my machine), and is tested far better than any browser.
> …which was a problem for any browser that could not just bundle SQLite.
What's a scenario where a browser can't just bundle SQLite?
I'd love to switch direct to this, but it needs Firefox to support OPFS or for some form of IndexedDB workaround akin to absurd-sql to be upstreamed for my use case where I'd like storage to be persistent.
At least Firefox OPFS is under development, and looking at the tracking ticket it seems to have accelerated in January.
That's exciting about Firefox, though I really wish that Firefox would support OPFS in such a way as to allow selecting the directory where the files are located. Beyond just persistence, there are some use cases where that is required (vscode.dev, for example).
Unfortunately, the whole point of OPFS is to not do that. Mozilla and Apple thought it'd be bad for security to let you read/write arbitrary files/folders on the real file system, even with a permissions dialog, so a fake filesystem only viewable by an individual website was the compromise solution that everyone agreed to implement.
The article asserts that JavaScript is interpreted and WASM is byte-compiled, so it's faster.
I think this ignores modern JIT-based JavaScript engines like V8. They're very good at optimizing hot-path JavaScript code into native code. WASM should be easier to optimize, given the lack of variability and permissiveness in calls, but as far as I know this hasn't been done yet for WASM, so it runs more slowly than it could.
I love this writeup! I'm not a developer but am very interested in "small databases" on the web - there was a good discussion around my post on this on HN last week. The magic of small databases: https://news.ycombinator.com/item?id=34558054
Plus the data itself (encoded in load-db.js) at another 101KB.
Though looking at the current front page: the first couple of links that are local (Ask HN) are only so small because this site is tight, while the rest (video content, etc.) have payloads of 4.2 MByte (a Washington Post article), 2.2 MByte (something on the Economist), 4.5 MByte (Wired), 4.7 MByte (the-odin), ... So if that 780K+101K is doing something genuinely useful, it perhaps isn't that big compared to the bad standard set elsewhere!
For serious use, storing the DB engine and data in local resources rather than re-downloading them each time (which the devtools network profiler suggests is happening) would be a good idea, preferably transferring only a diff when there is an update.
A 700kb dependency isn't too bad with a modern browser + connection, depending on how much else is loaded and how it is loaded.
Developers often have an out-of-date perspective on what is big these days and the realistic performance impacts. Especially with HTTP2 and multiple assets loading async.
The upfront network cost will always be larger than that of independent queries; however, the benefit is less total network use after some threshold of searches, since the per-search network overhead is eliminated. The other benefit is vastly improved latency for all searches after the initial page load.
Whether or not this makes sense is entirely situational: it depends on the DB size, the frequency of changes to the affected data, and the expected user behaviour... i.e. is it extremely likely they will make multiple searches?
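To put a rough number on that threshold (the per-search figure below is an assumption, not a measurement; the payload sizes are those mentioned upthread):

```javascript
// Illustrative break-even point: total bytes shipped up front (engine +
// data) vs an assumed per-search cost of hitting a search endpoint.
const upfrontKB = 780 + 101;  // WASM engine + load-db.js data, per the thread
const perSearchKB = 2;        // assumed request + JSON response overhead
const breakEvenSearches = Math.ceil(upfrontKB / perSearchKB);
console.log(breakEvenSearches); // → 441 searches before local wins on bytes
```

Bytes aren't the whole story, of course: the latency win applies from the first local search, and a server-side search would never ship the engine at all. Treat it as an order-of-magnitude sanity check.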
I've not tried SQLite in the browser yet, but I have used effectively the same solution for an internal tool, by downloading all of the metadata needed for any possible query from an SQL database. This worked very well under a certain size, making for lightning-fast, low-latency local searches in the browser... but it did not scale, due to the particulars of the DB and the use case. Eventually the initial metadata payload was not tolerable: the DB grew and the changes were frequent, requiring a new payload on every load of the page (unlike the author's use case). So I switched to backend queries; the latency for individual queries is worse, but the total experience is better, since the initial load is the same.
A single high-quality image on, say, a vacation rental website could easily be this large, to say nothing of a gallery of images for just a single property. Unless you're explicitly designing for 4G mobile connectivity, a properly-cached and CDN-distributed 1MB payload is absolutely viable.
My interest in it isn't with websites. Redpanda supports using wasm (https://redpanda.com/blog/wasm-architecture), so that, combined with SQLite, can make for very fast stream enrichment. Similarly for using DuckDB, depending on your data.
I have seen this technique (send all the indexed search terms) a few times in the wild. Racket (a scheme derivative specifically made for teaching) has a search page where everything public is indexed [1] (see plt-index.js) and it works really well.
Looks to be using the kvvfs, which uses local-storage key/value pairs as the DB VFS. The file load-db.js (https://craft-of-emacs.kebab-ca.se/load-db.js) pre-loads the values into localStorage. I assume the values were determined by loading local storage through another method and then extracting it during development.