Similar trend in open text-to-image models: Flux.1 was 12B, but now we have 6B models with much better quality. Qwen Image goes from 20B to 7B while merging the edit line and improving quality. Now that spot H200s with 140GB have come down to A100 prices, you can finally try larger-scale finetuning/distillation/RL with these models. A very promising direction for open tools and models if the trend continues.
I guess the sense of accomplishment is very person-dependent. I enjoy programming a lot, but it is easy to find people who would challenge themselves to scale said website to a million users/X views per day. I don't know why; probably there is no fixed meaning to existence and nature likes diversity.
For me, the fun in programming also depends a lot on the task. Recently, I wanted Python configuration classes that can serialize to YAML, but I also wanted to automatically create an ArgumentParser that fills some of the fields. `hydra` from Meta does that, but I wanted something simpler. I asked an agent for a design, but I did not like the convoluted parsing logic it created. I finally designed something by hand by abusing the metadata fields of the `dataclasses.field` calls. It was deeply satisfying to get it to work the way I wanted.
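A minimal sketch of that idea, assuming a hypothetical convention where each field declares its CLI exposure via `metadata` keys (`cli` and `help` here are made-up names, not stdlib features), and a small helper builds the parser from them:

```python
import argparse
from dataclasses import dataclass, field, fields


@dataclass
class TrainConfig:
    # The "cli" and "help" metadata keys are a hypothetical convention,
    # not something argparse or dataclasses know about natively.
    lr: float = field(default=1e-3, metadata={"cli": True, "help": "learning rate"})
    epochs: int = field(default=10, metadata={"cli": True, "help": "number of epochs"})
    run_name: str = field(default="exp", metadata={"cli": False})


def make_parser(cls) -> argparse.ArgumentParser:
    """Build an ArgumentParser exposing only the fields flagged in metadata."""
    parser = argparse.ArgumentParser()
    for f in fields(cls):
        if f.metadata.get("cli"):
            parser.add_argument(f"--{f.name}", type=f.type, default=f.default,
                                help=f.metadata.get("help", ""))
    return parser


# Parse only the exposed flags; remaining fields keep their defaults.
args = make_parser(TrainConfig).parse_args(["--lr", "0.01"])
cfg = TrainConfig(**vars(args))
```

The nice part of this pattern is that the config class stays a plain dataclass, so YAML serialization via `dataclasses.asdict` keeps working unchanged.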
But after that, do I really want to create every config class and fill every field by myself for the several scripts/classes I planned to use? Once the initial template was there, I was happy to just guide the agent to fill in the boilerplate.
I agree that we should keep the fun in programming/art, but how we do that depends on the what, the who, and the when.
That is likely. Another factor that came to my mind is the GPU using less power due to simpler computations. You can store less data for grayscale, so you need to go over less pixel data to do effects etc. Whether the accessibility controls actually achieve this is implementation-dependent, I guess.
Even with the best GPU optimizations, most of the data will be processed in full color and then pushed through an extra pass at the end. More likely, all of the data goes through that path.
The bitter lesson here is that if you want to run a business, you cannot avoid or outsource marketing. It is a huge part of any trade, and you have to bear the marketing cost.
I totally understand the desire to avoid it and concentrate on the craft and on creating. I tried and failed at that numerous times. I decided that I will not start a business unless I have partners who understand sales and marketing and are willing to engage in them.
I think this is what gets blunted by mass education and most textbooks. We need to rediscover it if we want to enjoy our profession amid all the signals flowing from social media about the great things other people are achieving. Staying stupid and hungry really helps.
I think this is more of a mechanistic-understanding vs. fundamental-insight kind of situation. The linear algebra picture is currently very mechanistic, since it only tells us what the computations are. There are research groups trying to go beyond that, but the insight from these efforts is currently very limited.
However, the probabilistic view is much clearer. You can have many explorable insights, both potentially true and false, just by understanding the loss functions, what the model is sampling from, what the marginal or conditional distributions are, and so on. Generative AI models are beautiful at that level. It is truly mind-blowing that in 2025 we are able to sample from megapixel image distributions conditioned on natural-language text prompts.
If you dig into old ml/vision papers, you will see that, formulation-wise, they actually did, but they lacked the data, compute, and the mechanistic machinery provided by the transformer architecture. The wheels of progress turn slowly and require many rotations to finally reach somewhere.
I have deep respect for CUDA and Nvidia engineering. However, the arguments above seem to totally ignore the Google Search indexing and query software stack. They are the kings of distributed software, and also of hardware that scales. That is why TPUs are a thing now and can compete with Nvidia where AMD failed. Distributed software is the bread and butter of Google, with their multi-decade investment from day zero out of necessity. When you have to update an index of an evolving set of billions of documents daily, and do that online while keeping subsecond query capability across the globe, that should teach you a few things about deep software stacks.
That is insightful. Courage to take risks means higher standard deviation in outcomes: more visible successes, but also more hard failures. Risk-averse cultures have more stable outcomes, no big successes, but also fewer financially crippling failures. A personal or social safety net may or may not make you risk-averse. Taking semi-calculated risks seems like a skill that needs to be learned for successful entrepreneurship.
The computations in transformers are actually generalized tensor-tensor contractions implemented as matrix multiplications. Their efficient implementation on GPU hardware involves many algebraic gems and is a work of art. You can get a taste of the complexity involved in their design in this YouTube video: https://www.youtube.com/live/ufa4pmBOBT8
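For intuition, here is a tiny NumPy sketch (shapes and names are illustrative) showing that one such contraction, the attention-score computation contracted over the head dimension, is exactly a batched matrix multiplication:

```python
import numpy as np

# Illustrative small sizes: batch b, heads h, sequence s, head dim d.
b, h, s, d = 2, 4, 8, 16
rng = np.random.default_rng(0)
q = rng.standard_normal((b, h, s, d))  # queries
k = rng.standard_normal((b, h, s, d))  # keys

# The tensor contraction over the head dimension d, written as an einsum...
scores_einsum = np.einsum("bhsd,bhtd->bhst", q, k)

# ...is what gets executed as a batched matmul against transposed keys.
scores_matmul = q @ k.transpose(0, 1, 3, 2)

# Both routes produce the same (b, h, s, s) score tensor.
assert np.allclose(scores_einsum, scores_matmul)
```

The real kernels add softmax, scaling, tiling, and fused memory-access tricks on top of this, which is where most of the engineering complexity lives.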
As a 15+ year Emacs user, the only item on my wishlist is a client-server remote editing mode similar to that of VS Code. Then I could go back to using Emacs on cloud VMs. Does anyone know a solution for this that works as well as VS Code, even when latency is high? Hopefully I will get pissed off enough with all the weird configuration flags of VS Code to write one myself ;-) To be fair, its Python integration is quite good, at least for the usual stuff.
1) Run Emacs on your local machine and use Tramp to edit the remote files
2) Run Emacs on the remote machine with the files you're editing. This likely means running in the terminal itself (emacs -nw or equivalently emacs -t).