I had to look up what a content mill is. I'm not one, I think. It's "random" stuff because my interests are different. These posts are not written sequentially, I've been working on them (except for this MicroGPT one) for weeks and only publishing now.
> Andrej Karpathy wrote a 200-line Python script that trains and runs a GPT from scratch, with no libraries or dependencies, just pure Python.
Almost immediately afterwards, you have a section titled "Numbers, not letters". Need I go on?
Interestingly, despite all the AI tics, the opening passes Pangram as 100% human... though all the following sections I randomly checked come back as 100% AI. The simplest explanation is that you are operating adversarially and tweaked the opening to target Pangram, perhaps through an anti-AI-detection service; such services now exist and are used at the cutting edge, since Pangram is known to be relatively easy to beat (much as people started search-and-replacing em-dashes once that tell became a little too well known). Which unfortunately means I now expect you to lie to me in your response, since you apparently went that far to start building up clout.
(BTW, how did you accidentally pick 4 rare names which were in the dataset? "Thanks, will fix" is not a real response to that observation. Are you also going to remove all of the 'just pure X' and 'Y, not X' constructions from your posts now that I've pointed it out?)
I didn't get that sense from the prose; it didn't have the usual LLM hallmarks to me, though I'm not enough of an expert in the space to pick up on inaccuracies/hallucinations.
The "TRAINING" visualization does seem synthetic though, the graph is a bit too "perfect" and it's odd that the generated names don't update for every step.
For me it was the prose that alarmed me: short sentences, aggressive punctuation, desperately trying to keep you engaged. It is totally possible to ask the model to choose a different style - I think that's either the default or corresponds to the tastes of the content creators.
I don't want to downplay the effort here, but from my experience you can get yourself a neat interactive HTML summary with a short prompt and a good model (Opus 4.5+, Codex 5.2+, etc).
Can you give an example of the most useful prompting you've found for this? I'd like to interact with papers just so I can have my attention held; I struggle to motivate myself to read through something that's difficult to understand.
I replied to a comment above with the system prompt.
Something I've learned is that the standard, "Summarize this paper" doesn't do a great job because summaries are so subjective. But if you tell a frontier LLM, like Opus 4.6, "Turn this paper into an interactive web page highlighting the most important aspects" it does a really good job. There are still issues with over/under weighting the various aspects of a paper but the models are getting better.
What I find fascinating is that LLMs are great at translation so this is an experiment in translating papers into software, albeit very simple software.
I am also getting constant spam because apparently they can see who starred a repo (e.g. "I see you starred repo x and we are doing something similar"). I am not starring anything anymore.
Indeed. feather was a library for exchanging data between R and pandas dataframes. People tend to bash pandas, but its creator (Wes McKinney) has changed the data ecosystem for the better with the lessons learned from pandas.
I know pandas has a lot of technical warts and shortcomings, but I'm grateful for how much it empowered me early in my data/software career, and the API still feels more ergonomic to me due to the years of usage - plus GeoPandas layering on top of it.
Really, I prefer DuckDB SQL these days for anything that needs to perform well, and I feel like SQL is easier to grok than Python code most of the time.
> Really, I prefer DuckDB SQL these days for anything that needs to perform well, and I feel like SQL is easier to grok than Python code most of the time.
I switched to this as well, and it's mainly because explorations would need to be translated to SQL for production anyway. If I start with pandas, I just need to do all the work twice.
chdb's new DataStore API looks really neat (a drop-in pandas replacement) and is exactly how I envisioned a faster pandas could be without sacrificing its ergonomics.
Do people bash pandas? If so, it reminds me of Bjarne's quip that the two types of programming languages are the ones people complain about and the ones nobody uses.
He missed talking about the poor extensibility of pandas. It's missing some pretty obvious primitives to implement your own operators without whipping out slow for loops and appending to lists manually.
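A generic illustration of that gap (not from the comment itself): a custom element-wise operator typically has to go through `.apply`, which runs a Python-level function call per element, unless the operator happens to be expressible with primitives pandas/NumPy already provide.

```python
import pandas as pd

s = pd.Series([3, -1, 4, -1, 5])

# Custom operator via .apply: a Python function invoked once per element.
clipped_apply = s.apply(lambda x: x if x > 0 else 0)

# The same operator vectorized, but only because a suitable
# built-in primitive (clip) already exists for this case.
clipped_vec = s.clip(lower=0)

assert clipped_apply.equals(clipped_vec)
```

When no such built-in exists, the `.apply` path (or a manual loop appending to a list) is often all that is left, which is the slowness being complained about.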
Yes (mostly) is the answer. You can use Arrow as a backend, and I think with v3 (recently released) it's the default.
The harder thing to overcome is that pandas has historically had a pretty "say yes to things" culture. That's probably a huge part of its success, but it means there are now about 5 ways to add a column to a dataframe.
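The "about 5 ways" claim is easy to make concrete; a quick sketch of several coexisting column-adding idioms in the current pandas API:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# 1. Plain item assignment
df["b"] = df["a"] * 2
# 2. .assign, which returns a new frame
df = df.assign(c=df["a"] + 1)
# 3. .insert, which places the column at a given position
df.insert(0, "d", 0)
# 4. .loc assignment
df.loc[:, "e"] = "x"
# 5. pd.concat along the column axis
df = pd.concat([df, pd.Series([9, 9, 9], name="f")], axis=1)

print(list(df.columns))
```

Each idiom has slightly different semantics (in place vs copy, position control), which is exactly the kind of API surface that is hard to shrink after the fact.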
Adding support for Arrow is a really big achievement, but shrinking an oversized API is even more ambitious.
I also use polars in new projects. I think Wes McKinney does too; if I remember correctly, I saw him commenting on some polars memory-related issues on GitHub. But a good chunk of polars' success can be attributed to Arrow, which McKinney co-created. All the gripes people have with pandas, he had them too, and he built something powerful to overcome them.
I saw Wes speak in the early days of Pandas, in Berkeley. He solved problems that others had just worked around for decades. His solutions are quirky, but the work was very solid. His career advanced a lot, IMHO, for substantial reasons: Wes personally marched through swamps and reached the other side, while others complain and keep doing what they have always done. I personally agree with the criticisms of the syntax, but Pandas is real and it was not easy to build.