I had to look up what a content mill is. I'm not one, I think. It's "random" stuff because my interests are different. These posts are not written sequentially, I've been working on them (except for this MicroGPT one) for weeks and only publishing now.
> Andrej Karpathy wrote a 200-line Python script that trains and runs a GPT from scratch, with no libraries or dependencies, just pure Python.
Almost immediately afterwards, you have a section titled "Numbers, not letters". Need I go on?
Interestingly, despite all the AI tics, the opening passes Pangram as 100% human... though all the following sections I randomly checked come back as 100% AI. The simplest explanation is that you are operating adversarially and tweaked the opening to target Pangram, perhaps through an anti-AI-detection service; such services now exist and are used at the cutting edge, since Pangram is known to be relatively easy to beat (much as people started search-and-replacing em-dashes once that tell became a little too well known). Which unfortunately means I now expect you to lie to me in your response, since you apparently went that far to start building up clout.
(BTW, how did you accidentally pick 4 rare names which were in the dataset? "Thanks, will fix" is not a real response to that observation. Are you also going to remove all of the 'just pure X' and 'Y, not X' constructions from your posts now that I've pointed it out?)
I didn't get that sense from the prose; it didn't have the usual LLM hallmarks to me, though I'm not enough of an expert in the space to pick up on inaccuracies/hallucinations.
The "TRAINING" visualization does seem synthetic though, the graph is a bit too "perfect" and it's odd that the generated names don't update for every step.
For me it was the prose that alarmed me: short sentences, aggressive punctuation, desperately trying to keep you engaged. It is totally possible to ask the model to choose a different style - I think that's either the default or corresponds to the tastes of the content creators.
I don't want to downplay the effort here, but from my experience you can get yourself a neat interactive HTML summary with a short prompt and a good model (Opus 4.5+, Codex 5.2+, etc).
Can you give an example of the most useful prompting you've found for this? I'd like to interact with papers just so I can have my attention held; I struggle to motivate myself to read through something that's difficult to understand.
I replied to a comment above with the system prompt.
Something I've learned is that the standard, "Summarize this paper" doesn't do a great job because summaries are so subjective. But if you tell a frontier LLM, like Opus 4.6, "Turn this paper into an interactive web page highlighting the most important aspects" it does a really good job. There are still issues with over/under weighting the various aspects of a paper but the models are getting better.
What I find fascinating is that LLMs are great at translation so this is an experiment in translating papers into software, albeit very simple software.
I am also getting constant spam because apparently they can see who starred a repo (e.g. "I see you starred repo x and we are doing something similar"). I am not starring anything anymore.
Indeed. feather was a library for exchanging data between R and pandas dataframes. People tend to bash pandas, but its creator (Wes McKinney) has changed the data ecosystem for the better with the lessons learned from pandas.
I know pandas has a lot of technical warts and shortcomings, but I'm grateful for how much it empowered me early in my data/software career, and the API still feels more ergonomic to me due to the years of usage - plus GeoPandas layering on top of it.
Really, I prefer DuckDB SQL these days for anything that needs to perform well, and I feel like SQL is easier to grok than Python code most of the time.
> Really, I prefer DuckDB SQL these days for anything that needs to perform well, and I feel like SQL is easier to grok than Python code most of the time.
I switched to this as well, and it's mainly because explorations would need to be translated to SQL for production anyway. If I start with pandas, I just need to do all the work twice.
chdb's new DataStore API looks really neat (a drop-in pandas replacement) and is exactly how I envisioned a faster pandas could be without sacrificing its ergonomics.
Do people bash pandas? If so, it reminds me of Bjarne's quip that the two types of programming languages are the ones people complain about and the ones nobody uses.
He missed talking about the poor extensibility of pandas. It's missing some pretty obvious primitives to implement your own operators without whipping out slow for loops and appending to lists manually.
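A generic illustration of that gap (not from the comment itself): a custom element-wise operator typically has to go through `.apply`, which runs a Python-level function call per element, unless the operator happens to be expressible with primitives pandas/NumPy already provide.

```python
import pandas as pd

s = pd.Series([3, -1, 4, -1, 5])

# Custom operator via .apply: a Python function invoked once per element.
clipped_apply = s.apply(lambda x: x if x > 0 else 0)

# The same operator vectorized, but only because a suitable
# built-in primitive (clip) already exists for this case.
clipped_vec = s.clip(lower=0)

assert clipped_apply.equals(clipped_vec)
```

When no such built-in exists, the `.apply` path (or a manual loop appending to a list) is often all that is left, which is the slowness being complained about.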
Yes (mostly) is the answer. You can use Arrow as a backend, and I think with v3 (recently released) it's the default.
The harder thing to overcome is that pandas has historically had a pretty "say yes to things" culture. That's probably a huge part of its success, but it means there are now about 5 ways to add a column to a dataframe.
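The "about 5 ways" claim is easy to make concrete; a quick sketch of several coexisting column-adding idioms in the current pandas API:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# 1. Plain item assignment
df["b"] = df["a"] * 2
# 2. .assign, which returns a new frame
df = df.assign(c=df["a"] + 1)
# 3. .insert, which places the column at a given position
df.insert(0, "d", 0)
# 4. .loc assignment
df.loc[:, "e"] = "x"
# 5. pd.concat along the column axis
df = pd.concat([df, pd.Series([9, 9, 9], name="f")], axis=1)

print(list(df.columns))
```

Each idiom has slightly different semantics (in place vs copy, position control), which is exactly the kind of API surface that is hard to shrink after the fact.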
Adding support for Arrow is a really big achievement, but shrinking an oversized API is even more ambitious.
I also use polars in new projects. I think Wes McKinney does too; if I remember correctly, I saw him commenting on some polars memory-related issues on GitHub. But a good chunk of polars' success can be attributed to Arrow, which McKinney co-created. All the gripes people have with pandas, he had them too, and he built something powerful to overcome them.
I saw Wes speak in the early days of Pandas, in Berkeley. He solved problems that others had just worked around for decades. His solutions are quirky, but the work was very solid. His career advanced a lot, IMHO, for substantial reasons: Wes personally marched through swamps and reached the other side, while others complain and keep doing what they have always done. I personally agree with the criticisms of the syntax, but Pandas is real and it was not easy to build.