
Use AI to rewrite all the spells from all the books, then see whether AI can detect the rewritten ones. This will ensure it's not pulling from its training data set.


Neat idea, but why should I use AI for a find and replace?

It feels like shooting a fly with a bazooka


it's like hiring someone to come pick up your trash from your house and put it on the curb.

it's fine if you're disabled


Bazooka guarantees the hit


I like LLMs, but guarantees in LLMs are... you know... not guaranteed ;)


I think that was the point


If all you have is a hammer.. ;)


do you know all the spells you're looking for from memory?


You could just, you know, Google the list.


And then the first thing you see will be at least one of its AI responses, whether you like it or not.


You're missing the point: it's only a testing exercise for the new model.


No, the point is that you can set up the testing exercise without using an LLM to do a simple find and replace.


It's a test. Like all tests, it's more or less synthetic and focused on specific expected behavior. I am pretty far from LLMs now, but this seems like a very good test to see how genuine this behavior actually is (or repeat it 10x with some scrambling to go deeper).


This thread is about the find-and-replace, not the evaluation. Gambling on whether the first AI replaces the right spells just so the second one can try finding them is unnecessary when find-and-replace is faster, easier and works 100%.
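For what it's worth, a minimal sketch of the deterministic setup, assuming you already have the list of spell names. The spell names, the replacement names and the `replace_spells` helper are all made up for illustration; nothing here comes from the actual test.

```python
import re

def replace_spells(text, mapping):
    """Swap each known spell name for its replacement, matching whole words only."""
    # Longest names first, so a multi-word spell wins over any shorter name inside it.
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    counts = {}

    def swap(match):
        name = match.group(1)
        counts[name] = counts.get(name, 0) + 1
        return mapping[name]

    return pattern.sub(swap, text), counts

# Hypothetical data: these are placeholder names, not spells from any real book.
spell_map = {"Lumora": "Brindle", "Ventus Arca": "Korvath Ime"}
sample = "She cast Lumora, then Ventus Arca, then Lumora again."

rewritten, counts = replace_spells(sample, spell_map)
print(rewritten)
print(counts)
```

The `counts` dict doubles as the answer key: it tells you exactly which replacements landed and how often, which is the part you would otherwise be gambling on.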


... I'm not sure if you're trolling or if you missed the point again. The point is to test the LLM's contextual ability and correctness when performing actions that are, hopefully, guaranteed not to be in the training data.

It has nothing to do with the performance of the string replacement.

The initial "find" is to see how well it actually finds all the "spells" in this case, and then replaces them. Then, maybe using a separate context, evaluate whether the results are the same or skewed in favour of training data.


That won't help. The AI replacing them will probably miss the same ones as the AI finding them.


I think the question was if it will still find 49 out of 50 if they have been replaced.
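Scoring that is straightforward once you have an answer key. A rough sketch, assuming the model's answer has already been parsed into a list of names; `score_detection` and all the names below are hypothetical.

```python
def score_detection(expected, reported):
    """Compare the names a model reported against the known replaced names."""
    # Case-insensitive comparison, since the model may change capitalisation.
    expected_set = {name.casefold() for name in expected}
    reported_set = {name.casefold() for name in reported}
    return {
        "found": len(expected_set & reported_set),
        "total": len(expected_set),
        "missed": sorted(expected_set - reported_set),
        "extra": sorted(reported_set - expected_set),
    }

# Hypothetical answer key (the replacement names) and a hypothetical model answer.
expected = ["Brindle", "Korvath Ime", "Sellow"]
reported = ["brindle", "Sellow", "Lumora"]

result = score_detection(expected, reported)
print(f"{result['found']} out of {result['total']} found")
print("missed:", result["missed"])
print("extra:", result["extra"])
```

The `extra` list is the interesting bit: if an original spell name shows up there even though it no longer appears in the text, that hints the model is answering from memory rather than from the context.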



