
> But it fails at simple arithmetic.

Does it though? When LLMs are allowed to use their outputs as a form of state, they can very much succeed at addition of numbers up to 14 digits with > 99.9% accuracy, and accuracy does not deteriorate significantly up to 18 digits [1].

That really isn't a good argument, because you are asking it to one-shot something that 99.999% of humans can't.

https://arxiv.org/abs/2211.09066



Try asking it to combine some simple formulas involving unit conversions. It does not do math; you can only ask it questions that let it complete patterns more easily.


It does not have to do the math in one shot, and neither can humans. The model only needs to decompose the problem into subcomponents and solve those. If it can do so recursively via the agents approach, then by all means it can do it.

The cited paper covers this to some extent. Instead of asking the LLMs to multiply large integers directly, they ask the LLM to break the task into 3-digit numbers, do the multiplications, add the carries, and then sum everything up. It does quite well.
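The decomposition described above can be sketched in ordinary code. This is a rough illustration of the procedure (split each integer into 3-digit chunks, multiply chunks pairwise, propagate carries, sum up), not the paper's actual prompts:

```python
# Sketch of the decomposition: multiply large integers by splitting them
# into 3-digit (base-1000) chunks, multiplying the chunks pairwise, and
# then propagating carries, like schoolbook multiplication.

def to_chunks(n: int, base: int = 1000) -> list[int]:
    """Least-significant-first base-1000 'digits' of n."""
    chunks = []
    while n:
        n, r = divmod(n, base)
        chunks.append(r)
    return chunks or [0]

def chunked_multiply(a: int, b: int, base: int = 1000) -> int:
    xs, ys = to_chunks(a, base), to_chunks(b, base)
    # Each chunk product lands at position i + j, as in schoolbook multiplication.
    partial = [0] * (len(xs) + len(ys))
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            partial[i + j] += x * y
    # Propagate carries from least to most significant chunk.
    result, carry = 0, 0
    for k, p in enumerate(partial):
        carry, digit = divmod(p + carry, base)
        result += digit * base ** k
    return result + carry * base ** len(partial)

print(chunked_multiply(123456789012, 987654321098))
```

Each intermediate step (chunk products, carry propagation) is small enough to be done reliably on its own, which is the point of the prompting strategy.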


What do you mean one-shot? Hasn't ChatGPT been trained on hundreds of maths textbooks?


When I ask a human to do 13-digit addition, 99.999% of them will do the addition in steps, and almost nobody will immediately blurt out a correct answer without doing the intermediate steps in their head. Addition requires carries: we start from the least significant digit and work toward the most significant, calculating with the carries. That is what 1-shot refers to.

If we allow LLMs to do the same, instead of producing the output in a single textual response, then they do just fine according to the cited paper.

Average humans can do multiplication in 1 step for small numbers because they have memorized the tables. So can LLMs. Humans need multiple steps for addition, and so do LLMs.
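The stepwise addition described above (least to most significant digit, tracking the carry) can be written out as a toy procedure. A minimal sketch of the human algorithm being discussed, not anything from the paper:

```python
# Stepwise addition: process digits from least to most significant while
# tracking the carry, instead of producing the whole answer in one shot.

def add_with_carries(a: str, b: str) -> str:
    """Add two non-negative decimal integers given as strings, digit by digit."""
    digits, carry = [], 0
    # Walk both numbers right to left, padding the shorter one with zeros.
    for da, db in zip(reversed(a.zfill(len(b))), reversed(b.zfill(len(a)))):
        carry, d = divmod(int(da) + int(db) + carry, 10)
        digits.append(str(d))
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))

print(add_with_carries("9999999999999", "1"))
```

Each iteration needs only a single-digit sum plus a carry, which is exactly the kind of small memorized step both humans and LLMs handle reliably.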


Ok. In the context of AI, 1-shot generally means that the system was trained on only one example (or a few examples).

Regarding the number of steps it takes an LLM to get the right answer: isn't it more important that it gets the right answer, since LLMs are faster than humans anyway?


I am well aware of what it means, and I used 1-shot in the same sense that we humans say we gave it "a shot", meaning an attempt.

LLMs get the right answer and do so faster than humans. The only real limitation here is the back and forth because of the chat interface and implementation. Ultimately, it all boils down to giving prompts that achieve the same thing as shown in the paper.

Furthermore, this is a weird boundary/goal-post: humans get stuff wrong all the time, and we created tools to make our lives easier. If we let LLMs use tools, they do even better.



