
>appears slowly as models scale up?"

Both, I think, based on limited tinkering with smaller models.

I've been using GPT4ALL and oobabooga to make testing models easier on my single (entry-level discrete GPU) machine. Using GGML versions of llama models, I get drastically different results.

With a 7B parameter model I mostly-- not always-- get an on topic and somewhat coherent response. By which I mean, if I start off with "Are you ready to answer questions?" it will say "Yes and blah blah blah..." for a paragraph about something random. On a specific task it will perform a bit better: my benchmark request has been to ask for a haiku. It was confused, classified haikus as a form of gift, but when pushed it would output something resembling a poem but not a haiku.

Then I try a 13B model. It's a lot better at answering a simple question like "are you ready?" but will still sometimes say yes and then give a random dialogue as if it's creating a story where someone asks it to do something. It will readily create a poem on first attempts, though still not a haiku in any way. If I go through about a dozen rounds of asking it what a haiku is and then, in subsequent responses, "reminding it" to stay on course for those definitions, it will kind of get it and give me 4 or 5 short lines.

A 30B model answers simple questions and follows simple instructions fairly easily. It will produce something resembling a haiku, though often with an extra line and a few extra syllables, with minimal prodding and prompt engineering.

None of the above, at least the versions I've tried (variations & advances are coming daily), have a very good memory. They clearly have some knowledge of past context but mostly ignore it when it comes to keeping responses logically consistent across multiple prompts. I can ask "what was my first prompt?" and get a correct response, but when I tell it to respond as if its name is "Bob", a few prompts later it's calling me Bob and back to labelling itself an AI assistant.

Then there's the 65B parameter model. I think this is a big leap. I'm not sure though; my PC can barely run the 30B model and gets maybe 1 token every 3 seconds on 30B. The 65B model I have to let use disk swap space or it won't work at all, and it produces roughly 1 token per 2-3 minutes. It's also much more verbose, reiterating my request and agreeing to it before it proceeds, so that adds a lot of time. However, a simple insistence on a "Yes/No" answer will succeed. A request for a haiku succeeds on the first try, with nearly the correct syllable count too, using an extra few syllables in trying to produce something on the topic I specify. This is comparable to what I get with normal ChatGPT, which has > 150B parameters that aren't even quantized.

However I have yet to explore the 65B parameter model in any detail. 1 token every 2-3 minutes, sucking up all system resources, makes things very slow going, so I haven't done much more than what I described.

Apart from these, I was just playing around with the 13B model a few hours ago and it did do a very decent job at producing basic SQL code I asked it to produce against a single table. Max value for the table, max value per a specified dimension, etc. It did this across multiple prompts without much need to "remind" it about anything a few prompts earlier. At that point though I was all LLM burned out for the day (I'd been fiddling for hours) so I didn't get around to asking it for simple joins.
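For anyone curious what that benchmark looks like concretely, here's a minimal sketch of the kind of single-table queries I was asking for, using an invented `sales` table (the table name, columns, and data are just placeholders, not what I actually used):

```python
import sqlite3

# Hypothetical single-table schema standing in for the real one:
# sales(region TEXT, amount REAL)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("east", 25.0), ("west", 40.0), ("west", 5.0)],
)

# Max value for the table
max_overall = conn.execute("SELECT MAX(amount) FROM sales").fetchone()[0]

# Max value per a specified dimension (here: region)
max_per_region = conn.execute(
    "SELECT region, MAX(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

print(max_overall)      # 40.0
print(max_per_region)   # [('east', 25.0), ('west', 40.0)]
```

The 13B model got queries of roughly this shape right on the first try, which is what surprised me.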

So in short, where I began, I think it's both. Abilities are somewhat task specific, as are the quality improvements for a given task across larger parameter models. Sometimes a specific task has moderate or little improvement at higher levels, sometimes another task does much better, or only does much better when it reaches a certain point: e.g., haikus from 13B to 30B weren't a great improvement, but 30B to 65B was an enormous improvement.



try wizard-vicuna-13b in ooba. I think you will be pleasantly surprised.


Oh yeah, I have to check that out. I did a little with the 7B model (I think the description said Microsoft was involved in its creation? Which is interesting; it shows they're hedging their bets a little bit away from OpenAI and towards true open source options as well). Anyway, the Wizard 7B model was noticeably better than the others I tried, though also noticeably worse than the 13B "snoozy" model.

I’m trying to see if I can justify getting a new workstation-tower PC. My home computer and work laptop are beefy for my usual work, but not enough to


hmmm, I'm having trouble finding a ggml model version... my google-fu force is not with me today. Do you know of a source?



