We (disclosure: founder) do something similar at Trelent[1] but with an emphasis on security. Paid accounts can use OpenAI & Anthropic models, free ones just OpenAI. We have Claude 3.5 Sonnet live already. If you want to try it out lmk! Also totally respect building your own open-source version :)
So typically these providers only offer ZDR to "managed" customers, after a lengthy application process. For example, on Azure, "managed" means companies with >$1m (possibly more now) in annual spend. The providers don't want to spend time on this long application process with smaller companies, so we take some of that weight off their shoulders. They get the same revenue at the end of the day, so in many ways it pools smaller companies' LLM spend and sends it straight to their bottom line, and they still get to claim they're rolling out AI "responsibly".
Once one provider is cracked, the others fall as well, as these AI companies are all competing viciously for customers. Et voila, ZDR across multiple providers for the small(er) companies out there :)
That way, at inference time you get the speed of 36B params because you are only "using" 36B params at a time, but the next token might (and frequently does) need a different set of experts than the one before it. If that new set of experts is already loaded (i.e. you preloaded them into GPU VRAM with the full 132B params), there's no overhead, and you just keep running at 36B speed irrespective of which experts are active.
You could theoretically load in 36B at a time, but you would be severely bottlenecked by having to reload those 36B params, potentially for every new token! Even on top-of-the-line consumer GPUs that would slow you down to ~seconds per token instead of tokens per second :)
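A quick back-of-the-envelope makes the gap concrete. All bandwidth figures below are rough assumptions for illustration (not benchmarks), and the model sizes match the 36B-active / 132B-total example above:

```python
# Back-of-the-envelope: why swapping experts in every token is so slow.
# Bandwidth numbers are illustrative assumptions, not measurements.

ACTIVE_PARAMS = 36e9    # params touched per token (active experts)
BYTES_PER_PARAM = 2     # fp16/bf16 weights
PCIE_BW = 32e9          # ~PCIe 4.0 x16 host-to-GPU, bytes/sec (assumed)
VRAM_BW = 1000e9        # ~1 TB/s on-GPU memory bandwidth (assumed)

def token_time(bandwidth_bytes_per_sec):
    """Time to stream the active weights once per token (memory-bound)."""
    return ACTIVE_PARAMS * BYTES_PER_PARAM / bandwidth_bytes_per_sec

reload_time = token_time(PCIE_BW)    # experts re-copied over PCIe each token
resident_time = token_time(VRAM_BW)  # full 132B already resident in VRAM

print(f"reloading over PCIe: {reload_time:.2f} s/token")
print(f"resident in VRAM:   ~{1 / resident_time:.0f} tokens/s")
```

With these assumed numbers, reloading works out to roughly 2.25 seconds per token, while keeping everything resident is memory-bandwidth-bound at ~14 tokens/s, which is exactly the "seconds per token vs tokens per second" gap described above.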
I am the founder of Trelent. We’re building a useful agent for RPA, with a distinctly structured & hierarchical approach to working with LLMs. We have pilot customers in several industry verticals, have thousands of users, and are venture backed by multiple funds. Our goal is to ultimately enable the next billion software developers.
We’re very early (team of two) and are looking for a founding engineer who can wear several hats when necessary, but has a knack for UX. The focus right now is largely product-side, mainly on the front-end. Our stack is NextJS, Tailwind, and Node. Bonus points if you've worked with LLMs before - we strongly believe that they will change UX (but not through a chatbox alone).
Preference to North America, but fully remote and can hire anywhere for the right candidate. More details & application at https://jobs.trelent.com
Our website has a little more @ www.trelent.net. Working on a better way to showcase more examples. They'll be inherently biased as we're selecting them (even if we try to take a fair sample of good/bad results), so best would be for you to try it yourself haha
Totally get that. Working on a locally-hosted version for the enterprise use-case with this in mind. We are paid either in dollars or in data, and I'd rather be transparent about that than hide it deep in a Privacy Policy.
Please note: this does make use of a remote server, so use with caution. Working on a locally-hosted version for anyone with ~48GB of VRAM (and change) to spare.