I've been doing this with ChatGPT directly, but its cutoff date is 2021. It's pretty good, though, at telling you the research name so you can validate it yourself.
As someone who is starting a master's in science and will be looking at lots of research papers, I've been wondering what the best use of this could be.
If I have my own PDFs, I guess I could get ChatGPT to create summaries in some structured way, perhaps in a single file with citation:summary, and then send up that file with every question I ask?
An article today (in Spiegel) suggested these tools for PDF upload and various levels of support for extracting information from papers and writing new ones:
> jenni ai seems interesting for doing research/thesis style work.
“Join the Jenni influencer program and earn money for your posts. Earn up to $5,000 per post. We'll send you $1 for every 1,000 views your post receives. Minimum payout $20 (20k views).” — https://jenni.ai/influencer-program
Sounds very interesting to say it sounds interesting.
> If I have my own PDFs, I guess I could get ChatGPT to create summaries in some structured way, perhaps in a single file with citation:summary, and then send up that file with every question I ask?
Extract the text, put it into a database (a vector DB is the hot thing to pair with an LLM and is probably ideal, but ChatGPT does a pretty good job using Wikipedia as a "database" with a dirt-simple ReAct pattern implementation, so you probably don't need to be that fancy to get value out of it), and then use tooling to let ChatGPT use that database for questions.
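To make the "database plus tooling" idea concrete, here's a minimal, self-contained sketch of the retrieval step. The bag-of-words `embed` is a toy stand-in for a real embedding model (the surrounding retrieval logic is the same either way), and `retrieve`/`build_prompt` are hypothetical names:

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in "embedding": a bag-of-words count vector. A real pipeline
    # would call an actual embedding model here instead.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank stored paper chunks against the question and keep the top k.
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(question: str, chunks: list[str]) -> str:
    # The retrieved passages become the context you send to ChatGPT,
    # with an instruction to answer (and cite) only from them.
    context = "\n---\n".join(retrieve(question, chunks))
    return (f"Answer using ONLY the sources below, and cite them.\n"
            f"Sources:\n{context}\n\nQuestion: {question}")
```

Swap `embed` for a real model and the chunk list for a DB query and you have the basic shape of the tooling.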
FWIW, it is pretty common to read the abstract, skim the intro, skip to the conclusion/results, and only then read the rest of the paper (if you even need to at that point).
Actually, what I was asking about was how to handle my storage and later retrieval of hundreds of papers. I don't mean just the database tool -- I currently use Zotero and make sure articles are tagged -- but I'm wondering if I can build a system for my articles where I can ask ChatGPT "are there any studies that show X causes Y" and have it use my articles directly, and not give me bullshit citations like it does 90% of the time now.
(To be clear, I'm pleasantly surprised when it does give me some real, useful citations of articles I didn't know about, but I'd like to get its accuracy way up, at least for articles I have in my possession that I could feed it for context.)
First you just extract all of the text from the PDFs. It's helpful to use arXiv instead, since it gives you LaTeX, which you can parse to separate content from style. Then store that in a DuckDB database with zstd compression, since it works great on text.
Then use some encoder model (decoder models are all the rage, but they're not necessary here) to embed all of these texts into Qdrant.
Then use your Qdrant database happily ever after with whatever you like, such as Vicuna or Guanaco 30B GPTQ. Sprinkle in LangChain or whatever you like. Galpaca 30B may be appropriate for science text.
As part of my testing of https://flowch.ai (which we are developing), I've been uploading multiple PDFs (including scientific papers). I then ask a question, instruct it in the prompt to cite its sources, and it'll name the PDFs it's drawing its data and quotes from. Works well!
Elicit is very useful! It sometimes can't answer very specific and complex questions, but it also doesn't tend to hallucinate anywhere near as much as ChatGPT. Rather, it will be quite upfront when it can't really find an answer to something.
For my query "Can an LLM be used to build an agent?", Elicit gave me highly relevant results, whereas Consensus could not find any relevant or recent results. I'm guessing the difference is that Consensus does not index papers from arXiv.
I actually think it would make sense for Consensus to highlight how they differ from e.g. Elicit, which has already been on the market for quite some time.
> The age of the observable universe is 11 billion years, which is in close agreement with the inflationary model prediction that the age of the universe is two-thirds of the Hubble time.
In contrast, when I ask Google "how old is the universe" I get as top result:
> 26.7 billion years old
> Current estimates place the Big Bang 13.8 billion years ago. University of Ottawa adjunct professor Rajendra Gupta has calculated that it is, in fact, 26.7 billion years old – nearly twice as old as the current accepted model. extracted from https://cosmosmagazine.com/space/astrophysics/universe-27-bi....
I'm not super familiar with them, but taking a quick look, it looks like they have a lot of papers published by the Allen Institute in Seattle[0] (which I am much more familiar with). So I would say they at least carry very reputable papers.
Here[1] is a summary of the Semantic Scholar database in the National Library of Medicine (note: also from an Allen Institute spinoff, but it made it easy to search for something I know is reputable).
Yes, I am a subscriber. They have little icons like “rigorous journal”, “systematic review”, “meta-analysis” and (if you give this any weight) “highly cited”. I believe you can filter by those in your search as well.
Embed articles and throw the results in a vector database.
Throw up search results that just use cosine similarity on the vectors, with questionable metrics and no explanation of how anything is calculated.
Charge yearly because you know people will churn after a month or two.
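For what it's worth, the "search" in step 2 of this recipe really is this small, which is part of the point:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Score the query embedding against a stored article embedding;
    # the app just returns the highest-scoring articles.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```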
I'll play DA here - there's quite a bit of engineering surrounding these apps that can appear hidden to folks from the outside looking in. Various levels of prompt engineering and in-context learning might be necessary to get optimal results, and this could mean significantly more complexity at the application level.
Every time I hear or read "prompt engineering", I can't help but cringe a bit. I'm not sure why, but it's the same reaction I would have if I heard someone say "Google search query engineering".
Compared to Google search, there definitely is skill involved in knowing how to google things well. We're all accustomed to googling things many times per day, so I think a lot of people forget that being able to get the results you want is a skill that has to be learned.
But I would never refer to being "good at writing Google search queries" as any kind of engineering. Yet is becoming good at searching Google any less difficult than getting good at writing LLM prompts?
I'd love to hear the other side of the argument. How difficult is it to become good at "prompt engineering"? Why do we even call it "prompt engineering" instead of just "writing effective prompts"?
Edit: I think the main gripe I have with the term "prompt engineering" is it makes the skill of writing prompts sound a lot harder than it actually is. Maybe I'm underestimating how difficult it is to learn how to write good prompts?
IMO you're right that "prompt engineering" is a cringe-y term because it implies what you're doing is mostly writing prompts. That being said, I don't think that's what it actually entails, any more than "backend engineering" is mostly writing SQL queries. Prompt engineering is building the systems around the prompts e.g. writing the LangChain or whatever code, parsing stuff and interacting with structured DBs, message queues, etc (and occasionally writing prompts too, but that's a relatively smaller part). It requires some domain-specific knowledge e.g. chain of thought and retrieval augmented generation techniques, some basic linear algebra, keeping up to date with new models and new ways of running them (ggml? gptq? openai functions vs logit masking llama 2?), but it's more or less backend engineering with a twist.
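As a small, concrete example of that "parsing stuff" glue work (a sketch; `extract_json` is a hypothetical helper name): the model is asked to return JSON, but replies often wrap it in prose or a code fence, so the system has to fish it out.

```python
import json
import re

def extract_json(reply: str) -> dict:
    # Typical glue code in an LLM-backed system: find the first-to-last
    # brace span in the reply and parse it as JSON.
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model reply")
    return json.loads(match.group(0))
```

This naive brace-matching breaks on multiple top-level objects, which is exactly the kind of edge case that makes the role more than "writing prompts".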
I've seen some of the more serious people switch to the term "Applied AI" which I think encompasses the role a lot better. Also I've seen a decent number of grifter types saying they're "prompt engineers" when what they mean is they're writing prompts into ChatGPT's UI, which I think is part of what drives the cringing feeling when you hear the phrase "prompt engineer" and probably drives some of the movement away from the term for engineers.
This is a great recap of the present role. I have been following a conversation around "AI Engineer" as a label (although I like "Applied AI" better) from the Latent Space podcast team, so there's a fair amount of meme hype cycle, but it's also active and actionable. They are setting up a conference in October, and you can see the discussion unfold around the blog post below on X-itter.
They also have a great Slack community and are rapidly turning out features. Everything I have seen suggests that they are competent and committed to the mission of making it easier to do good science.
It is easy to be cynical about the gold rush, but don’t throw the babies out with the bathwater.
I asked it about UBI and I guess there isn't enough data yet, because it called it "fiscally unbearable and morally unacceptable" (which seems to be what only 1 uncited source said)! At least it admitted consensus was low...
> Summary
> Top 10 papers analyzed
> Some studies suggest that universal basic income (UBI) can generate support for structural reforms and improve mental health, while other studies argue it may be fiscally unbearable, morally unacceptable, and increase wealth inequality.
Topics are "highly political" when there is insufficient evidence for them. Which is why heliocentrism is not "highly political" (and also why it once was).
It just occurred to me recently that AI could and should be set up to replace peer reviews of scientific papers. (See the book "Science Fictions"). I asked ChatGPT v3.5 what it would take to do exactly that and what the algorithm would look like. I was very impressed with the response.
Looking further down the road, if we connect AI to reality to any significant degree and train it to be completely objective, the powers-that-be will ban it. After all, they have their narratives to push and their propaganda to spew. I no longer ask myself why the conventional wisdom is so often wrong.
I felt really excited clicking on some of the suggested prompts, but that excitement quickly fell apart when trying to generate a summary for what I consider to be a fairly simple custom search:
impact of airbnb listings on house prices → can't summarise, need to post it as a question.
how do airbnb listings impact house prices? → can summarise, can't create a consensus, must use a yes/no question.
do more airbnb listings increase house prices? → can summarise, can't create a consensus because there aren't enough relevant search results. But the maximum is 20 (according to the info icon) and it found 11 highly relevant articles, so I really don't understand how there aren't enough relevant search results.
Same. Yet another "AI product" with zero product-market fit and zero usefulness. What sucks is that even if you do get a consensus, it's not even traceable (e.g. it does not properly cite sources), so where could I possibly use the conclusion it draws?