I’ve created a chatbot (here) using a large language model (LLM) – aka an ‘AI’ – to summarize research on myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS). My hope is that it will help people with ME (pwme) and their caretakers to understand the medical literature. Here I will explain how it’s set up, with the most general information first.
The ME/CFS chatbot will provide answers that differ from those supplied by general LLMs like ChatGPT. This one will consult only abstracts in NIH’s comprehensive PubMed database that mention “myalgic encephalomyelitis” or “chronic fatigue syndrome,” so it focuses solely on summarizing published research. General chatbots draw on the same information, but there it is diluted among the innumerable blogs, social media posts, and news reports on the same topic. Also, this chatbot’s design ‘under the hood’ should make it less likely than general LLMs to produce ‘hallucinations’ – or invented answers. In short, it should provide the most reliable answers about patterns in ME/CFS research.
But LLMs can work as designed and still provide low-quality information. This is because their answers are swayed by volume more than by quality. So, if 100 articles are incorrect about a topic and only one article is correct, it’s likely that the chatbot will repeat the incorrect information. Caveat lector!
The chatbot sees only the title, date, and abstract of each research article about ME/CFS. The title and date are included to help users find the original article and to allow them to filter by date. I have several reasons for not including the full articles: copyright restrictions, the risk that articles of widely varying length would carry unequal weight with the LLM, and the enormous time and computing resources that processing full text would require. Anyone who wants to see more metadata (author, journal) can match the titles given by the chatbot to the full database of ~7000 abstracts here.
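As an aside, a collection like this can be assembled programmatically from PubMed using NCBI’s E-utilities. The sketch below uses the Biopython library for that purpose; the search term, contact email, and record handling are illustrative assumptions on my part, not a description of how this particular database was actually built.

```python
# Illustrative sketch: pulling ME/CFS abstracts from PubMed with Biopython's Entrez module.
# The search term, email address, and retmax limit are assumptions for illustration only.
from Bio import Entrez

Entrez.email = "you@example.com"  # NCBI asks for a contact address

# Find PubMed IDs for articles mentioning either term.
search = Entrez.read(Entrez.esearch(
    db="pubmed",
    term='"myalgic encephalomyelitis" OR "chronic fatigue syndrome"',
    retmax=10000,
))

# Download the records and keep only the title, date, and abstract of each one.
records = Entrez.read(Entrez.efetch(db="pubmed", id=search["IdList"], retmode="xml"))
abstracts = []
for rec in records["PubmedArticle"]:
    article = rec["MedlineCitation"]["Article"]
    if "Abstract" not in article:
        continue  # some PubMed records have no abstract
    pub_date = article["Journal"]["JournalIssue"]["PubDate"]
    abstracts.append({
        "title": str(article["ArticleTitle"]),
        "date": str(pub_date.get("Year", "")),
        "abstract": " ".join(str(part) for part in article["Abstract"]["AbstractText"]),
    })

print(f"Fetched {len(abstracts)} abstracts")
```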
For the chatbot, this collection of abstracts has been converted behind the scenes into a vector database using Chroma. In a vector database, the text of each abstract is converted into a large set of numbers – an ‘embedding’ – that encodes its meaning. Pieces of text with similar meanings, such as ‘I am tired’ and ‘I feel fatigued’, have similar sets of numbers. So a search in a vector database can find not only matching terms but also synonyms.
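To make this concrete, here is a minimal sketch of building and querying such a database, assuming the Python chromadb package; the collection name, example texts, and metadata fields are illustrative assumptions rather than the chatbot’s actual setup.

```python
# Minimal sketch of building and querying a Chroma vector database.
# Collection name, example texts, and metadata are illustrative assumptions.
import chromadb

client = chromadb.PersistentClient(path="./mecfs_db")  # saves the database to disk
collection = client.get_or_create_collection(name="mecfs_abstracts")

# Chroma converts each document into an embedding (a long list of numbers)
# automatically, using its default embedding model unless another is supplied.
collection.add(
    ids=["pmid_0001", "pmid_0002"],
    documents=[
        "Post-exertional malaise is a hallmark symptom of ME/CFS ...",
        "Patients report profound fatigue that is not relieved by rest ...",
    ],
    metadatas=[
        {"title": "Example title 1", "date": "2021"},
        {"title": "Example title 2", "date": "2019"},
    ],
)

# A semantic search: 'I feel exhausted' can match abstracts about fatigue
# even if the word 'exhausted' never appears in them.
results = collection.query(query_texts=["I feel exhausted"], n_results=2)
print(results["metadatas"][0])
```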
Here’s what happens when a user enters a question or other prompt into the chatbot (a simplified code sketch of the whole cycle follows this list):
- The vector database is searched for the 250 abstracts whose meaning most closely matches the user’s prompt. (After trial and error, I felt that this number yielded the most satisfactory results.)
- The user’s question and the 250 abstracts – with their titles and dates – are fed to an LLM along with the following instruction, or ‘system prompt’:
You are a helpful and informative bot that answers questions using text from only the research studies included below. Be sure to respond in complete sentences, being comprehensive, including all relevant background information. However, you are talking to a non-technical audience, so be sure to explain complicated concepts and strike a friendly and conversational tone. If a study is irrelevant to the answer, you may ignore it. Please include a disclaimer that users should not rely on your answers for medical advice. Pay attention to the full conversation history to maintain context and provide coherent responses that build on all previous exchanges.
- An LLM – currently Google Gemini 2.0 Flash – interprets the user’s question, seeks material for a response in the abstracts, and produces an answer. The LLM ‘understands’ language and the ways of the world and uses this background knowledge to interpret the information in the abstracts. Its process is quite complex, but it starts, as above, with converting the texts into sets of numbers representing meaning; then the LLM performs a huge number of calculations to decide which word should come next in its response.
- If the user asks a follow-up question, behind the scenes the new question is doubled to increase its importance and added to the previous question and answer; all of this is encoded numerically, and the vector database is searched for a new set of 250 abstracts that most closely match the full history of questions and answers. Finally, everything is joined again with the system prompt for the LLM to answer.
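Here is the simplified sketch promised above, covering both a first question and a follow-up, and assuming the Python chromadb and google-generativeai packages. The variable names, the model identifier string, the prompt layout, and the exact way the follow-up question is doubled are my illustrative assumptions; only the overall flow follows the description in this list.

```python
# Simplified sketch of one question-and-answer cycle, plus a follow-up question.
# Names, prompt layout, and the doubling scheme are illustrative assumptions.
import chromadb
import google.generativeai as genai

SYSTEM_PROMPT = (
    "You are a helpful and informative bot that answers questions using text "
    "from only the research studies included below. ..."  # full text quoted above
)

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-2.0-flash", system_instruction=SYSTEM_PROMPT)

client = chromadb.PersistentClient(path="./mecfs_db")
collection = client.get_or_create_collection(name="mecfs_abstracts")


def retrieve_context(search_text: str, n: int = 250) -> str:
    """Find the n abstracts whose embeddings sit closest to the search text."""
    hits = collection.query(query_texts=[search_text], n_results=n)
    return "\n\n".join(
        f"Title: {meta['title']} ({meta['date']})\nAbstract: {doc}"
        for doc, meta in zip(hits["documents"][0], hits["metadatas"][0])
    )


def answer(question: str) -> str:
    """First question: retrieve 250 abstracts and hand them to the LLM."""
    context = retrieve_context(question)
    response = model.generate_content(f"Question: {question}\n\nStudies:\n{context}")
    return response.text


def answer_followup(followup: str, prev_question: str, prev_answer: str) -> str:
    """Follow-up: double the new question, add the prior exchange, and search again."""
    search_text = f"{followup} {followup} {prev_question} {prev_answer}"
    context = retrieve_context(search_text)
    history = (
        f"Previous question: {prev_question}\n"
        f"Previous answer: {prev_answer}\n"
        f"Follow-up question: {followup}"
    )
    response = model.generate_content(f"{history}\n\nStudies:\n{context}")
    return response.text
```

In a real app the previous question and answer would come from the stored chat history; they are passed in explicitly here only to keep the sketch short.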
