Keywords AI
Last September, Andy came across a paper that introduced the concept of FrugalGPT. This paper presented a method for reducing the cost of using Large Language Models (LLMs) while maintaining comparable performance by leveraging various smaller models.
Inspired by FrugalGPT, we quickly turned theory into practice. Raymond developed a Minimum Viable Product (MVP) for an LLM router in just one week, followed by a front-end implementation the next week. Remarkably, we secured our first customer by week three.
This blog post will guide you through our journey of building the LLM router MVP and provide insights on how you can create your own.
FrugalGPT is a framework designed to reduce the costs of using Large Language Models (LLMs) while maintaining or improving performance. It employs three strategies:
FrugalGPT can significantly cut costs (up to 98%) and even enhance accuracy by up to 4% compared to using a single LLM like GPT-4.
After thoroughly reviewing the FrugalGPT paper, Andy recognized its potential to significantly reduce costs for AI startups and developers. This was particularly relevant to us, as our one-stop job search copilot consumed substantial daily resources on OpenAI.
Our investigation led us to focus on LLM approximation, which we found to be the most practical and efficient strategy for our use case. To rapidly gather user feedback, we created a simplified LLM router integrating only GPT-3.5 Turbo and GPT-4.
At the time, the cost difference between these models was substantial:
This stark contrast highlighted the importance of efficient model selection. After all, you wouldn't want your AI application to use the expensive GPT-4 for simple responses like "Hello"!
Our LLM router concept utilizes an embedding model (text-embedding-3-small from OpenAI) to classify input, generate a vector score, and select the most suitable LLM based on that score. To enhance the router's model selection capability, we first established our evaluation and model ranking framework.
Model Ranking System:
We defined models and their characteristics in our backend:
gpt_4 = ModelParam(model_name="gpt-4", speed=48, max_context_window=8192, model_size=170, mmlu_score=86.4, mt_bench_score=8.96, big_bench_score=83, input_cost=30,output_cost=60, rate_limit=10000, function_call=1)
gpt_3_5_turbo = ModelParam(model_name="gpt-3.5-turbo", speed=150, max_context_window=4096, model_size=20, mmlu_score=70, mt_bench_score=7.94, big_bench_score=71.6, input_cost=0.5, output_cost=0.1,function_call=1)
1. Difficulty Levels:
difficulty_details = { "1": "This query is just as a greeting or the most basic human interaction.", "2": "This is a simple question that can be answered with a single sentence.", "3": "This question requires a few sentences to answer, probably some logic or reasoning, basic coding question, or short JSON parsing.", "4": "This question requires a paragraph to answer, requires a fair amount of logic and reasoning, coding algorithm, or long JSON parsing.", "5": "This question requires a long answer, intricated logic or reasoning, complicated coding questions, or complicated nested JSON parsing."}
2. Task Categories:
category_details = { "Writing": "Engaging in narrative and creative text generation.", "Questions": "Responding to inquiries across various topics.", "Math": "Performing calculations and interpreting data.", "Roleplay": "Engaging in simulated dialogues and scenarios.", "Analysis": "Performing sentiment analysis, summarization, and entity recognition.","Creativity": "Generating ideas and concepts in arts and design.", "Coding": "This task requires coding assistants and code generation.", "Education": "Developing learning materials and providing explanations.", "Research": "Conducting online research and compiling information.", "Translation": "Translating text across multiple languages."}
3. Input Formats:
format_details = { "PlainText": "Standard unformatted text responses.", "StructuredData": "JSON, XML, and YAML output for structured data formats.", "CodeScript": "Generation of source code and executable scripts.", "ListOutput": "Bullet points and numbered list formats.", "InteractiveText": "Q&A style and other interactive text elements.","Customized": "Custom instructions and unique output formats."}
4. Expertise Areas:
expertise_details = { "STEM": "Covering science, technology, engineering, mathematics topics.", "Humanities": "Specializing in literature, history, philosophy, arts.", "Business": "Expertise in economics, management, marketing.", "Health": "Providing healthcare and medical knowledge.", "Legal": "Insights into law, politics, and governance.", "Education": "Specializing in teaching and learning techniques.", "Environment": "Focus on ecology, geography, and environmental science.", "Tech": "Information technology, computer science, AI specialization.","Arts": "Covering music, visual arts, and entertainment.","Social": "Understanding of sociology, psychology, anthropology."}
Check out the full code for the LLM router here: LLM Router Code