EuroEval: Enhanced Support For Local OpenAI-Compatible APIs
Introduction
Running machine learning models locally has become increasingly popular. The idea here is similar to Folding@home: people donate their spare compute, but for evaluating models instead of protein folding. To get that off the ground, the barrier to entry has to be low, so that anyone can run a EuroEval evaluation and contribute their compute. That is the motivation behind this feature request: stable support for arbitrary local OpenAI-compatible APIs.
The Feature, Motivation, and Pitch
With the increasing popularity of running local ML models, it should be easy for anyone to contribute: run a EuroEval evaluation on your own hardware and donate your compute, just like you would with Folding@home, but for model evaluations.
There is already some support for Ollama, but this can go a step further: an API-agnostic solution that can call any OpenAI-compatible local API. Personally, I run multiple models behind llama-swap, which exposes several inference backends (llama.cpp, vLLM, and perhaps sglang in the future) behind a single OpenAI-compatible API. It would be great to have evaluations running overnight, when my GPUs would otherwise sit idle.
I have been trying to make this work through the existing litellm integration, but it is brittle: local APIs tend to return 429 errors when hit with too many requests at once, and as far as I can tell the maximum number of simultaneous requests is hardcoded in the current EuroEval implementation. Combined with the lack of documentation on running against local OpenAI-compatible APIs, and the mediocre results I got after patching things locally, this suggests EuroEval needs explicit changes to support local setups properly.
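For reference, this is roughly what that setup looks like today: pointing litellm at a local OpenAI-compatible endpoint. The URL, model name, and retry count below are placeholders for my setup, not anything EuroEval ships.

```python
# Minimal sketch: calling a local OpenAI-compatible server (e.g. llama-swap on
# localhost:8080) through litellm. Model name, URL, and key are placeholders.
import litellm

response = litellm.completion(
    model="openai/my-local-model",        # "openai/" prefix selects the OpenAI-compatible provider
    api_base="http://localhost:8080/v1",  # the local server instead of api.openai.com
    api_key="not-needed",                 # most local servers ignore the key
    messages=[{"role": "user", "content": "Hello!"}],
    num_retries=3,                        # retry transient failures such as 429s
)
print(response.choices[0].message.content)
```

This works for single requests, but it is exactly the concurrency behaviour around it that causes the 429 problems described above.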
Addressing the Challenges with Local APIs
One of the primary challenges with local APIs is that they return 429 errors when hit with many simultaneous requests. This is expected: a local machine has far less capacity than a hosted inference service. The current EuroEval implementation uses a hardcoded limit on the number of simultaneous requests, which is too rigid for local setups.
To address this, we need a more flexible approach. Instead of relying on a fixed limit, EuroEval should be able to dynamically adjust the number of concurrent requests based on the API's capacity. This could involve implementing a mechanism to monitor the API's response times and error rates, and then scaling back the number of requests if necessary. By doing so, we can prevent overwhelming the local API and ensure smoother, more reliable evaluations.
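As a rough illustration (not a proposal for EuroEval's actual internals), an AIMD-style limiter could halve the concurrency cap whenever a 429 arrives and raise it again after a run of successful requests:

```python
# Illustrative AIMD-style limiter, not EuroEval's actual code: the concurrency
# cap is halved on a 429 and increased by one after a run of successes.
import asyncio


class AdaptiveLimiter:
    def __init__(self, start: int = 8, minimum: int = 1, maximum: int = 32):
        self.limit = start
        self.minimum = minimum
        self.maximum = maximum
        self._in_flight = 0
        self._successes = 0
        self._cond = asyncio.Condition()

    async def __aenter__(self):
        async with self._cond:
            # Block until a slot is free under the current cap.
            await self._cond.wait_for(lambda: self._in_flight < self.limit)
            self._in_flight += 1

    async def __aexit__(self, *exc):
        async with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    async def on_success(self) -> None:
        async with self._cond:
            self._successes += 1
            if self._successes >= 20 and self.limit < self.maximum:
                self.limit += 1  # additive increase
                self._successes = 0
                self._cond.notify_all()

    async def on_rate_limit(self) -> None:
        async with self._cond:
            self.limit = max(self.minimum, self.limit // 2)  # multiplicative decrease
            self._successes = 0
```

A worker coroutine would wrap each request in `async with limiter:` and call `on_rate_limit()` or `on_success()` depending on the outcome, so the cap settles around whatever the local server can actually sustain.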
Enhancing Documentation and User Experience
Another critical aspect is the lack of comprehensive documentation for running local OpenAI-compatible APIs with EuroEval. This gap in information makes it challenging for users to set up and configure their local environments correctly. Clear, step-by-step instructions and troubleshooting tips are essential to guide users through the process.
Moreover, the user experience can be significantly improved by providing better feedback and error messages. When things go wrong, users need to understand what happened and how to fix it. This could involve providing more descriptive error messages, logging detailed information about the evaluation process, and offering suggestions for resolving common issues.
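As a hypothetical example of what friendlier feedback could look like, a small helper could map the most common low-level failures onto actionable hints (the exception type and the wording below are invented for illustration):

```python
# Hypothetical helper: map common low-level failures to actionable hints.
# The exception type and the wording are invented here for illustration.
class LocalAPIError(RuntimeError):
    pass


def explain_http_error(status_code: int, url: str) -> LocalAPIError:
    hints = {
        401: "The server rejected the API key; many local servers accept any non-empty key.",
        404: f"Nothing answered at {url}; check the port and that the base URL ends in /v1.",
        429: "The server is rate-limiting; reduce the maximum number of concurrent requests.",
        503: "The model may still be loading or swapping; retry after a short delay.",
    }
    hint = hints.get(status_code, "Unexpected response from the local API.")
    return LocalAPIError(f"HTTP {status_code} from {url}: {hint}")
```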
By addressing these challenges, we can make EuroEval much more accessible and user-friendly for individuals who want to run evaluations locally. This, in turn, will encourage more participation and contributions to the project.
Potential Solutions and Design Choices
I'm willing to contribute this myself and have already started tinkering in my own fork. There are some key design choices to make first, though, and I'd like input to make sure I'm on the right track before submitting a PR.
API-Agnostic Design
To truly support arbitrary local OpenAI-compatible APIs, we need a design that’s not tied to any specific implementation. This means avoiding hardcoded assumptions about API endpoints, request formats, and response structures. Instead, we should aim for a flexible and extensible architecture that can adapt to different APIs with minimal configuration.
One approach could be to introduce an abstraction layer that sits between EuroEval and the underlying API. This layer would be responsible for translating EuroEval’s requests into the API’s specific format and parsing the API’s responses into a format that EuroEval understands. By decoupling EuroEval from the API’s implementation details, we can easily add support for new APIs without modifying the core EuroEval code.
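One possible shape for that layer, sketched here with illustrative names rather than EuroEval's actual interfaces, is a small protocol that any backend can implement, with an OpenAI-compatible HTTP client as the first concrete backend:

```python
# Illustrative names only, not EuroEval's actual interfaces.
from typing import Protocol

import httpx


class InferenceBackend(Protocol):
    async def generate(
        self, messages: list[dict], *, max_tokens: int, temperature: float
    ) -> str:
        """Send one chat request and return the generated text."""
        ...


class OpenAICompatibleBackend:
    """Talks to any server exposing the OpenAI chat-completions API."""

    def __init__(self, base_url: str, model: str, api_key: str = "not-needed"):
        self.base_url = base_url  # e.g. "http://localhost:8080/v1"
        self.model = model
        self.api_key = api_key

    async def generate(
        self, messages: list[dict], *, max_tokens: int, temperature: float
    ) -> str:
        async with httpx.AsyncClient(base_url=self.base_url, timeout=120) as client:
            resp = await client.post(
                "/chat/completions",
                headers={"Authorization": f"Bearer {self.api_key}"},
                json={
                    "model": self.model,
                    "messages": messages,
                    "max_tokens": max_tokens,
                    "temperature": temperature,
                },
            )
            resp.raise_for_status()
            return resp.json()["choices"][0]["message"]["content"]
```

Any server that speaks the OpenAI protocol would work through this one backend, and genuinely different APIs would only need their own small adapter class.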
Dynamic Request Management
As mentioned earlier, handling 429 errors is crucial for local APIs. A dynamic request management system can help us avoid overwhelming the API and ensure smoother evaluations. This system would monitor the API’s performance and adjust the number of concurrent requests accordingly.
One way to implement this is to use a token bucket algorithm. Each API would have a bucket of tokens, and each request would consume a token. If the bucket is empty, the request would be delayed until more tokens become available. The rate at which tokens are added to the bucket can be adjusted based on the API’s performance.
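A minimal token-bucket sketch (illustrative only) could look like this: tokens are replenished at a configurable rate up to a burst capacity, and each request either consumes a token or waits for one.

```python
# Minimal token-bucket sketch (illustrative only): `rate` tokens per second are
# replenished up to `capacity`; each request consumes one token or waits.
import asyncio
import time


class TokenBucket:
    def __init__(self, rate: float, capacity: int):
        self.rate = rate                  # tokens added per second
        self.capacity = capacity          # maximum burst size
        self.tokens = float(capacity)
        self.updated = time.monotonic()

    def _refill(self) -> None:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now

    async def acquire(self) -> None:
        while True:
            self._refill()
            if self.tokens >= 1:
                self.tokens -= 1
                return
            # Sleep just long enough for the next token to become available.
            await asyncio.sleep((1 - self.tokens) / self.rate)
```

The refill rate could start conservative and be raised gradually as long as the API keeps responding without 429s.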
Configuration Options
To make EuroEval as versatile as possible, we should provide users with a range of configuration options. This would allow them to tailor EuroEval to their specific needs and environments.
Some potential configuration options include:
- API endpoint URL
- API key (if required)
- Maximum number of concurrent requests
- Request timeout
- Retry policy
 
By providing these options, we empower users to fine-tune EuroEval’s behavior and optimize it for their local APIs.
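For illustration, such a configuration could be grouped into a single object along these lines (field names and defaults are assumptions, not EuroEval's actual options):

```python
# Field names and defaults are assumptions for illustration, not EuroEval's
# actual options.
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocalAPIConfig:
    base_url: str = "http://localhost:8080/v1"  # API endpoint URL
    api_key: Optional[str] = None               # only if the server requires one
    max_concurrent_requests: int = 4            # keep low for single-GPU servers
    request_timeout_seconds: float = 300.0      # local generation can be slow
    max_retries: int = 5                        # retry policy: number of attempts...
    backoff_base_seconds: float = 2.0           # ...with exponential backoff
```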
Conclusion: The Path Forward
To wrap up: supporting local OpenAI-compatible APIs is a significant step towards democratizing model evaluation. By handling the quirks of local APIs, improving the documentation, and making a few thoughtful design choices, EuroEval becomes accessible and valuable to a much wider audience, letting anyone evaluate models on their own hardware and contribute to a shared effort.
I'm keen to hear thoughts and feedback on the proposed solutions and design choices. By focusing on these areas, EuroEval can become a genuinely useful tool for local ML development and evaluation. Let's discuss in the comments below!