LLM Evaluation Agent

Project Type

AI/ML

Date

Jun 2024 - Sep 2024

Repository

Skills

Python (Programming Language) · Large Language Models (LLM)

The LLM Evaluation Agent is a tool designed to leverage Large Language Models (LLMs) for evaluating system performance across various domains. Built with flexibility in mind, this agent integrates into existing projects and supports local execution using models from Ollama and Hugging Face. Its modular architecture enables easy customization and extension, making it suitable for a wide range of evaluation tasks.

Key Features:
- The agent uses LLMs to run automated evaluations against criteria tailored to the user's needs.
- Parsers and processing functions can be customized, so the agent can handle different data types and evaluation scenarios.
- With minimal setup, the agent can be incorporated into existing workflows, allowing for quick deployment.
- The agent supports both Ollama and Hugging Face models, so users can choose the model that best fits their computational resources and task complexity.
- Users can easily add new parsers, define custom evaluation criteria, and switch between different LLM models to tailor the agent's functionality to specific project requirements.

Technical Overview:
- Parsers: The agent employs Pydantic models to structure and validate data. Custom parsers can be added to handle different types of evaluation tasks, ensuring that the input and output data conform to the expected formats.
- Customization: The agent’s core functionality can be easily modified by adjusting the processing functions and parsers. Users can dynamically create custom models and update the agent to accommodate new types of data without altering the existing codebase.
- Results Output: After running the evaluation, the results are saved in both CSV and Excel formats, including processed user settings and evaluation scores.
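
As a rough illustration of the parser and output ideas above, the following minimal sketch shows how a Pydantic model might validate an LLM reply, how a new schema could be created at runtime, and how results could be written as CSV. It is not the project's actual code: the names `EvaluationResult`, `RelevanceResult`, `parse_llm_output`, and `results_to_csv`, the field names, and the 1 to 5 score range are all assumptions made for the example. Excel output is left out because it would need an extra dependency.

```python
import csv
import io
import json

from pydantic import BaseModel, Field, ValidationError, create_model


class EvaluationResult(BaseModel):
    """Hypothetical schema for one evaluation returned by the LLM."""

    score: int = Field(ge=1, le=5)  # assumed 1-5 scale, for illustration only
    reasoning: str


def parse_llm_output(raw, schema=EvaluationResult):
    """Validate a raw JSON reply against a schema.

    Returns a validated model instance, or None when the reply is not
    valid JSON or does not conform to the schema.
    """
    try:
        return schema(**json.loads(raw))
    except (json.JSONDecodeError, ValidationError, TypeError):
        return None


# A new schema built at runtime, without editing the classes above.
# The field names here are hypothetical.
RelevanceResult = create_model(
    "RelevanceResult",
    relevance=(float, ...),  # required field
    comment=(str, ""),       # optional field with a default
)


def results_to_csv(results):
    """Serialize a list of EvaluationResult objects to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["score", "reasoning"])
    for result in results:
        writer.writerow([result.score, result.reasoning])
    return buffer.getvalue()


if __name__ == "__main__":
    reply = '{"score": 4, "reasoning": "Accurate and concise."}'
    parsed = parse_llm_output(reply)
    print(results_to_csv([parsed]))
```

Returning `None` for a malformed reply keeps one bad LLM response from aborting a whole evaluation run; a stricter design could re-raise the error or retry the prompt instead.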

Do not hesitate to contact me to discuss a possible project or learn more about my work.

© 2024 by Dimitris Anastasiou.
