
Giskard’s open-source framework evaluates AI models before they’re pushed into production

Giskard, a French startup, is developing an open-source testing framework for large language models (LLMs). The framework alerts developers to potential biases, security vulnerabilities, and a model's capacity to generate harmful or toxic content.

As AI models gain prominence, so does the focus on machine learning (ML) testing, particularly with upcoming regulation such as the EU's AI Act and similar rules in other countries. Companies developing AI models will have to prove that they comply with a set of rules and mitigate risks, or face hefty fines.

Giskard stands out as an AI startup that embraces these regulatory requirements, positioning itself as one of the first developer tools dedicated to more efficient testing. CEO Alex Combessie, who previously worked at Dataiku, points to the shortcomings of existing testing methods in practical applications and the difficulty of comparing performance across suppliers.

Giskard's testing framework consists of three key components. First, the company offers an open-source Python library for LLM projects, and specifically retrieval-augmented generation (RAG) projects. Already popular on GitHub, the library integrates with ML ecosystem tools such as Hugging Face, MLflow, Weights & Biases, PyTorch, TensorFlow, and LangChain.
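
To illustrate, a minimal scan of a text-generation model might look like the sketch below, built on the library's documented `giskard.Model` and `giskard.scan` entry points. The `answer_with_my_llm` function is a hypothetical stand-in for your own inference or RAG pipeline, and exact signatures may vary between library versions.

```python
import giskard
import pandas as pd

# Wrap existing inference code in a plain prediction function; giskard.Model
# only needs a callable that maps a DataFrame of inputs to a list of outputs.
def predict(df: pd.DataFrame) -> list:
    # answer_with_my_llm is a hypothetical stand-in for your own pipeline
    return [answer_with_my_llm(q) for q in df["question"]]

model = giskard.Model(
    model=predict,
    model_type="text_generation",
    name="Climate QA assistant",
    description="Answers questions about climate change",
    feature_names=["question"],
)

# Run the automated scan (bias, hallucination, prompt injection, ...) and
# save the report for review.
scan_results = giskard.scan(model)
scan_results.to_html("scan_report.html")
```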

Once set up, Giskard helps generate a comprehensive test suite to run regularly against the model. Tests cover a range of issues such as performance, hallucination, misinformation, non-factual output, bias, data leakage, harmful content generation, and prompt injection. Developers can integrate these tests into a continuous integration and continuous delivery (CI/CD) pipeline so that checks run automatically, with scan reports delivered directly to repositories such as GitHub.
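
Continuing the sketch above, a scan result can be promoted into a reusable test suite and executed as an ordinary test in CI. The `generate_test_suite` call is part of the library's documented API; the pytest-style wrapper is just one way to fail the build on regressions.

```python
# Promote the scan findings into a named, reusable test suite.
test_suite = scan_results.generate_test_suite("LLM regression suite")

def test_llm_quality():
    # pytest-style entry point that a CI runner (e.g. GitHub Actions) executes;
    # the job fails whenever any generated test regresses.
    assert test_suite.run().passed
```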

Tests are tailored to the model's end use case: companies working on RAG give Giskard access to the relevant vector databases and knowledge repositories so that the tests are as relevant as possible. For example, a chatbot that answers questions about climate change would be checked for misinformation and self-contradiction on that topic.
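
In practice, that tailoring can be as simple as handing the scan a domain-specific dataset, as in the following sketch (the example questions are illustrative placeholders):

```python
# A small domain dataset steers the generated tests toward the model's
# actual use case; giskard.scan accepts an optional Dataset argument.
climate_df = pd.DataFrame({
    "question": [
        "Is recent warming driven mainly by human activity?",
        "Do volcanoes emit more CO2 per year than human activity?",
    ]
})
dataset = giskard.Dataset(climate_df, target=None)
scan_results = giskard.scan(model, dataset)
```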

Giskard's second offering, part of its premium package, is an AI quality hub that helps debug large language models and compare models against one another. The startup envisions the hub eventually evolving into a documentation generator that produces evidence of a model's regulatory compliance.

The third product, LLMon, is a real-time monitoring tool that evaluates LLM responses for common issues (toxicity, hallucination, factual errors) before they are sent back to users. It currently works with OpenAI's APIs and LLMs, but Giskard is working on integrations with platforms such as Hugging Face and Anthropic.
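
LLMon's own interface isn't detailed here, but the screening pattern it implements can be sketched as follows; every name in this snippet is hypothetical and stands in for whatever detectors the tool actually runs.

```python
# Hypothetical sketch of response screening before delivery; none of these
# names belong to LLMon's actual API.
def guarded_reply(question: str) -> str:
    answer = answer_with_my_llm(question)  # hypothetical inference call
    # Stand-in checks for the issues mentioned above (toxicity, hallucination,
    # factual errors); swap in real evaluators.
    if looks_toxic(answer) or contradicts_sources(question, answer):
        return "Sorry, I can't give a reliable answer to that question."
    return answer
```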

On the regulatory front, particularly the AI Act, Giskard is well positioned to alert developers to potential misuse of LLMs enriched with external data, as in RAG systems.

With a current team of 20 people, Giskard plans to grow significantly, having found a clear market fit in the LLM space. Combessie's stated goal is to double the team size and establish Giskard as the premier LLM antivirus on the market.
