Caroline Bishop
January 25, 2025 04:44
LangSmith has introduced Pytest and Vitest integrations to improve the evaluation of LLM applications, giving developers a more capable testing framework.
LangSmith aims to streamline the evaluation of large language model (LLM) applications with new Pytest and Vitest integrations. According to LangChain’s blog, the integrations are currently in beta in version 0.3.0 of the LangSmith Python and TypeScript SDKs and give developers improved testing capabilities.
Enhanced Testing Framework for LLM Assessment
LLM evaluations (evals) are important for maintaining the reliability and quality of an application. With the Pytest and Vitest integrations, developers already familiar with these frameworks can use LangSmith’s features, such as observability and sharing, without giving up the developer experience they know.
The integrations let developers debug tests more effectively, log detailed metrics beyond simple pass/fail results, and share results across the team. The non-deterministic nature of LLMs makes debugging harder, so LangSmith stores the inputs, outputs, and stack traces of each test case.
Use built-in evaluation features
LangSmith offers built-in evaluation functions such as expect.edit_distance(), which calculates the string distance between the test output and a reference output. These functions are especially useful for developers who need to verify continuously that they are releasing the best version of their application. Details on these functions can be found in LangSmith’s API reference.
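To make the idea of a string distance concrete, here is a minimal, standard-library sketch of a Levenshtein edit distance and a normalized variant. It is an illustration only: the function names are hypothetical, and the exact metric and normalization behind LangSmith's expect.edit_distance() may differ.

```python
def edit_distance(prediction: str, reference: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions turning one string into the other."""
    # previous[j] is the distance between the prefix of `prediction`
    # processed so far and the first j characters of `reference`.
    previous = list(range(len(reference) + 1))
    for i, p_char in enumerate(prediction, start=1):
        current = [i]
        for j, r_char in enumerate(reference, start=1):
            cost = 0 if p_char == r_char else 1
            current.append(min(
                previous[j] + 1,         # delete p_char
                current[j - 1] + 1,      # insert r_char
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def normalized_edit_distance(prediction: str, reference: str) -> float:
    """Scale the distance to the range [0, 1] using the longer string's length."""
    longest = max(len(prediction), len(reference))
    return edit_distance(prediction, reference) / longest if longest else 0.0
```

A test could then tolerate small wording differences with a threshold, for example asserting that normalized_edit_distance(output, reference) stays below a value chosen for the application, instead of requiring an exact string match.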
Start with Pytest and Vitest
To integrate with Pytest, developers add the @pytest.mark.langsmith decorator to a test case. This setup logs all test case results and application traces to LangSmith, giving a comprehensive view of the application’s performance.
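A minimal sketch of what such a test might look like follows. Only the @pytest.mark.langsmith marker comes from the article; generate_greeting is a hypothetical stand-in for an LLM application. The sketch assumes the LangSmith Python SDK is installed and credentials are configured. Without the SDK's pytest plugin, pytest would presumably treat the marker as an unregistered custom mark and log nothing.

```python
import pytest


def generate_greeting(name: str) -> str:
    """Hypothetical stand-in for the LLM application under test."""
    return f"Hello, {name}!"


# Marker named in the article; assumed to need the LangSmith SDK's
# pytest plugin in order to log anything.
@pytest.mark.langsmith
def test_generate_greeting():
    output = generate_greeting("Ada")
    # An ordinary pytest assertion still decides pass or fail.
    assert output == "Hello, Ada!"
```

The test is run with the usual pytest command, so existing test suites and CI jobs should need no structural changes beyond adding the marker.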
Similarly, Vitest users can wrap their test cases in an ls.describe() block to achieve the same level of integration and logging. Both frameworks provide real-time feedback and can be integrated into continuous integration (CI) pipelines, helping developers catch regressions early.
Advantages over traditional evaluation methods
Traditional evaluation methods often require predefined datasets and evaluation functions, which can be limiting. LangSmith’s new integrations add flexibility by letting developers define specific test cases and evaluation logic tailored to the needs of their application. This approach is particularly advantageous for applications that need to be tested across multiple tools or models with different evaluation criteria.
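As a rough illustration of test cases that each carry their own evaluation logic, the sketch below pairs every question with a different pass criterion using pytest's parametrize. The answer function, the canned responses, and the criteria names are all hypothetical. Combining the LangSmith marker with parametrize is assumed here rather than confirmed by the article.

```python
import pytest


def answer(question: str) -> str:
    """Hypothetical stand-in for an LLM-backed application."""
    canned = {
        "What is 2 + 2?": "The answer is 4.",
        "Name a primary color.": "Red is a primary color.",
    }
    return canned[question]


def mentions_four(output: str) -> bool:
    return "4" in output


def names_primary_color(output: str) -> bool:
    return any(color in output.lower() for color in ("red", "yellow", "blue"))


# Each criterion is looked up by name, so the test parameters stay plain strings.
CRITERIA = {
    "mentions_four": mentions_four,
    "names_primary_color": names_primary_color,
}


@pytest.mark.langsmith  # marker from the article; assumed to combine with parametrize
@pytest.mark.parametrize(
    "question, criterion",
    [
        ("What is 2 + 2?", "mentions_four"),
        ("Name a primary color.", "names_primary_color"),
    ],
)
def test_answer(question, criterion):
    # Every case is judged by its own evaluation function.
    assert CRITERIA[criterion](answer(question))
```

Keeping the criterion next to the case means a new scenario can be added with one line, without changing a shared evaluation function that every example must fit.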
The real-time feedback provided by these testing frameworks facilitates rapid iteration and local development, allowing developers to quickly improve their applications. Additionally, integration with the CI pipeline ensures that potential regressions are identified and resolved early in the development process.
For more information on how to use these integrations, see the how-to guides and tutorials on LangSmith’s documentation site.