Retrieval-Augmented Generation (RAG)
RAG enhances LLM responses by retrieving relevant information from external knowledge sources before generating answers. AI Services simplify RAG implementation by automatically handling retrieval, context injection, and response generation. This is essential for:
- Answering questions based on proprietary documents
- Providing up-to-date information beyond training cut-off
- Reducing hallucinations by grounding responses in facts
- Building domain-specific assistants
Please check out our RAG tutorial that uses LangChain4j's low-level API.
Example
The following example uses Ollama with phi3:mini-128k, which is well suited for demos and learning but not for production-grade applications, as it has limited reasoning capability and accuracy on complex tasks.
package com.logicbig.example;
import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.DocumentSplitter;
import dev.langchain4j.data.document.splitter.DocumentSplitters;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.ollama.OllamaChatModel;
import dev.langchain4j.model.ollama.OllamaEmbeddingModel;
import dev.langchain4j.rag.content.retriever.ContentRetriever;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
import dev.langchain4j.service.AiServices;
import dev.langchain4j.service.UserMessage;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;
import java.time.Duration;
import java.time.temporal.ChronoUnit;
import java.util.List;
public class RagExample {
    private static EmbeddingModel embeddingModel;
    private static EmbeddingStore<TextSegment> embeddingStore;

    // creating EmbeddingStore/EmbeddingModel with custom documents
    static {
        embeddingModel = OllamaEmbeddingModel.builder()
                .baseUrl("http://localhost:11434")
                .modelName("all-minilm")
                .build();
        embeddingStore = new InMemoryEmbeddingStore<>();
        storeDocuments();
    }
    private static void storeDocuments() {
        // document 1
        Document document = Document.from(
                """
                MySimpleRestFramework is an old framework for creating REST APIs.
                It provides auto-configuration and embedded servers.
                """);
        // split into segments of at most 200 characters with a 20-character overlap
        DocumentSplitter splitter = DocumentSplitters.recursive(200, 20);
        List<TextSegment> segments = splitter.split(document);
        embeddingStore.addAll(
                embeddingModel.embedAll(segments).content(),
                segments
        );
        // document 2
        Document document2 = Document.from(
                """
                MySimpleAiFramework is a Java framework for building
                LLM-powered applications.
                It supports chat models, embeddings, and
                retrieval-augmented generation.
                """
        );
        List<TextSegment> segments2 = splitter.split(document2);
        embeddingStore.addAll(
                embeddingModel.embedAll(segments2).content(),
                segments2
        );
    }
    interface DocumentAssistant {
        @UserMessage("{{it}}")
        String answer(String question);
    }
    public static void main(String[] args) {
        // Create models
        OllamaChatModel chatModel = OllamaChatModel.builder()
                .baseUrl("http://localhost:11434")
                .modelName("phi3:mini-128k")
                .temperature(0.3)
                .numCtx(1096)
                .timeout(Duration.of(3, ChronoUnit.MINUTES))
                .build();
        EmbeddingModel embeddingModel = OllamaEmbeddingModel.builder()
                .baseUrl("http://localhost:11434")
                .modelName("all-minilm")
                .build();
        ContentRetriever retriever = EmbeddingStoreContentRetriever.builder()
                .embeddingStore(embeddingStore)
                .embeddingModel(embeddingModel)
                .maxResults(1)
                .build();
        // Create AI Service with RAG
        DocumentAssistant assistant = AiServices.builder(DocumentAssistant.class)
                .chatModel(chatModel)
                .contentRetriever(retriever)
                .build();
        String question = "What is MySimpleAiFramework?";
        System.out.println("User: " + question);
        String response = assistant.answer(question);
        System.out.println(response);
    }
}
Output
User: What is MySimpleAiFramework?
MySimpleAiFramework is a lightweight Java library designed to facilitate the development of Large Language Model (LLM)-driven software solutions. This framework provides essential tools for integrating LLM capabilities into applications with an emphasis on natural language processing tasks such as chat model interactions, embedding generation, and retrieval-augmented text creation. The goal is to simplify the process by which developers can create intelligent systems that understand and generate human-like responses while handling complex information retrieval processes effectively.
Understanding the Code
- The example demonstrates a Retrieval-Augmented Generation (RAG) setup using LangChain4j AiServices, where an LLM answers questions using both its own knowledge and retrieved documents.
- An EmbeddingModel (OllamaEmbeddingModel) is configured to convert text into vector embeddings using the all-minilm model running on a local Ollama server.
- An in-memory vector store (InMemoryEmbeddingStore) is used to store embeddings along with their corresponding text segments.
- Documents are created using Document.from() and represent the knowledge base that the assistant can retrieve from.
- A DocumentSplitter splits each document into smaller TextSegment chunks, improving retrieval accuracy.
- Each text segment is embedded using the embedding model and then stored in the embedding store along with its vector representation.
- The DocumentAssistant interface defines the AI service contract. The @UserMessage annotation maps the method input directly to the user prompt.
- A chat model (OllamaChatModel) is configured to generate natural language responses using the phi3:mini-128k model.
- The ContentRetriever uses vector similarity search to retrieve the most relevant document segments from the embedding store.
- EmbeddingStoreContentRetriever limits retrieval to the top matching result using maxResults(1); see the retriever-tuning sketch after this list.
- AiServices.builder() wires together the chat model and content retriever to create a RAG-enabled AI service.
- When assistant.answer(question) is called, the framework retrieves relevant document content and injects it into the LLM prompt.
- The final response is generated by the LLM, grounded in the retrieved document data rather than relying solely on the model's internal knowledge.
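For cases where a single top match is too restrictive, the retriever can be tuned to return several segments and to drop weak matches. The following is a minimal sketch reusing the embeddingStore and embeddingModel from the example above; the maxResults and minScore values are illustrative assumptions, not recommendations from this example:

ContentRetriever tunedRetriever = EmbeddingStoreContentRetriever.builder()
        .embeddingStore(embeddingStore)
        .embeddingModel(embeddingModel)
        .maxResults(3)   // return up to 3 matching segments instead of 1
        .minScore(0.7)   // illustrative threshold: skip matches below this similarity score
        .build();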
Conclusion
The output demonstrates how RAG provides accurate, context-specific answers by retrieving relevant information from the knowledge base. The AI Service proxy automatically handles the entire RAG pipeline: query embedding, similarity search, context assembly, and prompt construction.
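To make those pipeline steps concrete, here is a rough sketch of the same flow written against LangChain4j's low-level API (covered in the low-level RAG tutorial mentioned earlier). It reuses embeddingModel, embeddingStore, chatModel, and question from the example; the prompt wording is our own illustration, not the exact prompt AiServices constructs internally:

import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.EmbeddingSearchResult;
...
// 1. embed the user query
Embedding queryEmbedding = embeddingModel.embed(question).content();
// 2. similarity search against the embedding store
EmbeddingSearchResult<TextSegment> result = embeddingStore.search(
        EmbeddingSearchRequest.builder()
                .queryEmbedding(queryEmbedding)
                .maxResults(1)
                .build());
// 3. assemble the retrieved segment(s) into context
String context = result.matches().get(0).embedded().text();
// 4. construct the prompt and call the chat model directly
String prompt = "Answer the question using the following information:\n"
        + context + "\n\nQuestion: " + question;
String answer = chatModel.chat(prompt);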
Example Project
Dependencies and Technologies Used:
- langchain4j 1.10.0 (Build LLM-powered applications in Java: chatbots, agents, RAG, and much more)
- langchain4j-ollama 1.10.0 (LangChain4j :: Integration :: Ollama)
- JDK 17
- Maven 3.9.11