
AI LangChain4j - Auto-Moderation using AI Services

[Last Updated: Jan 29, 2026]

Auto-moderation lets AI Services automatically filter inappropriate or harmful content. When an AI Service is configured with a moderation model, every call to an interface method annotated with @Moderate is checked against that model, and a ModerationException is thrown when a violation is detected.

How Auto-Moderation Works

  1. The user sends a message to the AI Service
  2. The moderation model analyzes the message for policy violations; LangChain4j runs this check concurrently with the LLM call (see the sketch below)
  3. If the content is flagged, a ModerationException is thrown
  4. If it is safe, the LLM response is returned as usual
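
Conceptually, the wiring resembles the following sketch. This is illustrative only, not the actual AiServices internals, and it shows the two calls sequentially for clarity; it assumes a userText string plus the chatModel and moderationModel from the Configuration section below:

Response<Moderation> check = moderationModel.moderate(userText);
if (check.content().flagged()) {
    // at this point AiServices throws ModerationException instead of returning
    throw new RuntimeException("Flagged: " + check.content().flaggedText());
}
String answer = chatModel.chat(userText); // safe input: the LLM response is returned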

Configuration

Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(model)
    .moderationModel(moderationModel)
    .build();
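
The builder snippet assumes an Assistant interface like the one in the full example below. Note that only interface methods annotated with @Moderate are moderated:

import dev.langchain4j.service.Moderate;
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;

interface Assistant {
    @SystemMessage("You are a helpful assistant.")
    @Moderate // opts this method into auto-moderation
    String chat(@UserMessage String message);
}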

ModerationException

ModerationException carries the Moderation object produced by the moderation model; its flaggedText() method returns the offending text.
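
A typical handler, assuming an assistant built as in the Configuration section, catches the exception and inspects the flagged text:

try {
    String answer = assistant.chat(userInput);
    System.out.println(answer);
} catch (ModerationException e) {
    // e.moderation() returns the Moderation result produced by the moderation model
    System.out.println("Blocked: " + e.moderation().flaggedText());
}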

Example

package com.logicbig.example;

import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.data.message.Content;
import dev.langchain4j.data.message.TextContent;
import dev.langchain4j.model.moderation.Moderation;
import dev.langchain4j.model.moderation.ModerationModel;
import dev.langchain4j.model.ollama.OllamaChatModel;
import dev.langchain4j.model.output.Response;
import dev.langchain4j.service.AiServices;
import dev.langchain4j.service.Moderate;
import dev.langchain4j.service.ModerationException;
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import java.util.List;

public class AutoModerationExample {

    interface Assistant {
        @SystemMessage("You are a helpful assistant.")
        @Moderate // required for auto-moderation to apply to this method
        String chat(@UserMessage String message);
    }

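    // A toy keyword-based moderation model; a production application would
    // typically delegate to a dedicated service such as an LLM provider's
    // moderation endpoint.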
    static class SimpleModerationModel implements ModerationModel {
        @Override
        public Response<Moderation> moderate(String text) {
            String lowerText = text.toLowerCase();

            if (lowerText.contains("hate") || lowerText.contains("violence") ||
                lowerText.contains("explicit") || lowerText.contains("stupid")) {
                return Response.from(Moderation.flagged(text));
            }
            return Response.from(Moderation.notFlagged());
        }

        @Override
        public Response<Moderation> moderate(List<ChatMessage> messages) {
            StringBuilder combined = new StringBuilder();
            for (ChatMessage msg : messages) {
                // TextContent is a Content part, not a ChatMessage; pull the text
                // parts out of each user message (fully qualified here to avoid a
                // clash with the dev.langchain4j.service.UserMessage annotation)
                if (msg instanceof dev.langchain4j.data.message.UserMessage userMessage) {
                    for (Content content : userMessage.contents()) {
                        if (content instanceof TextContent textContent) {
                            combined.append(textContent.text()).append(" ");
                        }
                    }
                }
            }
            return moderate(combined.toString().trim());
        }
    }

    public static void main(String[] args) {
        var chatModel = OllamaChatModel.builder()
                .baseUrl("http://localhost:11434")
                .modelName("phi3:mini-128k")
                .build();

        var moderationModel = new SimpleModerationModel();

        // With moderation
        Assistant moderatedAssistant = AiServices.builder(Assistant.class)
                .chatModel(chatModel)
                .moderationModel(moderationModel)
                .build();

        // Without moderation
        Assistant unmoderatedAssistant = AiServices.create(Assistant.class, chatModel);

        System.out.println("=== With Auto-Moderation ===");
        try {
            String response = moderatedAssistant.chat("This is hate speech");
            System.out.println("Response: " + response);
        } catch (ModerationException e) {
            System.out.println("Blocked: " + e.moderation().flaggedText());
        }

        System.out.println("\n=== Without Moderation ===");
        String response = unmoderatedAssistant.chat("This is hate speech");
        System.out.println("Response: " + response);
    }
}

Output

=== With Auto-Moderation ===
Blocked: This is hate speech

=== Without Moderation ===
Response: As an AI developed to promote positive communication and respect, I cannot assist with generating or spreading hate speech in any form. I'm sorry but I can't fulfill this request.

Conclusion

The output shows the difference: with auto-moderation enabled, the flagged message triggers a ModerationException and no response is returned, while safe messages are processed normally. Without the moderation layer, filtering is left entirely to the LLM's own judgment. This built-in safety layer helps your AI applications enforce consistent content standards.

Example Project

Dependencies and Technologies Used:

  • langchain4j 1.10.0 (Build LLM-powered applications in Java: chatbots, agents, RAG, and much more)
  • langchain4j-ollama 1.10.0 (LangChain4j :: Integration :: Ollama)
  • slf4j-simple 2.0.9 (SLF4J Simple Provider)
  • JDK 17
  • Maven 3.9.11

AI LangChain4j - Auto Moderation with AI Services
  • ai-services-auto-moderation
    • src
      • main
        • java
          • com
            • logicbig
              • example
                • AutoModerationExample.java
