When working with StreamingChatModel, there are scenarios where you might want to stop the LLM from generating further text before it finishes normally. This could be due to security filters, length constraints, or detecting specific keywords in the output.
As we saw in the last tutorial, we need to provide an implementation of StreamingChatResponseHandler. The following method of the interface allows you to cancel an LLM request before it completes:
default void onPartialResponse(PartialResponse partialResponse,
                               PartialResponseContext context) {}
PartialResponseContext provides access to a StreamingHandle via the following method:
public StreamingHandle streamingHandle()
The StreamingHandle interface includes a cancel() method, which can be used to terminate the request immediately:
void cancel()
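Putting these pieces together, cancellation from inside the handler is a single call chain on the context parameter (shown here in isolation; a complete handler appears in the example below):

// Inside onPartialResponse: obtain the handle from the callback
// context and terminate the in-flight request.
context.streamingHandle().cancel();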
Use Cases
- Content Moderation: Terminating the stream if the model begins generating restricted content.
- Early Exit: Stopping a search or list generation once a specific item is found.
- Resource Management: Reducing token usage and cost by stopping unnecessary output (see the sketch after this list).
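For instance, a length constraint can be enforced by accumulating the size of the partial responses and cancelling once a budget is exceeded. The following is a minimal sketch built only from the interface methods shown above; BudgetedHandler and MAX_CHARS are illustrative names, and the 200-character budget is arbitrary:

import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.model.chat.response.PartialResponse;
import dev.langchain4j.model.chat.response.PartialResponseContext;
import dev.langchain4j.model.chat.response.StreamingChatResponseHandler;

// A handler that enforces a rough output budget: it accumulates the
// length of the partial responses received so far and cancels the
// stream once the limit is exceeded.
public class BudgetedHandler implements StreamingChatResponseHandler {

    private static final int MAX_CHARS = 200; // arbitrary budget for this sketch
    private int received = 0;

    @Override
    public void onPartialResponse(PartialResponse partialResponse,
                                  PartialResponseContext context) {
        received += partialResponse.text().length();
        if (received > MAX_CHARS) {
            context.streamingHandle().cancel(); // stop paying for further tokens
        }
    }

    @Override
    public void onCompleteResponse(ChatResponse response) {
        // nothing to do in this sketch
    }

    @Override
    public void onError(Throwable error) {
        // a cancelled stream may surface here, depending on the provider
    }
}

An instance of this handler can be passed to model.chat(...) in place of the anonymous handler used in the full example below.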
Example
In the following example, we ask the model to provide prime numbers. We monitor the incoming tokens in the onPartialResponse method and invoke cancel() as soon as a specific number is detected.
package com.logicbig.example;

import dev.langchain4j.model.chat.StreamingChatModel;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.model.chat.response.PartialResponse;
import dev.langchain4j.model.chat.response.PartialResponseContext;
import dev.langchain4j.model.chat.response.StreamingChatResponseHandler;
import dev.langchain4j.model.ollama.OllamaStreamingChatModel;

import java.util.concurrent.CountDownLatch;

public class StreamingCancelExample {

    public static void main(String[] args) throws InterruptedException {
        // Keeps the main thread alive until the stream is cancelled,
        // completes, or fails.
        CountDownLatch latch = new CountDownLatch(1);

        StreamingChatModel model =
                OllamaStreamingChatModel.builder()
                                        .baseUrl("http://localhost:11434")
                                        .modelName("phi3:mini-128k")
                                        .numCtx(4096)
                                        .temperature(0.7)
                                        .build();

        System.out.println("Streaming started...");
        model.chat("What are the prime numbers between 1 and 13? Only return numbers.",
                new StreamingChatResponseHandler() {
                    @Override
                    public void onPartialResponse(PartialResponse partialResponse,
                                                  PartialResponseContext context) {
                        String text = partialResponse.text();
                        System.out.print(text);
                        // Cancel the request as soon as the token "7" appears.
                        if (text.contains("7")) {
                            System.out.println("\n[Condition met. Cancelling...]");
                            context.streamingHandle().cancel();
                            latch.countDown();
                        }
                    }

                    @Override
                    public void onCompleteResponse(ChatResponse response) {
                        latch.countDown();
                    }

                    @Override
                    public void onError(Throwable error) {
                        System.out.println("\nStream stopped.");
                        latch.countDown();
                    }
                });
        latch.await();
    }
}
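Note that latch.countDown() is invoked on every exit path: after cancelling, on normal completion, and on error. This guarantees that latch.await() in main() never blocks indefinitely, however the stream ends.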
Output
Streaming started...
The prime numbers between 1 and 13 are: 2, 3, 5, 7
[Condition met. Cancelling...]
Conclusion
By utilizing the StreamingHandle within onPartialResponse, you can proactively terminate an LLM request once specific conditions are met. This pattern keeps your LangChain4j integration efficient, saving resources and reducing latency by calling cancel() as soon as the required information has been received or a specific condition is detected.
Example Project
Dependencies and Technologies Used:
- langchain4j 1.10.0 (Build LLM-powered applications in Java: chatbots, agents, RAG, and much more)
- langchain4j-ollama 1.10.0 (LangChain4j :: Integration :: Ollama)
- JDK 17
- Maven 3.9.11