
AI LangChain4j - Understanding TokenWindowChatMemory

[Last Updated: Jan 19, 2026]

What is TokenWindowChatMemory?

TokenWindowChatMemory is another implementation of ChatMemory. It limits conversation history based on the total number of tokens rather than on the number of messages.

Why token-based memory?

Messages vary in length. Token-based memory ensures that the prompt always fits within model constraints regardless of message size.
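
For contrast, here is a minimal sketch of the two sizing strategies. The budget values are illustrative, and MyCustomTokenEstimator refers to the estimator implemented later in this article:

package com.logicbig.example;

import dev.langchain4j.memory.ChatMemory;
import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import dev.langchain4j.memory.chat.TokenWindowChatMemory;

public class MemorySizingComparison {

    public static void main(String[] args) {
        // Message-based: keeps at most the last 10 messages,
        // no matter how long each individual message is
        ChatMemory byMessageCount = MessageWindowChatMemory.withMaxMessages(10);

        // Token-based: keeps as many recent messages as fit into an
        // estimated budget of 300 tokens, evicting the oldest first
        ChatMemory byTokenBudget =
                TokenWindowChatMemory.withMaxTokens(300, new MyCustomTokenEstimator());

        System.out.println(byMessageCount.messages().size()); // 0, both start empty
        System.out.println(byTokenBudget.messages().size());  // 0
    }
}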

What is a token?

A token is the basic unit of text that a language model processes, typically a word, part of a word, or a single character. Because a model's context limit is expressed in tokens, token counts are the natural unit for measuring and limiting conversation history so that the total input stays within the model's length constraints. For example, a long word such as "exploration" may be split by a model's tokenizer into several sub-word tokens.

Typical use cases

  • Conversations with variable-length messages
  • Strict token budget enforcement
  • Production systems with predictable costs

Example

In this example, we are using Ollama with the phi3:mini-128k model.

We are going to implement a custom TokenCountEstimator because Ollama does not currently provide a standalone API endpoint for pre-calculating token counts without executing a full model generation. While Ollama returns precise token usage metadata (such as prompt_eval_count) after a response has been generated, components like LangChain4j's TokenWindowChatMemory require an estimate before sending the request to determine if older messages need to be evicted to stay within the defined maxTokens limit. By using a character-based heuristic—where one token is approximately four characters—we provide the memory manager with the necessary logic to manage the context window locally and efficiently.

package com.logicbig.example;

import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.model.TokenCountEstimator;

public class MyCustomTokenEstimator implements TokenCountEstimator {

    @Override
    public int estimateTokenCountInText(String text) {
        if (text == null) return 0;
        // Approximation: 4 characters per token
        return (int) Math.ceil(text.length() / 4.0);
    }

    @Override
    public int estimateTokenCountInMessage(ChatMessage message) {
        // Handle the common message types instead of blindly casting to
        // UserMessage (a cast would fail once AI responses are added to memory)
        if (message instanceof UserMessage userMessage) {
            return estimateTokenCountInText(userMessage.singleText());
        }
        if (message instanceof AiMessage aiMessage) {
            return estimateTokenCountInText(aiMessage.text());
        }
        return estimateTokenCountInText(String.valueOf(message));
    }

    @Override
    public int estimateTokenCountInMessages(Iterable<ChatMessage> messages) {
        int total = 0;
        for (ChatMessage message : messages) {
            total += estimateTokenCountInMessage(message);
        }
        return total;
    }
}

Please note that this MyCustomTokenEstimator implementation is for demonstration purposes only and uses a character-based heuristic that may not perfectly reflect the specific tokenization logic of every model.
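
As a quick sanity check, the heuristic can be exercised directly. The class name EstimatorSanityCheck below is just for illustration; the printed value follows from the 4-characters-per-token approximation:

package com.logicbig.example;

import dev.langchain4j.model.TokenCountEstimator;

public class EstimatorSanityCheck {

    public static void main(String[] args) {
        TokenCountEstimator estimator = new MyCustomTokenEstimator();
        // "My favorite color is Aquamarine." has 32 characters,
        // so the heuristic yields ceil(32 / 4.0) = 8
        int tokens = estimator.estimateTokenCountInText("My favorite color is Aquamarine.");
        System.out.println(tokens); // prints 8
    }
}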

Sending/receiving messages

package com.logicbig.example;

import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.memory.ChatMemory;
import dev.langchain4j.memory.chat.TokenWindowChatMemory;
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.model.ollama.OllamaChatModel;

public class TokenWindowChatMemoryExample {

    public static void main(String[] args) {

        ChatModel model = OllamaChatModel.builder()
                                         .baseUrl("http://localhost:11434")
                                         .modelName("phi3:mini-128k")
                                         .numCtx(4096)
                                         .temperature(0.7)
                                         .build();

        ChatMemory memory =
                TokenWindowChatMemory.withMaxTokens(50, new MyCustomTokenEstimator());

        memory.add(UserMessage.from("My favorite color is "
                                            + "Aquamarine. Remember this."));

        // This message is long enough (it contains many tokens)
        // to force the eviction of the first message.
        memory.add(UserMessage.from("I want to talk about space exploration. " +
                                            "The James Webb Space Telescope is "
                                            + "amazing because it uses infrared " +
                                            "to see through cosmic dust clouds "
                                            + "and find early stars."));

        UserMessage finalQuestion = UserMessage.from("What is my favorite color?");
        memory.add(finalQuestion);

        ChatResponse response = model.chat(memory.messages());
        AiMessage aiMessage = response.aiMessage();
        System.out.println("LLM Response: " + aiMessage.text());
    }
}

Output

LLM Response: As an artificial intelligence, I don't have personal preferences or experiences such as a favorite color. However, if you want me to simulate curiosity about colors like humans do, then we might say your favorite could be any common one - perhaps the tranquility of blue reminds some people of space itself?

Conclusion

The output confirms that older messages are evicted once the token limit is exceeded: the model cannot recall the favorite color because the message containing it was removed from the window, and the conversation always stays within a safe token budget.
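
Under the 4-characters-per-token heuristic, the eviction can be verified by hand (character counts are taken from the strings in the example above and are approximate):

  • "My favorite color is Aquamarine. Remember this." is roughly 47 characters, i.e. about 12 estimated tokens
  • The space exploration message is roughly 162 characters, i.e. about 41 estimated tokens
  • "What is my favorite color?" is roughly 26 characters, i.e. about 7 estimated tokens

Together the three messages need about 60 tokens, which exceeds the configured limit of 50, so TokenWindowChatMemory evicts the oldest message (the one mentioning Aquamarine). The remaining two messages need about 48 tokens and fit within the window, which is why the model can no longer answer the question.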

Example Project

Dependencies and Technologies Used:

  • langchain4j 1.10.0 (Build LLM-powered applications in Java: chatbots, agents, RAG, and much more)
  • langchain4j-ollama 1.10.0 (LangChain4j :: Integration :: Ollama)
  • JDK 17
  • Maven 3.9.11

AI LangChain4j - TokenWindowChatMemory
  • token-window-chat-memory
    • src
      • main
        • java
          • com
            • logicbig
              • example
                • MyCustomTokenEstimator.java
                • TokenWindowChatMemoryExample.java
