
Getting started with Spring AI

Get started with Spring AI and Ollama: ask questions, process documents, use structured output, measure token usage and more.

Florian Beaufumé
Published 21 Sep 2025 - 6 min read

Introduction

There are a lot of Spring AI resources available, but many of them are based on early versions of the framework and no longer work with the general availability release.

This article is up to date with Spring AI 1.0.2 and focuses on the basics. I will show how to set up a local LLM with Ollama, then how to use it with Spring AI in a Spring Boot application to perform simple tasks.

A sample application illustrating this article is available on GitHub, see spring-ai-basics.

Future articles may address other subjects such as chatbots, function calling, image processing, MCP, RAG, etc.

Ollama setup

For this article I chose a local LLM instead of an online service such as OpenAI, which means there is no need for a subscription. Ollama is a great choice for that: it is a free and open-source command-line tool that runs LLMs locally and can leverage the GPU to improve performance. A UI was recently added, but I will focus on the command line. Conceptually, it is similar to Docker, but for LLMs.

First, download and run the installer from Ollama download. Ollama comes empty, so we have to install an LLM. It supports various models (Llama, Mistral, DeepSeek, Gemma, etc.) and many of them come in several sizes (measured in billions of parameters), see Ollama models. Larger models are usually more capable but also slower to run. I chose Llama 3.1 with 8 billion parameters: it is quite light (only 4.9 GB) and should work fine on small GPUs.

Install that LLM using this Ollama command line:

ollama pull llama3.1:8b

The Ollama setup is now complete, but here is a short selection of useful Ollama commands:

  • ollama help provides some help.
  • ollama list lists the installed models.
  • ollama run llama3.1:8b runs a given model and lets you chat with it in natural language (use /bye to exit when done).
  • ollama ps shows some usage metrics.
  • ollama rm llama3.1:8b removes a given model.

Configure Spring AI

Spring AI is a framework that simplifies the integration of large language models (LLMs) into Spring applications.

Enabling Spring AI in a Spring Boot application is straightforward: we add the right starter dependencies and a few Spring configuration properties.

In your pom.xml, first declare the Spring AI BOM:

<dependencyManagement>
    ...
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>1.0.2</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

Then add the Ollama starter:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-ollama</artifactId>
</dependency>

Additional starters are available for other LLM providers, for example use org.springframework.ai:spring-ai-starter-model-azure-openai for Azure OpenAI.
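For instance, targeting Azure OpenAI instead of Ollama would simply mean declaring that starter (same BOM, different artifact):

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-azure-openai</artifactId>
</dependency>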

Next, we need to configure the LLM used by Spring AI. In your application.properties file, add the following:

spring.ai.ollama.chat.options.model=llama3.1:8b
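By default, Spring AI expects Ollama to listen on http://localhost:11434, the standard Ollama port. If your Ollama instance runs elsewhere, you can override the base URL; this property is optional and shown here only for completeness:

spring.ai.ollama.base-url=http://localhost:11434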

That model property is enough to get started with Ollama and Llama 3.1. If you prefer another model or provider, the configuration will vary; for example, online LLM services usually require you to configure an API key.
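As an illustration only (not needed for the Ollama setup used in this article), the plain OpenAI starter would typically be configured with something like the following, the model name being just an example:

spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.openai.chat.options.model=gpt-4o-mini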

Ask questions

Now we write some code to actually use the LLM. In this section we will ask simple questions to the LLM and retrieve the answers.

This is achieved using a ChatClient instance. That interface comes from Spring AI, but no ChatClient bean is autoconfigured, so we have to create it ourselves. This can be done in a custom @Configuration class:

@Configuration
public class AiConfig {

    @Bean
    public ChatClient chatClient(ChatClient.Builder builder) {
        return builder.build();
    }
}

ChatClient.Builder is autoconfigured by Spring AI and is used to create ChatClient instances. That builder contains a ChatModel instance, also autoconfigured by Spring AI, that manages the interactions with the LLM.

The previous code is very basic but serves our purpose. When implementing more advanced use cases, such as chatbots or RAG, the ChatClient creation becomes more involved, with a system prompt, conversation and memory management, tool declarations, a vector store, etc., as sketched below.
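As a preview, here is a minimal sketch of what a richer configuration could look like (the system prompt text is invented for illustration):

@Bean
public ChatClient chatClient(ChatClient.Builder builder) {
    return builder
            // A default system prompt applied to every request
            .defaultSystem("You are a concise assistant for a fictional city hall.")
            // Default advisors, tools, memory, options, etc. would also be declared here
            .build();
}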

Now we can use the ChatClient to ask questions to the LLM, for example in a business service:

@Service
public class BusinessService {

    @Autowired
    private ChatClient chatClient;

    public String askQuestion(String question) {
        return chatClient.prompt(question).call().content();
    }
}

When we call that method with, for example, Tell me a programming joke, we get an answer such as:

Why do programmers prefer dark mode? Because light attracts bugs!

Use prompt templates

To improve code reuse or to enforce a certain prompt format, we can use prompt templates. For example, we can write a method that is specialized in producing jokes for a given subject:

public String tellJoke(String subject) {
    PromptTemplate template = new PromptTemplate("Tell me a joke about {subject}");
    Prompt prompt = template.create(Map.of("subject", subject));
    return chatClient.prompt(prompt).call().content();
}

The template content can also be stored in a resource file. To do so, create a src/main/resources/templates/joke-prompt.st file containing:

Tell me a joke about {subject}

Then use:

@Value("classpath:templates/joke-prompt.st")
private Resource jokePrompt;

public String tellJoke(String subject) {
PromptTemplate template = new PromptTemplate(jokePrompt);
Prompt prompt = template.create(Map.of("subject", subject));
return chatClient.prompt(prompt).call().content();
}

Configure the temperature

When working with LLMs, the temperature is a key parameter that controls the randomness, or creativity, of the model's output. It is a floating-point number, typically between 0.0 and 1.0, although some providers accept higher values. The higher the value, the more creative and diverse the responses; the lower the value, the more deterministic and predictable they are.

When asking the LLM to generate some content (a joke, a poem, etc) a high temperature can be useful. But when answering a question or processing a document, a low temperature may be preferred.

The default is often 0.7 but can vary depending on the LLM. For a given application, it can be configured at multiple levels.

We can configure the temperature globally in our application.properties file:

spring.ai.ollama.chat.options.temperature=0.5

Or we can configure it programmatically when creating a ChatClient:

@Bean
public ChatClient chatClient(ChatClient.Builder builder) {
    return builder.defaultOptions(ChatOptions.builder().temperature(0.5).build()).build();
}

Or we can configure it for a given LLM request:

public String askQuestion(String question) {
    return chatClient.prompt(question)
            .options(ChatOptions.builder().temperature(0.5).build())
            .call().content();
}

Process business documents

In addition to generating some content or answering questions, LLMs can also process business documents. They can summarize them, extract some information, etc. In this section we will analyze several documents to make a decision.

Suppose that I am the mayor of a fictional city. I've been informed that a horde of mutant rats is about to attack the city. I need to hire a superhero to save the city. But since most of the budget was already spent, the city can only afford one superhero. Maybe the LLM can help find the right pick for our situation.

We have some files describing the candidates. They are in various formats (PDF, Word, PowerPoint, HTML, etc). We want to analyze these documents and find the best candidate.

This is implemented using prompt stuffing, i.e. we read the content of the documents and insert it into the prompt. This simple approach works well for documents that fit into the LLM's context window. For larger document sets, we may want to use RAG (Retrieval-Augmented Generation) instead.

To read the documents, we can use Apache Tika. That library can read a wide range of formats and is supported by Spring AI. We add the corresponding dependency to our pom.xml:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-tika-document-reader</artifactId>
</dependency>

Then we can implement a method that reads the documents, builds the prompt and calls the LLM:

public String chooseHero() {
    String request = """
            I'm the mayor of a medium-sized city.
            I received information that large mutant rats are about to attack the city.
            These rabid rats come from the sewers. They are about the size of a dog and there are hundreds of them.
            I need to hire a superhero to protect the city, but I can afford only one.
            From the next superheroes, describe the pros and cons of each one against the rats invasion.
            Then choose the one that you think is the best to protect the city and explain why you selected that superhero.
            """;

    // Add the heroes to the request
    List<String> heroes = List.of("hero1.pdf", "hero2.doc", "hero3.odt", "hero4.pptx", "hero5.html");
    for (int i = 0; i < heroes.size(); i++) {
        // Read the hero file using Tika
        Resource resource = new ClassPathResource("data/" + heroes.get(i));
        TikaDocumentReader tikaReader = new TikaDocumentReader(resource);
        String heroContent = tikaReader.get().getFirst().getText();

        request += "\n\nSuperhero %d:\n%s\n\n".formatted(i + 1, heroContent);
    }

    return chatClient.prompt(request).call().content();
}

A truncated sample output is:

What a delightful scenario! As the mayor of this medium-sized city, I'll evaluate each superhero's pros and cons to determine who would be best suited to protect our city from the mutant rat invasion.

(... truncated for brevity, the LLM described the pros and cons of each superhero ...)

After careful consideration, I recommend **Terraquake** as the best choice for protecting your city from the mutant rat infestation.

Why? Terraquake's geokinesis and seismic shockwaves abilities would allow them to directly address the source of the problem – the sewers. They could create powerful shockwaves to disperse the rats or seal off sewer entrances, reducing the threat to human life and property. Additionally, their rock and mineral armor creation ability would provide essential protection against potential bites or scratches.

While other superheroes have strengths that might be useful in this situation (e.g., Solarion Blaze's heat vision), Terraquake's unique powers are specifically tailored for dealing with seismic threats like an infestation of giant mutant rats from the sewers.

Structured outputs

In the previous examples, the LLM returned some text. This is fine when the result is for a human. But we can also generate structured data that the rest of the application can use. This opens a lot of new possibilities.

In the prompt call, we define the expected return type. During execution, Spring AI instructs the LLM to return a JSON response, then deserializes it into a Java object.

Here is an example that produces a list of beans:

public List<Country> getMostPopulatedCountries() {
    String request = "List the 3 most populated countries. For each country, provide the name and the population in millions.";
    return chatClient.prompt(request).call()
            .entity(new ParameterizedTypeReference<List<Country>>() {});
}

As with many libraries that unmarshal generic data, we use a special return type (ParameterizedTypeReference in this case).

Country is a simple Java record:

public record Country(String name, int population) {
}

If we log the resulting list, we get:

[Country[name=China, population=1439], Country[name=India, population=1380], Country[name=United States, population=331]]

It is also possible to generate a single bean using .entity(MyBean.class).
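For instance, here is a minimal sketch that extracts a single Country, reusing the record above (the question wording is just an example):

public Country getMostPopulatedCountry() {
    String request = "Which is the most populated country? Provide the name and the population in millions.";
    return chatClient.prompt(request).call().entity(Country.class);
}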

Measure the token usage

LLMs use tokens to process the input and generate the output. If you use an online LLM service provider, you may be billed based on the tokens used. That's why you may want to keep an eye on token usage. Even with a local LLM, you may simply be curious about it.

With Spring AI we can easily extract the token usage for each request:

public String askQuestion(String question) {
    ChatResponse chatResponse = chatClient.prompt(question).call().chatResponse();

    Usage usage = chatResponse.getMetadata().getUsage();
    LOGGER.info("Used {} tokens for this request ({} for prompt + {} for response generation)",
            usage.getTotalTokens(), usage.getPromptTokens(), usage.getCompletionTokens());

    return chatResponse.getResult().getOutput().getText();
}

Here is a sample output for the question Tell me a programming joke:

INFO c.a.sample.service.BusinessService       : Used 49 tokens for this request (15 for prompt + 34 for response generation)

Logging advisor

Spring AI supports the notion of advisors: components that can intercept or modify the interactions with the LLM. Spring AI ships with several useful advisors, and we can also implement our own.

In future articles about chatbots and RAG, I will show how to use several powerful built-in advisors.

For now, I will simply describe SimpleLoggerAdvisor, a Spring AI advisor that logs the interactions with the LLM. It can be useful for debugging or simply to understand what happens under the hood. For example, you may be curious to see how Spring AI instructs the LLM to return a structured output.

The advisor can be declared during the ChatClient creation:

@Bean
public ChatClient chatClient(ChatClient.Builder builder) {
    return builder
            .defaultAdvisors(new SimpleLoggerAdvisor())
            .build();
}

Then, we simply need to enable the corresponding logger in application.properties:

logging.level.org.springframework.ai.chat.client.advisor.SimpleLoggerAdvisor=debug

When executing an LLM call, we get two log messages, one for the request and one for the response.

The request log for the previous code sample about structured output is:

DEBUG o.s.a.c.c.advisor.SimpleLoggerAdvisor    : request: ChatClientRequest[prompt=Prompt{messages=[UserMessage{content='List the 3 most populated countries. For each country, provide the name and the population in millions.', properties={messageType=USER}, messageType=USER}], modelOptions=org.springframework.ai.ollama.api.OllamaOptions@93d8b51d}, context={spring.ai.chat.client.output.format=Your response should be in JSON format.
Do not include any explanations, only provide a RFC8259 compliant JSON response following this format without deviation.
Do not include markdown code blocks in your response.
Remove the ```json markdown from the output.
Here is the JSON Schema instance your output must adhere to:
```{
  "$schema" : "https://json-schema.org/draft/2020-12/schema",
  "type" : "array",
  "items" : {
    "type" : "object",
    "properties" : {
      "name" : {
        "type" : "string"
      },
      "population" : {
        "type" : "integer"
      }
    },
    "additionalProperties" : false
  }
}```
}]

We can see our request at the end of the first line. We can also see the instructions and the schema description added by Spring AI to get a JSON content.

The response log is:

2025-09-12T15:57:44.603+02:00 DEBUG 7316 --- [           main] o.s.a.c.c.advisor.SimpleLoggerAdvisor    : response: {
  "result" : {
    "output" : {
      "messageType" : "ASSISTANT",
      "metadata" : {
        "messageType" : "ASSISTANT"
      },
      "toolCalls" : [ ],
      "media" : [ ],
      "text" : "[\n {\n \"name\": \"China\",\n \"population\": 1439\n },\n {\n \"name\": \"India\",\n \"population\": 1381\n },\n {\n \"name\": \"United States\",\n \"population\": 331\n }\n]"
    },
    "metadata" : {
      "finishReason" : "stop",
      "contentFilters" : [ ],
      "empty" : true
    }
  },
  "metadata" : {
    "id" : "",
    "model" : "llama3.1:8b",
    "rateLimit" : {
      "tokensReset" : 0.0,
      "tokensRemaining" : 0,
      "requestsLimit" : 0,
      "requestsReset" : 0.0,
      "tokensLimit" : 0,
      "requestsRemaining" : 0
    },
    "usage" : {
      "promptTokens" : 187,
      "completionTokens" : 60,
      "totalTokens" : 247
    },
    "promptMetadata" : [ ],
    "empty" : false
  },
  "results" : [ {
    "output" : {
      "messageType" : "ASSISTANT",
      "metadata" : {
        "messageType" : "ASSISTANT"
      },
      "toolCalls" : [ ],
      "media" : [ ],
      "text" : "[\n {\n \"name\": \"China\",\n \"population\": 1439\n },\n {\n \"name\": \"India\",\n \"population\": 1381\n },\n {\n \"name\": \"United States\",\n \"population\": 331\n }\n]"
    },
    "metadata" : {
      "finishReason" : "stop",
      "contentFilters" : [ ],
      "empty" : true
    }
  } ]
}

The response log contains the LLM response as well as various technical information such as the token usage.

Conclusion

This article described how to set up a local LLM with Ollama and use it with Spring AI to perform basic tasks such as answering questions or analyzing business documents. I also showed how to configure the temperature, generate structured outputs, measure the token usage and log the interactions with the LLM.

This is a good start but Spring AI is capable of much more. Additional articles may cover other interesting features.
