June 17th, 2025

Connect Spring AI to Local AI Models with Foundry Local

What are Azure AI Foundry and Foundry Local?

Azure AI Foundry is Microsoft’s comprehensive platform for enterprise AI development and deployment, enabling organizations to build, customize, and operate AI solutions at scale. It provides the tools, services, and infrastructure to develop, fine-tune, and deploy AI models in production environments with enterprise-grade security and compliance.

Foundry Local is the desktop companion to Azure AI Foundry that brings powerful AI model inference to your local machine. It’s an open-source tool that provides an OpenAI-compatible API, allowing developers to run, test, and integrate large language models directly on their own hardware without sending data to the cloud. This makes it ideal for development, testing, and scenarios requiring data privacy or offline operation.

Why Use Foundry Local with Spring AI?

Running AI models locally has become increasingly valuable for Java developers who want to reduce latency, avoid network dependency, and eliminate API cost overhead. Foundry Local makes this seamless by offering an OpenAI-compatible API, allowing Spring AI to interact with local models as if they were hosted in the cloud.

In this guide, we’ll walk you through setting up Foundry Local, integrating it with a Spring Boot project using Spring AI, and building a simple REST API to interact with the AI model for chat and summarization tasks. This enables powerful local inference with models like Phi-3.5, Qwen, or DeepSeek, using standard Spring idioms.

Quick Setup

Step 1: Install Foundry Local

Foundry Local can be installed via Homebrew for macOS users:

$ brew tap microsoft/foundrylocal
$ brew install foundrylocal
$ foundry --help

This sets up the CLI tools necessary to run and manage language models locally.
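On Windows, Foundry Local can be installed with winget instead. The package identifier below is the one Microsoft’s documentation lists at the time of writing; verify it with winget search foundry if it has changed:

winget install Microsoft.FoundryLocal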

Step 2: Download and Start a Model

Before interacting with any model, you need to list available models, download the one you want, and load it into memory:

# Discover available models
$ foundry model list

# Download the Phi-3.5 mini model
$ foundry model download phi-3.5-mini

# Load it into memory for serving
$ foundry model load phi-3.5-mini

# Check model and service status
$ foundry service status

Once loaded, the model will be available through a local HTTP endpoint using an OpenAI-compatible interface.
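As a quick sanity check, you can call the OpenAI-compatible endpoint directly with curl. Two assumptions in the command below: the service is listening on port 8081 (we configure that port in the Running section later), and the full model identifier is Phi-3.5-mini-instruct-generic-gpu, which may differ on your machine, so substitute whatever the Foundry CLI reports:

curl http://localhost:8081/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Phi-3.5-mini-instruct-generic-gpu",
    "messages": [{"role": "user", "content": "Say hello"}]
  }'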

Create a Spring Boot Project

You can use Spring Initializr to quickly generate a project with the required dependencies.

Run the following cURL command to download the starter project:

curl -G https://start.spring.io/starter.zip \
    -d javaVersion=21 \
    -d baseDir=demo \
    -d language=java \
    -d bootVersion=3.5.0 \
    -d type=maven-project \
    -d dependencies=web,spring-ai-openai \
    -o demo.zip

Now unzip the project and open it in Visual Studio Code:

unzip demo.zip
code demo/

If you’re adding Spring AI to an existing project, use the following Maven dependencies:

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-model-openai</artifactId>
        <version>1.0.0</version>
    </dependency>
</dependencies>

These dependencies enable the REST interface and Spring AI’s OpenAI integration capabilities.

Connect Spring AI to Foundry Local

Step 1: Configure Properties

Update application.properties to point Spring AI to your locally running Foundry instance:

# Azure AI Foundry Local Configuration
spring.ai.openai.api-key=not-used
spring.ai.openai.base-url=http://localhost:8081
spring.ai.openai.chat.options.model=Phi-3.5-mini-instruct-generic-gpu

# Server Configuration
server.port=8080

Even though the api-key property is required by the configuration, Foundry Local ignores it, so any placeholder value works. Note that the model name here is the full identifier Foundry reports once the model is loaded, which can differ from the shorter alias used to download it, and the base URL’s port must match the one Foundry is serving on; we set it to 8081 in the Running section below.

Step 2: Configuration Class

This configuration class sets up the OpenAI-compatible API client and initializes the chat model with your local settings:

import org.springframework.ai.openai.OpenAiChatModel;
import org.springframework.ai.openai.OpenAiChatOptions;
import org.springframework.ai.openai.api.OpenAiApi;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class FoundryLocalConfig {

    @Value("${spring.ai.openai.base-url}")
    private String baseUrl;

    @Value("${spring.ai.openai.api-key}")
    private String apiKey;

    @Value("${spring.ai.openai.chat.options.model}")
    private String modelName;

    // Low-level OpenAI-compatible API client pointed at Foundry Local
    @Bean
    public OpenAiApi openAiApi() {
        return OpenAiApi.builder().baseUrl(baseUrl).apiKey(apiKey).build();
    }

    // Chat model wired with the local model name and default options
    @Bean
    public OpenAiChatModel chatModel(OpenAiApi openAiApi) {
        OpenAiChatOptions options = OpenAiChatOptions.builder()
                .model(modelName)
                .temperature(0.7)
                .build();
        return OpenAiChatModel.builder()
                .openAiApi(openAiApi)
                .defaultOptions(options)
                .build();
    }
}

This is the bridge that connects Spring AI’s abstraction to the local model served by Foundry.

Step 3: Implement a Service Layer

Create a service class to interact with the chat model. This encapsulates the logic for calling the model:

import org.springframework.ai.openai.OpenAiChatModel;
import org.springframework.stereotype.Service;

@Service
public class AIService {

    private final OpenAiChatModel chatModel;

    public AIService(OpenAiChatModel chatModel) {
        this.chatModel = chatModel;
    }

    // Send a raw user message to the model and return its reply
    public String chat(String message) {
        try {
            return chatModel.call(message);
        } catch (Exception e) {
            throw new RuntimeException("Error calling AI model: " + e.getMessage(), e);
        }
    }

    // Wrap the input text in a summarization prompt
    public String summarizeText(String text) {
        String prompt = "Please provide a concise summary of the following text:\n\n" + text;
        return chatModel.call(prompt);
    }
}

The `chat()` and `summarizeText()` methods abstract out the logic so your controller stays clean.
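As a side note, Spring AI also offers the higher-level fluent ChatClient API, which can wrap the same model bean. Here is a minimal sketch, assuming the OpenAiChatModel bean from the configuration above (the FluentAIService name is just for illustration):

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.openai.OpenAiChatModel;
import org.springframework.stereotype.Service;

@Service
public class FluentAIService {

    private final ChatClient chatClient;

    public FluentAIService(OpenAiChatModel chatModel) {
        // Wrap the locally served model in Spring AI's fluent ChatClient
        this.chatClient = ChatClient.builder(chatModel).build();
    }

    public String chat(String message) {
        return chatClient.prompt()
                .user(message)
                .call()
                .content();
    }
}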

Step 4: Create REST Endpoints

Now expose those methods via HTTP endpoints:

import java.util.Map;

import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/ai")
public class AIController {

    private final AIService aiService;

    public AIController(AIService aiService) {
        this.aiService = aiService;
    }

    @PostMapping("/chat")
    public ResponseEntity<Map<String, String>> chat(@RequestBody Map<String, String> request) {
        String message = request.get("message");
        if (message == null || message.trim().isEmpty()) {
            return ResponseEntity.badRequest().body(Map.of("error", "Message is required"));
        }
        String response = aiService.chat(message);
        return ResponseEntity.ok(Map.of("response", response));
    }

    @PostMapping("/summarize")
    public ResponseEntity<Map<String, String>> summarize(@RequestBody Map<String, String> request) {
        String text = request.get("text");
        if (text == null || text.trim().isEmpty()) {
            return ResponseEntity.badRequest().body(Map.of("error", "Text is required"));
        }
        String summary = aiService.summarizeText(text);
        return ResponseEntity.ok(Map.of("summary", summary));
    }

    // Cheap liveness probe: a trivial model call verifies end-to-end connectivity
    @GetMapping("/health")
    public ResponseEntity<Map<String, String>> health() {
        try {
            aiService.chat("Hello");
            return ResponseEntity.ok(Map.of("status", "healthy"));
        } catch (Exception e) {
            return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE)
                    .body(Map.of("status", "unhealthy", "error", e.getMessage()));
        }
    }
}

With these endpoints, your app becomes a simple gateway for local AI-powered inference.
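If you want token-by-token output, the same chat model also exposes a streaming call. The controller below is a hypothetical addition, not part of the walkthrough; it assumes Reactor is on the classpath, which the Spring AI OpenAI starter brings in for its WebClient-based transport:

import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.openai.OpenAiChatModel;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

import reactor.core.publisher.Flux;

@RestController
public class StreamingController {

    private final OpenAiChatModel chatModel;

    public StreamingController(OpenAiChatModel chatModel) {
        this.chatModel = chatModel;
    }

    // Stream the reply as server-sent events, one event per generated chunk
    @GetMapping(value = "/api/ai/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<String> stream(@RequestParam String message) {
        return chatModel.stream(new Prompt(message))
                .map(chunk -> chunk.getResult().getOutput().getText());
    }
}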

Running the Application

Step 1: Start Foundry Local

Configure Foundry to serve the model on port 8081:

foundry service set --port 8081
foundry service start
foundry model load phi-3.5-mini

Make sure this service is running and healthy before starting your Spring Boot app.

Step 2: Start Spring Boot

Use Maven to build and run the app:

mvn clean compile
mvn spring-boot:run

Step 3: Test Your Endpoints

Perform a health check:

curl http://localhost:8080/api/ai/health

Send a chat message:

curl -X POST http://localhost:8080/api/ai/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What is Spring AI?"}'

Summarize a block of text:

curl -X POST http://localhost:8080/api/ai/summarize \
  -H "Content-Type: application/json" \
  -d '{"text": "Your long text here..."}'

Conclusion

In this post, we demonstrated how to build a fully local AI-powered application using Spring AI and Foundry Local.

Thanks to Foundry’s compatibility with OpenAI’s APIs, Spring AI required no special integration work; we simply pointed the base URL at localhost. This local-first approach has multiple advantages:

  • Privacy & Control: No data leaves your machine
  • Performance: Lower latency compared to cloud APIs
  • Cost: No API billing or token limits
  • Simplicity: Familiar Spring configuration and idioms


Author

Bruno Borges
Principal PM Manager

Bruno is Principal Program Manager for Microsoft's Java Engineering Group. Previously the Java lead for Azure Developer Relations. Conference speaker, open source contributor, Java Champion and influencer, Twitter junkie, beer sommelier.
