What is Azure AI Foundry and Foundry Local?
Azure AI Foundry is Microsoft’s comprehensive platform for enterprise AI development and deployment, enabling organizations to build, customize, and operate AI solutions at scale. It provides tools, services, and infrastructure to develop, fine-tune and deploy AI models in production environments with enterprise-grade security and compliance.
Foundry Local is the desktop companion to Azure AI Foundry that brings powerful AI model inference to your local machine. It’s an open-source tool that provides an OpenAI-compatible API, allowing developers to run, test, and integrate large language models directly on their own hardware without sending data to the cloud. This makes it ideal for development, testing, and scenarios requiring data privacy or offline operation.
Why Use Foundry Local with Spring AI?
Running AI models locally has become increasingly valuable for Java developers who want to reduce latency, avoid network dependency, and eliminate API cost overhead. Foundry Local makes this seamless by offering an OpenAI-compatible API, allowing Spring AI to interact with local models as if they were hosted in the cloud.
In this guide, we’ll walk you through setting up Foundry Local, integrating it with a Spring Boot project using Spring AI, and building a simple REST API to interact with the AI model for chat and summarization tasks. This enables powerful local inference with models like Phi-3.5, Qwen, or DeepSeek, using standard Spring idioms.
Quick Setup
Step 1: Install Foundry Local
Foundry Local can be installed via Homebrew for macOS users:
$ brew tap microsoft/foundrylocal
$ brew install foundrylocal
$ foundry --help
This sets up the CLI tools necessary to run and manage language models locally.
Step 2: Download and Start a Model
Before interacting with any model, you need to list available models, download the one you want, and load it into memory:
# Discover available models
$ foundry model list

# Download the Phi-3.5 mini model
$ foundry model download phi-3.5-mini

# Load it into memory for serving
$ foundry model load phi-3.5-mini

# Check model and service status
$ foundry service status
Once loaded, the model will be available through a local HTTP endpoint using an OpenAI-compatible interface.
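Before wiring up Spring, you can sanity-check that endpoint with a few lines of plain Java. The sketch below makes some assumptions: that the service is listening on port 8081 (the port we configure later in this guide; check `foundry service status` for the actual address), that it exposes the usual OpenAI-style /v1/chat/completions route, and that the model id matches the one Foundry reports. The class name is just an example.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Minimal smoke test against Foundry Local's OpenAI-compatible endpoint.
// Assumes port 8081 (configured later in this guide) and a model id that
// matches what `foundry service status` reports.
public class FoundryLocalSmokeTest {

    public static void main(String[] args) throws Exception {
        String body = """
                {
                  "model": "Phi-3.5-mini-instruct-generic-gpu",
                  "messages": [{"role": "user", "content": "Say hello in one sentence."}]
                }
                """;

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8081/v1/chat/completions"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // Prints the raw OpenAI-style JSON; Spring AI will handle parsing for us later.
        System.out.println(response.body());
    }
}

If this prints a chat completion, the local endpoint is ready for Spring AI to use.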
Create a Spring Boot Project
You can use Spring Initializr to quickly generate a project with the required dependencies.
Run the following cURL command to download a Spring Starter project with the required dependencies:
curl -G https://start.spring.io/starter.zip \
  -d javaVersion=21 \
  -d baseDir=demo \
  -d language=java \
  -d bootVersion=3.5.0 \
  -d type=maven-project \
  -d dependencies=web,spring-ai-openai \
  -o demo.zip
Now unzip the project and open it in Visual Studio Code:
unzip demo.zip
code demo/
If you’re adding Spring AI to an existing project, use the following Maven dependencies:
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-model-openai</artifactId>
        <version>1.0.0</version>
    </dependency>
</dependencies>
These dependencies enable the REST interface and Spring AI’s OpenAI integration capabilities.
Connect Spring AI to Foundry Local
Step 1: Configure Properties
Update application.properties to point Spring AI to your locally running Foundry instance:
# Azure AI Foundry Local Configuration
spring.ai.openai.api-key=not-used
spring.ai.openai.base-url=http://localhost:8081
spring.ai.openai.chat.options.model=Phi-3.5-mini-instruct-generic-gpu

# Server Configuration
server.port=8080
Spring AI's OpenAI starter expects an API key property to be present, but Foundry Local ignores it, so any placeholder value (such as not-used) works.
Step 2: Configuration Class
This configuration class sets up the OpenAI-compatible API client and initializes the chat model with your local settings:
import org.springframework.ai.openai.OpenAiChatModel;
import org.springframework.ai.openai.OpenAiChatOptions;
import org.springframework.ai.openai.api.OpenAiApi;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class FoundryLocalConfig {

    @Value("${spring.ai.openai.base-url}")
    private String baseUrl;

    @Value("${spring.ai.openai.api-key}")
    private String apiKey;

    @Value("${spring.ai.openai.chat.options.model}")
    private String modelName;

    // Low-level OpenAI client pointed at the local Foundry endpoint instead of api.openai.com
    @Bean
    public OpenAiApi openAiApi() {
        return OpenAiApi.builder().baseUrl(baseUrl).apiKey(apiKey).build();
    }

    // Chat model wired with the local model name and default generation options
    @Bean
    public OpenAiChatModel chatModel(OpenAiApi openAiApi) {
        OpenAiChatOptions options = OpenAiChatOptions.builder()
                .model(modelName)
                .temperature(0.7)
                .build();
        return OpenAiChatModel.builder().openAiApi(openAiApi).defaultOptions(options).build();
    }
}
This is the bridge that connects Spring AI’s abstraction to the local model served by Foundry.
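If you prefer Spring AI's fluent ChatClient API to calling the chat model directly, you can also expose a ChatClient bean built on the same model. This is an optional sketch (the class and bean names are our own); the rest of this guide calls OpenAiChatModel directly.

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.openai.OpenAiChatModel;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Optional: a fluent ChatClient built on top of the locally served model.
// Not required by the rest of this guide, which uses OpenAiChatModel directly.
@Configuration
public class ChatClientConfig {

    @Bean
    public ChatClient chatClient(OpenAiChatModel chatModel) {
        return ChatClient.builder(chatModel).build();
    }
}

A caller could then write chatClient.prompt().user("Hello").call().content() instead of chatModel.call("Hello").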
Step 3: Implement a Service Layer
Create a service class to interact with the chat model. This encapsulates the logic for calling the model:
import org.springframework.ai.openai.OpenAiChatModel;
import org.springframework.stereotype.Service;

@Service
public class AIService {

    private final OpenAiChatModel chatModel;

    public AIService(OpenAiChatModel chatModel) {
        this.chatModel = chatModel;
    }

    // Sends a free-form message to the local model and returns its reply
    public String chat(String message) {
        try {
            return chatModel.call(message);
        } catch (Exception e) {
            throw new RuntimeException("Error calling AI model: " + e.getMessage(), e);
        }
    }

    // Wraps the input in a summarization prompt before calling the model
    public String summarizeText(String text) {
        String prompt = "Please provide a concise summary of the following text:\n\n" + text;
        return chatModel.call(prompt);
    }
}
The `chat()` and `summarizeText()` methods abstract out the logic so your controller stays clean.
Step 4: Create REST Endpoints
Now expose those methods via HTTP endpoints:
import java.util.Map;

import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/ai")
public class AIController {

    private final AIService aiService;

    public AIController(AIService aiService) {
        this.aiService = aiService;
    }

    @PostMapping("/chat")
    public ResponseEntity<Map<String, String>> chat(@RequestBody Map<String, String> request) {
        String message = request.get("message");
        if (message == null || message.trim().isEmpty()) {
            return ResponseEntity.badRequest().body(Map.of("error", "Message is required"));
        }
        String response = aiService.chat(message);
        return ResponseEntity.ok(Map.of("response", response));
    }

    @PostMapping("/summarize")
    public ResponseEntity<Map<String, String>> summarize(@RequestBody Map<String, String> request) {
        String text = request.get("text");
        if (text == null || text.trim().isEmpty()) {
            return ResponseEntity.badRequest().body(Map.of("error", "Text is required"));
        }
        String summary = aiService.summarizeText(text);
        return ResponseEntity.ok(Map.of("summary", summary));
    }

    // Simple liveness probe: a round trip to the model proves Foundry Local is reachable
    @GetMapping("/health")
    public ResponseEntity<Map<String, String>> health() {
        try {
            aiService.chat("Hello");
            return ResponseEntity.ok(Map.of("status", "healthy"));
        } catch (Exception e) {
            return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE)
                    .body(Map.of("status", "unhealthy", "error", e.getMessage()));
        }
    }
}
With these endpoints, your app becomes a simple gateway for local AI-powered inference.
Running the Application
Step 1: Start Foundry Local
Configure Foundry to serve the model on port 8081:
foundry service set --port 8081
foundry service start
foundry model load phi-3.5-mini
Make sure this service is running and healthy before starting your Spring Boot app.
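If you want the app to fail loudly when Foundry Local isn't up, a small startup check can send one prompt through the configured model at boot. This is an optional sketch (class and bean names are our own) that reuses the same chatModel.call() used in the service layer.

import org.springframework.ai.openai.OpenAiChatModel;
import org.springframework.boot.CommandLineRunner;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Optional startup check: sends one prompt through the configured chat model
// so a misconfigured base URL or model name fails loudly at boot.
@Configuration
public class FoundryLocalStartupCheck {

    @Bean
    CommandLineRunner verifyFoundryConnection(OpenAiChatModel chatModel) {
        return args -> System.out.println(
                "Foundry Local says: " + chatModel.call("Reply with the single word: ready"));
    }
}

If Foundry isn't running, the application fails at startup with a clear connection error instead of surfacing it on the first HTTP request.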
Step 2: Start Spring Boot
Use Maven to build and run the app:
mvn clean compile
mvn spring-boot:run
Step 3: Test Your Endpoints
Perform a health check:
curl http://localhost:8080/api/ai/health
Send a chat message:
curl -X POST http://localhost:8080/api/ai/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What is Spring AI?"}'
Summarize a block of text:
curl -X POST http://localhost:8080/api/ai/summarize \
  -H "Content-Type: application/json" \
  -d '{"text": "Your long text here..."}'
Conclusion
In this post, we demonstrated how to build a fully local AI-powered application using Spring AI and Foundry Local.
Thanks to Foundry’s compatibility with OpenAI’s APIs, Spring AI required no special integration work—we simply changed the base URL to localhost. This local-first approach has multiple advantages:
- Privacy & Control: No data leaves your machine
- Performance: Lower latency compared to cloud APIs
- Cost: No API billing or token limits
- Simplicity: Familiar Spring configuration and idioms