TL;DR
- Foundry Agent Service (GA): The next-gen agent runtime is production-ready — Responses API-based, end-to-end private networking, MCP auth expansion (including OAuth passthrough), Voice Live preview, and hosted agents in 6 new regions.
- GPT-5.4 + GPT-5.4 Pro (GA): Production-grade reasoning with integrated computer use, stronger instruction adherence, and dependable multi-step execution. Standard at $2.50/$15 per million tokens; Pro at $30/$180 for deep analytical workloads.
- GPT-5.4 Mini (GA): Cost-efficient small model for classification, extraction, and lightweight tool calls — the high-volume tier in a GPT-5.4 routing strategy.
- Phi-4 Reasoning Vision 15B: Multimodal reasoning meets the Phi family — visual understanding with chain-of-thought for charts, diagrams, and document layouts.
- Evaluations (GA) + Continuous Monitoring: Out-of-the-box and custom evaluators with continuous production monitoring piped into Azure Monitor — quality isn't a pre-ship checkbox anymore, it's a live signal.
- azure-ai-projects SDK (GA): Python 2.0.0, JS/TS 2.0.0, and Java 2.0.0 all shipped stable releases targeting the GA REST v1 surface. .NET 2.0.0 followed on April 1. The azure-ai-agents dependency is gone — everything lives under AIProjectClient.
- Fireworks AI on Foundry (Preview): High-performance open model inference — DeepSeek V3.2, gpt-oss-120b, Kimi K2.5, and MiniMax M2.5 with bring-your-own-weights support.
- NVIDIA Nemotron Models: Open NVIDIA models now first-class in the Foundry catalog, announced at GTC alongside the Agent Service GA.
- Grok 4.2 (GA): xAI's refreshed chat model graduates from beta.
- Priority Processing (Preview): Dedicated compute lane for latency-sensitive AI workloads — reserved capacity for real-time agents and customer-facing chat.
- Palo Alto Prisma AIRS + Zenity (GA): Third-party runtime security integrations for prompt injection, data leakage, and tool misuse detection.
- Tracing (GA): End-to-end agent trace inspection with sort, filter, and data model refinements.
- PromptFlow Deprecation: Migration to Microsoft Framework Workflows required by January 2027.
Join the community
Connect with 25,000+ developers on Discord, ask questions in GitHub Discussions, or subscribe via RSS to get this digest monthly.
Models
GPT-5.4 + GPT-5.4 Pro (GA)
GPT-5.4 went generally available on March 5 — and this one is about reliability, not raw intelligence. If you've been fighting task drift, mid-workflow failures, and inconsistent tool calling in production agents, GPT-5.4 is designed specifically for those problems. Stronger reasoning over long interactions, better instruction adherence, and integrated computer use capabilities for structured orchestration of tools, files, and data extraction.
GPT-5.4 Pro is the premium variant for when analytical depth matters more than latency — multi-path reasoning evaluation, improved stability across long reasoning chains, and enhanced decision support for scientific research and complex trade-off analysis.
| Model | Context | Pricing (per M tokens) | Best For |
| --- | --- | --- | --- |
| GPT-5.4 (≤272K input) | 272K | $2.50 input / $0.25 cached / $15 output | Production agents, coding, document workflows |
| GPT-5.4 (>272K input) | Extended | $5.00 input / $0.50 cached / $22.50 output | Large-context reasoning |
| GPT-5.4 Pro | Full | $30 input / $180 output | Deep analysis, scientific reasoning |
Deployment: Standard Global and Standard Data Zone (US) at launch, with additional options coming.
Action: Deploy GPT-5.4 from the model catalog. If you're running GPT-5.2 in production, GPT-5.4 is a drop-in upgrade with better instruction following and fewer mid-workflow failures.
GPT-5.4 Mini (GA)
GPT-5.4 Mini shipped on March 17 — OpenAI's latest small model optimized for fast, cost-efficient tasks like classification, extraction, and lightweight tool calls. A step up from GPT-5 mini in instruction following and structured output reliability, at a fraction of the cost of full GPT-5.4. If you're routing by complexity — GPT-5.4 Mini handles the high-volume, low-latency tier while GPT-5.4 takes the reasoning-heavy work.
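The routing-by-complexity idea can be sketched with a small heuristic router. This is a minimal illustration, not a Foundry feature: the deployment names (`gpt-5.4-mini`, `gpt-5.4`), the keyword list, and the thresholds are all assumptions you would tune for your own traffic.

```python
# Sketch: route each request to the high-volume tier (GPT-5.4 Mini) or the
# reasoning tier (GPT-5.4) using a crude complexity score. Deployment names
# and thresholds are illustrative assumptions.

def estimate_complexity(query: str, tool_count: int = 0) -> int:
    """Score a request: long prompts, reasoning keywords, and many
    available tools all push the work toward the larger model."""
    score = 0
    if len(query) > 2000:
        score += 2
    if any(kw in query.lower() for kw in ("analyze", "plan", "multi-step", "compare")):
        score += 2
    score += min(tool_count, 3)  # heavy tool use favors the big model
    return score

def pick_deployment(query: str, tool_count: int = 0) -> str:
    """Return the deployment name for the request's tier."""
    if estimate_complexity(query, tool_count) >= 3:
        return "gpt-5.4"      # reasoning-heavy tier
    return "gpt-5.4-mini"     # high-volume, low-latency tier
```

The chosen name then feeds straight into the `model` argument of whatever client you use; the point is that classification and extraction traffic never pays full GPT-5.4 rates.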
Phi-4 Reasoning Vision 15B
Phi-4 Reasoning Vision 15B brings multimodal reasoning to the Phi family — a 15B-parameter model that combines visual understanding with chain-of-thought reasoning. Handles charts, diagrams, document layouts, and visual Q&A with strong performance relative to its size.
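For sending an image to a vision-capable deployment, the common OpenAI-compatible shape is a multimodal user message with the image embedded as a base64 data URL. The helper below only builds that message; the model identifier shown in the comment is an assumed catalog name, not confirmed here.

```python
import base64

def image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build an OpenAI-style multimodal user message that pairs a text
    prompt with an image encoded as a base64 data URL."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

# The message can then be passed to a vision-capable deployment, e.g.
# (deployment name is a placeholder):
# client.chat.completions.create(model="Phi-4-reasoning-vision-15b",
#                                messages=[image_message("Describe this chart", png_bytes)])
```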
Grok 4.2 (GA)
Grok 4.2 from xAI graduated to general availability on March 30, following its public beta earlier this year. A refreshed chat model in the catalog — available via serverless or provisioned throughput deployments.
Fireworks AI on Foundry (Public Preview)
Fireworks AI brings high-performance open model inference to Foundry — processing over 13 trillion tokens daily at ~180K requests/second in production. Four models available at launch:
| Model | Notes |
| --- | --- |
| DeepSeek V3.2 | Sparse attention, 128K context |
| gpt-oss-120b | OpenAI's open-source model |
| Kimi K2.5 | Moonshot AI's latest |
| MiniMax M2.5 | New to Foundry with serverless support |
The real story here is bring-your-own-weights (BYOW) — upload and register quantized or fine-tuned weights from anywhere without changing the serving stack. Deploy via serverless pay-per-token or provisioned throughput.
NVIDIA Nemotron Models
Announced at NVIDIA GTC on March 16, NVIDIA Nemotron models are now available through the Foundry catalog. Open models on NVIDIA accelerators, joining the widest selection of models on any cloud. Combined with the Fireworks AI integration, you can fine-tune Nemotron into low-latency assets distributable to the edge.
OSS Models in NextGen (GA)
Open-source models are now fully integrated into the NextGen Foundry experience at GA — unifying OSS and OpenAI models in a single deployment and management flow. No more context-switching between model providers.
Agents
Foundry Agent Service (GA)
The biggest story of the month. The next-gen Foundry Agent Service is generally available — built on the OpenAI Responses API, wire-compatible with OpenAI agents, and open to models from DeepSeek, xAI, Meta, LangChain, LangGraph, and more. If you're building with the Responses API today, migrating to Foundry requires minimal code changes. What you gain: enterprise security, private networking, Entra RBAC, full tracing, and evaluation — on top of your existing agent logic.
```python
import os

from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient
from azure.ai.projects.models import PromptAgentDefinition

with (
    DefaultAzureCredential() as credential,
    AIProjectClient(endpoint=os.environ["AZURE_AI_PROJECT_ENDPOINT"], credential=credential) as project_client,
    project_client.get_openai_client() as openai_client,
):
    # Create (or version) the agent definition in Foundry
    agent = project_client.agents.create_version(
        agent_name="my-enterprise-agent",
        definition=PromptAgentDefinition(
            model=os.environ["AZURE_AI_MODEL_DEPLOYMENT_NAME"],
            instructions="You are a helpful assistant.",
        ),
    )

    # Talk to the agent through the OpenAI-compatible Responses API
    conversation = openai_client.conversations.create()
    response = openai_client.responses.create(
        conversation=conversation.id,
        input="What are best practices for building AI agents?",
        extra_body={"agent_reference": {"name": agent.name, "type": "agent_reference"}},
    )
    print(response.output_text)
```
Action: pip install azure-ai-projects — that's it. As of 2.0.0, the package bundles openai and azure-identity as direct dependencies, so you no longer need to install them separately. If you're coming from azure-ai-agents, agents are now first-class operations on AIProjectClient — remove your standalone azure-ai-agents pin.
Voice Live + Foundry Agents (Preview)
Voice Live is a fully managed, real-time speech-to-speech runtime that collapses the traditional STT → LLM → TTS pipeline into a single managed API. Semantic voice activity detection, end-of-turn detection, server-side noise suppression, echo cancellation, and barge-in support — all built-in. Connect Voice Live directly to an existing Foundry agent. The agent's prompt, tools, and configuration stay in Foundry; Voice Live handles the audio pipeline. Voice interactions go through the same agent runtime as text — same evaluators, same traces, same cost visibility.
```python
import asyncio
import os

from azure.ai.voicelive.aio import connect, AgentSessionConfig
from azure.identity.aio import DefaultAzureCredential

async def run():
    # Point Voice Live at an existing Foundry agent; the agent's prompt,
    # tools, and configuration stay in Foundry.
    agent_config: AgentSessionConfig = {
        "agent_name": "my-enterprise-agent",
        "project_name": "my-foundry-project",
    }
    async with DefaultAzureCredential() as credential:
        async with connect(
            endpoint=os.environ["AZURE_VOICELIVE_ENDPOINT"],
            credential=credential,
            agent_config=agent_config,
        ) as connection:
            # Consume server events: audio deltas, turn detection, barge-in, etc.
            async for event in connection:
                ...

asyncio.run(run())
```
MCP Authentication Expansion
MCP server connections now support the full authentication spectrum:
| Auth Method | Use Case |
| --- | --- |
| Key-based | Static API tokens via Custom Keys connection |
| Entra Agent Identity | Service-to-service with managed identity |
| Managed Identity | Azure resource access |
| OAuth Identity Passthrough | User-delegated access (OneDrive, Salesforce, SaaS APIs) |
OAuth Identity Passthrough is the standout — when agents need to act on behalf of a specific user, not as a shared system identity:
```python
import os

from azure.ai.projects.models import MCPTool, PromptAgentDefinition

# Reuses the project_client from the agent-creation example above.
tool = MCPTool(
    server_label="github-api",
    server_url="https://api.githubcopilot.com/mcp",
    require_approval="always",
    project_connection_id=os.environ["MCP_PROJECT_CONNECTION_ID"],
)

agent = project_client.agents.create_version(
    agent_name="my-mcp-agent",
    definition=PromptAgentDefinition(
        model=os.environ["AZURE_AI_MODEL_DEPLOYMENT_NAME"],
        instructions="Use MCP tools as needed.",
        tools=[tool],
    ),
)
```
Evaluations (GA) + Continuous Monitoring
Foundry Evaluations shipped GA with three layers:
- Out-of-the-box evaluators — coherence, relevance, groundedness, retrieval quality, safety. No configuration required.
- Custom evaluators — encode your own criteria: business logic, tone standards, domain compliance.
- Continuous evaluation — Foundry samples live traffic automatically, runs your evaluator suite, and surfaces results in Azure Monitor dashboards. Configure alerts for quality drift.
Here's the pattern for running evaluations against an agent target:
```python
# DataSourceConfigCustom is the openai SDK's typed dict for custom eval data
# source configs; its import location may vary by openai package version.
from openai.types.eval_create_params import DataSourceConfigCustom

# Reuses openai_client and agent from the agent-creation example above.
eval_object = openai_client.evals.create(
    name="Agent Quality Evaluation",
    data_source_config=DataSourceConfigCustom(
        type="custom",
        item_schema={"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]},
        include_sample_schema=True,
    ),
    testing_criteria=[
        {
            "type": "azure_ai_evaluator",
            "name": "fluency",
            "evaluator_name": "builtin.fluency",
            "initialization_parameters": {"deployment_name": os.environ["AZURE_AI_MODEL_DEPLOYMENT_NAME"]},
            "data_mapping": {"query": "", "response": ""},
        },
        {
            "type": "azure_ai_evaluator",
            "name": "task_adherence",
            "evaluator_name": "builtin.task_adherence",
            "initialization_parameters": {"deployment_name": os.environ["AZURE_AI_MODEL_DEPLOYMENT_NAME"]},
            "data_mapping": {"query": "", "response": ""},
        },
    ],
)

run = openai_client.evals.runs.create(
    eval_id=eval_object.id,
    name=f"Run for {agent.name}",
    data_source={
        "type": "azure_ai_target_completions",
        "source": {
            "type": "file_content",
            "content": [{"item": {"query": "What is the capital of France?"}}],
        },
        "input_messages": {
            "type": "template",
            "template": [{"type": "message", "role": "user", "content": {"type": "input_text", "text": ""}}],
        },
        "target": {"type": "azure_ai_agent", "name": agent.name, "version": agent.version},
    },
)
```
Prompt Optimizer in Agent Playground
The Prompt Optimizer is now integrated directly into the Agent playground — iteratively improve prompts before shipping agents. Data-driven prompt tuning connected to evaluation results.
Hosted Agents in 6 New Regions
Hosted agent deployments are now available in East US, North Central US, Sweden Central, Southeast Asia, Japan East, and more — relevant for data residency requirements and latency optimization.
Safety & Guardrails
Palo Alto Prisma AIRS + Zenity (GA)
Third-party guardrail integrations are now generally available. Register and manage runtime security from Palo Alto Networks Prisma AIRS and Zenity directly in Foundry to detect prompt injection, toxic content, malicious URLs, sensitive data leakage, and tool misuse alongside native guardrails.
| Integration | Detects |
| --- | --- |
| Palo Alto Prisma AIRS | Prompt injection, toxic content, malicious URLs, data leakage |
| Zenity | Prompt injection, tool misuse, data exfiltration |
Announced at NVIDIA GTC alongside the Foundry Agent Service GA.
Task Adherence as Native Guardrail
Task Adherence is now a native guardrail risk type in Foundry — block or annotate off-task tool calls without wiring directly to the standalone Task Adherence API.
Tool-Call & Tool-Response Guardrails
New intervention points in Foundry Guardrails for tool invocations and tool responses — detect and mitigate risks in what tools are being called and what they return. Reduces data leakage and unsafe tool behavior in agentic workflows.
Agent Mitigations & Guardrail Customization (GA)
Hardened agent mitigations and guardrail customization round out GA-level safety controls for production agents.
Speech, Audio & Avatars
Neural HD TTS Updates + MAI-voice-1
March brings updates to the Neural HD TTS stack including MAI-voice-1 integration — higher-fidelity, more expressive synthetic speech with improved quality across secondary locales.
Fast Transcription — 5-Hour Support
Fast Transcription now supports up to ~5-hour audio inputs — addressing top customer requests for long-form meeting and media transcription.
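Fast transcription takes a multipart request whose `definition` part carries the transcription options as JSON. The helper below builds that JSON; the endpoint path, api-version, and diarization field names in the commented request sketch are assumptions based on the documented fast transcription API and should be checked against the version you target.

```python
import json

def transcription_definition(locales=("en-US",), diarization=False) -> str:
    """Build the JSON 'definition' part for a fast-transcription request.
    Field names follow the documented fast transcription API; verify them
    against the api-version you use."""
    definition = {"locales": list(locales)}
    if diarization:
        definition["diarization"] = {"maxSpeakers": 4, "enabled": True}
    return json.dumps(definition)

# Sketch of the request itself (endpoint and api-version are assumptions):
# import requests
# resp = requests.post(
#     f"{endpoint}/speechtotext/transcriptions:transcribe?api-version=2024-11-15",
#     headers={"Ocp-Apim-Subscription-Key": key},
#     files={
#         "audio": open("meeting.wav", "rb"),
#         "definition": (None, transcription_definition(diarization=True), "application/json"),
#     },
# )
```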
Dynamic Vocabulary (GA)
Dynamic vocabulary for English is now GA — improving recognition of domain-specific terms in Teams transcription and related STT workloads. Custom dictionary support (Tier 2) is in public preview for finer pronunciation control.
Custom Photo & Video Avatars in NextGen
Bring custom photo and video avatars into the Foundry NextGen experience — supporting richer, branded AI presenters for enterprise scenarios.
Playground GAs — TTS, Avatar, STT & Speech Translation
Three playgrounds reached GA this month:
| Playground | Status | What It Does |
| --- | --- | --- |
| TTS Playground | GA | Audition voices and parameters before integrating |
| Avatar Playground | GA | Preview and tune avatar experiences in NextGen |
| STT & Speech Translation Playground | GA | Trial STT and translation models |
Platform
Priority Processing (Preview)
Priority Processing gives you a dedicated compute lane for latency-sensitive and business-critical AI workloads. When you need guaranteed low-latency inference — real-time agents, customer-facing chat, time-sensitive pipelines — Priority Processing routes your requests through reserved capacity instead of competing with standard traffic. Available for OpenAI models in Foundry with configurable priority tiers per deployment.
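The announcement doesn't show the request-level knob for selecting the priority lane, so the sketch below borrows the `service_tier` parameter from the OpenAI API as an assumed analog; confirm the actual Foundry parameter name before relying on it.

```python
def with_priority(request_kwargs: dict, tier: str = "priority") -> dict:
    """Return a copy of request kwargs with an assumed per-request tier
    selector attached. 'service_tier' mirrors the OpenAI API parameter;
    the Foundry-specific name may differ."""
    return {**request_kwargs, "service_tier": tier}

# Usage sketch against an OpenAI-compatible client:
# response = openai_client.responses.create(
#     **with_priority({"model": deployment_name, "input": "Hello"}),
# )
```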
Tracing (GA)
Tracing is finalized for GA — UX polish, data model refinements, and sort/filter capabilities so you can reliably inspect agent traces in production. New OTel semantics for AI workloads (memory, state, planning) improve interoperability across tooling.
End-to-End Private Networking
Foundry Agent Service now supports Standard Setup with private networking — BYO VNet, no public egress, container/subnet injection. Extended to tool connectivity: MCP servers, Azure AI Search indexes, and Fabric data agents all operate over private network paths. Managed VNET logging adds firewall/NSG/flow logs for visibility into isolated environments.
Foundry Control Plane ARM API
A consolidated Foundry Control Plane ARM API gives enterprises a unified way to manage agents, models, and tools via ARM. Public preview of FCP support for Azure Functions and App Service lets you govern function-hosted agents centrally.
Fine-Tuning CLI
A new CLI for configuring, submitting, and monitoring fine-tuning jobs — a faster, code-first way to iterate on custom models. Cost estimation based on token projections helps plan spend before running large training jobs.
Platform Updates Rollup
- Eval results ↔ agent trace linking — evaluation results now connect to the underlying agent trace, closing a key observability gap for debugging.
- Local evals aligned with Evaluators catalog — run local evaluations with the same primitives as hosted runs, no hardcoded SDK logic.
- PII NextGen playgrounds — conversational and document PII detection with updated configuration panels exposing preview features.
- Notification center — tenant-level notifications (not just project-scoped), plus email delivery for critical eval, safety, and deployment alerts.
- Free Trial & PAYG — Free Trial as default sign-up path, in-app PAYG subscription creation, and in-app trial start to reduce friction.
- CMK for Azure AI Search — service-level customer-managed key configuration so admins set encryption defaults once, not per-index.
SDK & Language Changelog (March 2026)
March was the SDK GA month. The Foundry REST API went GA in February, and this month the SDKs followed — Python, JS/TS, and Java all shipped stable 2.0.0 releases targeting the v1 REST surface. .NET 2.0.0 shipped April 1. The azure-ai-agents dependency is gone across all languages; agents, evals, memory, and inference all live under the unified AIProjectClient.
Python
azure-ai-projects 2.0.0 (Mar 6) + 2.0.1 (Mar 12). First stable release — this is the one to pin for production.
Features:
- Dependency consolidation: azure-ai-projects now bundles openai and azure-identity as direct dependencies — pip install azure-ai-projects is the only install command you need. No more juggling three packages.
- New allow_preview boolean on the AIProjectClient constructor replaces per-method foundry_features — opt in once for all preview operations.
- Preview operations (hosted agents, workflow agents) use the same allow_preview flag; .beta sub-client methods imply it automatically.
Breaking changes from 2.0.0b4:
```python
# Before — per-method foundry_features
agent = project_client.agents.create_version(
    model="gpt-5",
    foundry_features=FoundryFeaturesOptInKeys.WORKFLOW_AGENTS_V1_PREVIEW,
)

# After — constructor-level allow_preview
project_client = AIProjectClient(
    endpoint=os.environ["AZURE_AI_PROJECT_ENDPOINT"],
    credential=credential,
    allow_preview=True,  # enables all preview features
)
agent = project_client.agents.create_version(model="gpt-5")
```
Other renames:
- TextResponseFormatConfiguration → TextResponseFormat
- CodeInterpreterContainerAuto → AutoCodeInterpreterToolParam (+ new network_policy property)
- ImageGenActionEnum → ImageGenAction
- Datetime fields across CronTrigger, RecurrenceTrigger, OneTimeTrigger, and ScheduleRun changed from str to datetime.datetime
Action: pip install azure-ai-projects==2.0.1 — pin to the stable release. If you were on 2.0.0b4, replace foundry_features with allow_preview=True on the client constructor.
Changelog
.NET
Azure.AI.Projects 2.0.0-beta.2 (Mar 12). The .NET SDK restructured packages — agents administration moved to Azure.AI.Projects.Agents, and Azure.AI.Projects.OpenAI was renamed to Azure.AI.Extensions.OpenAI. The OpenAI dependency was upgraded to 2.9.1.
.NET 2.0.0 GA shipped April 1 — the first .NET stable release on the v1 REST surface. Major renames:
- Insights → ProjectInsights
- Evaluations and memory moved to separate namespaces
- AIProjectClient.OpenAI → AIProjectClient.ProjectOpenAIClient
- AIProjectClient.Agents → AIProjectClient.AgentAdministrationClient

Action: Upgrade to Azure.AI.Projects 2.0.0 (GA, April 1) and review the breaking changes — significant property and namespace renames.
Changelog
JavaScript / TypeScript
@azure/ai-projects 2.0.0 (Mar 6) + 2.0.1 (Mar 13). First stable release for JS/TS.
Breaking changes from 2.0.0-beta.5:
- RedTeam.target changed from required to optional
- container_app removed from AgentKind; ContainerAppAgentDefinition removed
- project.connections.get and .getDefault: includeCredentials moved to the options bag
- project.beta.evaluators.listLatestVersions → project.beta.evaluators.list
Action: npm install @azure/ai-projects@2.0.1 — pin to stable. The beta → GA migration is mostly renames and options bag changes.
Changelog
Java
azure-ai-projects 2.0.0-beta.2 (Mar 4) → 2.0.0-beta.3 (Mar 19) → 2.0.0 (Mar 27). Three releases in March, culminating in the first Java GA. The beta releases iterated on breaking changes before locking the API surface.
Key breaking changes in 2.0.0:
- Method renames across all sub-clients for disambiguation (e.g., list() → listDeployments(), get() → getDeployment())
- Connection.getCredentials() → Connection.getCredential() (singular)
- FoundryFeaturesOptInKeys changed from ExpandableStringEnum to a standard Java enum
- DatasetsClient.createDatasetWithFolder() throws UncheckedIOException instead of checked IOException
- DatasetVersion.getDataUri() → getDataUrl()
Action: mvn dependency:resolve -Dartifact=com.azure:azure-ai-projects:2.0.0 — pin to stable. Review the full changelog for the method rename table.
Changelog
Deprecations
Plan your migrations now — these timelines are firm.
| Deprecation | Migration Target | Deadline |
| --- | --- | --- |
| PromptFlow (Azure AI Foundry + Azure ML) | Microsoft Framework Workflows | January 2027 |
| Import Data / Data Connections (Azure ML) | Fabric OneLake patterns | Effective now |
| Low-priority VMs (Azure ML) | Spot VMs | Effective now |
| Default internet access for new managed VNets | Explicit outbound configuration | Effective March 31, 2026 |
Action: If you're using PromptFlow in production, start planning your migration to Microsoft Framework Workflows. The January 2027 sunset gives you nine months.
Resources & Community
[alert type="info" heading="Forrester TEI Study: The Economics of Enterprise AI"]A new Forrester Total Economic Impact study found that organizations using Microsoft Foundry saw 20–30% developer time savings and a sub-6-month payback period. If you're building the business case for standardizing on Foundry, these are the numbers. Read the full study →[/alert]
Post Updated on April 10, 2026 at 12:33AM