TL;DR
- Foundry Agent Service (GA): The next-gen agent runtime is production-ready — Responses API-based, end-to-end private networking, MCP auth expansion (including OAuth passthrough), Voice Live preview, and hosted agents in 6 new regions.
- GPT-5.4 + GPT-5.4 Pro (GA): Production-grade reasoning with integrated computer use, stronger instruction adherence, and dependable multi-step execution. Standard at $2.50/$15 per million tokens; Pro at $30/$180 for deep analytical workloads.
- GPT-5.4 Mini (GA): Cost-efficient small model for classification, extraction, and lightweight tool calls — the high-volume tier in a GPT-5.4 routing strategy.
- Phi-4 Reasoning Vision 15B: Multimodal reasoning meets the Phi family — visual understanding with chain-of-thought for charts, diagrams, and document layouts.
- Evaluations (GA) + Continuous Monitoring: Out-of-the-box and custom evaluators with continuous production monitoring piped into Azure Monitor — quality isn't a pre-ship checkbox anymore, it's a live signal.
- azure-ai-projects SDK (GA): Python 2.0.0, JS/TS 2.0.0, and Java 2.0.0 all shipped stable releases targeting the GA REST v1 surface. .NET 2.0.0 followed on April 1. The azure-ai-agents dependency is gone — everything lives under AIProjectClient.
- Fireworks AI on Foundry (Preview): High-performance open model inference — DeepSeek V3.2, gpt-oss-120b, Kimi K2.5, and MiniMax M2.5 with bring-your-own-weights support.
- NVIDIA Nemotron Models: Open NVIDIA models now first-class in the Foundry catalog, announced at GTC alongside the Agent Service GA.
- Grok 4.2 (GA): xAI's refreshed chat model graduates from beta.
- Priority Processing (Preview): Dedicated compute lane for latency-sensitive AI workloads — reserved capacity for real-time agents and customer-facing chat.
- Palo Alto Prisma AIRS + Zenity (GA): Third-party runtime security integrations for prompt injection, data leakage, and tool misuse detection.
- Tracing (GA): End-to-end agent trace inspection with sort, filter, and data model refinements.
- PromptFlow Deprecation: Migration to Microsoft Framework Workflows required by January 2027.
Join the community
Connect with 25,000+ developers on Discord, ask questions in GitHub Discussions, or subscribe via RSS to get this digest monthly.
Models
GPT-5.4 + GPT-5.4 Pro (GA)
GPT-5.4 went generally available on March 5 — and this one is about reliability, not raw intelligence. If you've been fighting task drift, mid-workflow failures, and inconsistent tool calling in production agents, GPT-5.4 is designed specifically for those problems. Stronger reasoning over long interactions, better instruction adherence, and integrated computer use capabilities for structured orchestration of tools, files, and data extraction.
GPT-5.4 Pro is the premium variant for when analytical depth matters more than latency — multi-path reasoning evaluation, improved stability across long reasoning chains, and enhanced decision support for scientific research and complex trade-off analysis.
| Model | Context | Pricing (per M tokens) | Best For |
| --- | --- | --- | --- |
| GPT-5.4 (≤272K input) | 272K | $2.50 input / $0.25 cached / $15 output | Production agents, coding, document workflows |
| GPT-5.4 (>272K input) | Extended | $5.00 input / $0.50 cached / $22.50 output | Large-context reasoning |
| GPT-5.4 Pro | Full | $30 input / $180 output | Deep analysis, scientific reasoning |
Deployment: Standard Global and Standard Data Zone (US) at launch, with additional options coming.
Action: Deploy GPT-5.4 from the model catalog. If you're running GPT-5.2 in production, GPT-5.4 is a drop-in upgrade with better instruction following and fewer mid-workflow failures.
GPT-5.4 Mini (GA)
GPT-5.4 Mini shipped on March 17 — OpenAI's latest small model optimized for fast, cost-efficient tasks like classification, extraction, and lightweight tool calls. A step up from GPT-5 mini in instruction following and structured output reliability, at a fraction of the cost of full GPT-5.4. If you're routing by complexity — GPT-5.4 Mini handles the high-volume, low-latency tier while GPT-5.4 takes the reasoning-heavy work.
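The routing-by-complexity idea can be sketched with a small heuristic router. This is a minimal illustration, not a Foundry feature: the deployment names (`gpt-5.4-mini`, `gpt-5.4`), the keyword list, and the thresholds are all assumptions you would tune for your own traffic.

```python
# Sketch: route each request to the high-volume tier (GPT-5.4 Mini) or the
# reasoning tier (GPT-5.4) using a crude complexity score. Deployment names
# and thresholds are illustrative assumptions.

def estimate_complexity(query: str, tool_count: int = 0) -> int:
    """Score a request: long prompts, reasoning keywords, and many
    available tools all push the work toward the larger model."""
    score = 0
    if len(query) > 2000:
        score += 2
    if any(kw in query.lower() for kw in ("analyze", "plan", "multi-step", "compare")):
        score += 2
    score += min(tool_count, 3)  # heavy tool use favors the big model
    return score

def pick_deployment(query: str, tool_count: int = 0) -> str:
    """Return the deployment name for the request's tier."""
    if estimate_complexity(query, tool_count) >= 3:
        return "gpt-5.4"      # reasoning-heavy tier
    return "gpt-5.4-mini"     # high-volume, low-latency tier
```

The chosen name then feeds straight into the `model` argument of whatever client you use; the point is that classification and extraction traffic never pays full GPT-5.4 rates.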
Phi-4 Reasoning Vision 15B
Phi-4 Reasoning Vision 15B brings multimodal reasoning to the Phi family — a 15B-parameter model that combines visual understanding with chain-of-thought reasoning. Handles charts, diagrams, document layouts, and visual Q&A with strong performance relative to its size.
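For sending an image to a vision-capable deployment, the common OpenAI-compatible shape is a multimodal user message with the image embedded as a base64 data URL. The helper below only builds that message; the model identifier shown in the comment is an assumed catalog name, not confirmed here.

```python
import base64

def image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build an OpenAI-style multimodal user message that pairs a text
    prompt with an image encoded as a base64 data URL."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

# The message can then be passed to a vision-capable deployment, e.g.
# (deployment name is a placeholder):
# client.chat.completions.create(model="Phi-4-reasoning-vision-15b",
#                                messages=[image_message("Describe this chart", png_bytes)])
```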
Grok 4.2 (GA)
Grok 4.2 from xAI graduated to general availability on March 30, following its public beta earlier this year. A refreshed chat model in the catalog — available via serverless or provisioned throughput deployments.
Fireworks AI on Foundry (Public Preview)
Fireworks AI brings high-performance open model inference to Foundry — processing over 13 trillion tokens daily at ~180K requests/second in production. Four models available at launch:
| Model | Notes |
| --- | --- |
| DeepSeek V3.2 | Sparse attention, 128K context |
| gpt-oss-120b | OpenAI's open-source model |
| Kimi K2.5 | Moonshot AI's latest |
| MiniMax M2.5 | New to Foundry with serverless support |
The real story here is bring-your-own-weights (BYOW) — upload and register quantized or fine-tuned weights from anywhere without changing the serving stack. Deploy via serverless pay-per-token or provisioned throughput.
NVIDIA Nemotron Models
Announced at NVIDIA GTC on March 16, NVIDIA Nemotron models are now available through the Foundry catalog. Open models on NVIDIA accelerators, joining the widest selection of models on any cloud. Combined with the Fireworks AI integration, you can fine-tune Nemotron into low-latency assets distributable to the edge.
OSS Models in NextGen (GA)
Open-source models are now fully integrated into the NextGen Foundry experience at GA — unifying OSS and OpenAI models in a single deployment and management flow. No more context-switching between model providers.
Agents
Foundry Agent Service (GA)
The biggest story of the month. The next-gen Foundry Agent Service is generally available — built on the OpenAI Responses API, wire-compatible with OpenAI agents, and open to models from DeepSeek, xAI, Meta, LangChain, LangGraph, and more. If you're building with the Responses API today, migrating to Foundry requires minimal code changes. What you gain: enterprise security, private networking, Entra RBAC, full tracing, and evaluation — on top of your existing agent logic.
```python
import os

from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient
from azure.ai.projects.models import PromptAgentDefinition

with (
    DefaultAzureCredential() as credential,
    AIProjectClient(endpoint=os.environ["AZURE_AI_PROJECT_ENDPOINT"], credential=credential) as project_client,
    project_client.get_openai_client() as openai_client,
):
    # Create (or version) the agent definition in Foundry
    agent = project_client.agents.create_version(
        agent_name="my-enterprise-agent",
        definition=PromptAgentDefinition(
            model=os.environ["AZURE_AI_MODEL_DEPLOYMENT_NAME"],
            instructions="You are a helpful assistant.",
        ),
    )

    # Talk to the agent through the OpenAI-compatible Responses API
    conversation = openai_client.conversations.create()
    response = openai_client.responses.create(
        conversation=conversation.id,
        input="What are best practices for building AI agents?",
        extra_body={"agent_reference": {"name": agent.name, "type": "agent_reference"}},
    )
    print(response.output_text)
```
Action: pip install azure-ai-projects — that's it. As of 2.0.0, the package bundles openai and azure-identity as direct dependencies, so you no longer need to install them separately. If you're coming from azure-ai-agents, agents are now first-class operations on AIProjectClient — remove your standalone azure-ai-agents pin.
Voice Live + Foundry Agents (Preview)
Voice Live is a fully managed, real-time speech-to-speech runtime that collapses the traditional STT → LLM → TTS pipeline into a single managed API. Semantic voice activity detection, end-of-turn detection, server-side noise suppression, echo cancellation, and barge-in support — all built-in. Connect Voice Live directly to an existing Foundry agent. The agent's prompt, tools, and configuration stay in Foundry; Voice Live handles the audio pipeline. Voice interactions go through the same agent runtime as text — same evaluators, same traces, same cost visibility.
```python
import asyncio
import os

from azure.ai.voicelive.aio import connect, AgentSessionConfig
from azure.identity.aio import DefaultAzureCredential

async def run():
    # Point Voice Live at an existing Foundry agent; the agent's prompt,
    # tools, and configuration stay in Foundry.
    agent_config: AgentSessionConfig = {
        "agent_name": "my-enterprise-agent",
        "project_name": "my-foundry-project",
    }
    async with DefaultAzureCredential() as credential:
        async with connect(
            endpoint=os.environ["AZURE_VOICELIVE_ENDPOINT"],
            credential=credential,
            agent_config=agent_config,
        ) as connection:
            # Consume server events: audio deltas, turn detection, barge-in, etc.
            async for event in connection:
                ...

asyncio.run(run())
```
MCP Authentication Expansion
MCP server connections now support the full authentication spectrum:
| Auth Method | Use Case |
| --- | --- |
| Key-based | Static API tokens via Custom Keys connection |
| Entra Agent Identity | Service-to-service with managed identity |
| Managed Identity | Azure resource access |
| OAuth Identity Passthrough | User-delegated access (OneDrive, Salesforce, SaaS APIs) |
OAuth Identity Passthrough is the standout — when agents need to act on behalf of a specific user, not as a shared system identity:
```python
import os

from azure.ai.projects.models import MCPTool, PromptAgentDefinition

# Reuses the project_client from the agent-creation example above.
tool = MCPTool(
    server_label="github-api",
    server_url="https://api.githubcopilot.com/mcp",
    require_approval="always",
    project_connection_id=os.environ["MCP_PROJECT_CONNECTION_ID"],
)

agent = project_client.agents.create_version(
    agent_name="my-mcp-agent",
    definition=PromptAgentDefinition(
        model=os.environ["AZURE_AI_MODEL_DEPLOYMENT_NAME"],
        instructions="Use MCP tools as needed.",
        tools=[tool],
    ),
)
```
Evaluations (GA) + Continuous Monitoring
Foundry Evaluations shipped GA with three layers:
- Out-of-the-box evaluators — coherence, relevance, groundedness, retrieval quality, safety. No configuration required.
- Custom evaluators — encode your own criteria: business logic, tone standards, domain compliance.
- Continuous evaluation — Foundry samples live traffic automatically, runs your evaluator suite, and surfaces results in Azure Monitor dashboards. Configure alerts for quality drift.
Here's the pattern for running evaluations against an agent target:
```python
# DataSourceConfigCustom is the openai SDK's typed dict for custom eval data
# source configs; its import location may vary by openai package version.
from openai.types.eval_create_params import DataSourceConfigCustom

# Reuses openai_client and agent from the agent-creation example above.
eval_object = openai_client.evals.create(
    name="Agent Quality Evaluation",
    data_source_config=DataSourceConfigCustom(
        type="custom",
        item_schema={"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]},
        include_sample_schema=True,
    ),
    testing_criteria=[
        {
            "type": "azure_ai_evaluator",
            "name": "fluency",
            "evaluator_name": "builtin.fluency",
            "initialization_parameters": {"deployment_name": os.environ["AZURE_AI_MODEL_DEPLOYMENT_NAME"]},
            "data_mapping": {"query": "", "response": ""},
        },
        {
            "type": "azure_ai_evaluator",
            "name": "task_adherence",
            "evaluator_name": "builtin.task_adherence",
            "initialization_parameters": {"deployment_name": os.environ["AZURE_AI_MODEL_DEPLOYMENT_NAME"]},
            "data_mapping": {"query": "", "response": ""},
        },
    ],
)

run = openai_client.evals.runs.create(
    eval_id=eval_object.id,
    name=f"Run for {agent.name}",
    data_source={
        "type": "azure_ai_target_completions",
        "source": {
            "type": "file_content",
            "content": [{"item": {"query": "What is the capital of France?"}}],
        },
        "input_messages": {
            "type": "template",
            "template": [{"type": "message", "role": "user", "content": {"type": "input_text", "text": ""}}],
        },
        "target": {"type": "azure_ai_agent", "name": agent.name, "version": agent.version},
    },
)
```
Prompt Optimizer in Agent Playground
The Prompt Optimizer is now integrated directly into the Agent playground — iteratively improve prompts before shipping agents. Data-driven prompt tuning connected to evaluation results.
Hosted Agents in 6 New Regions
Hosted agent deployments are now available in East US, North Central US, Sweden Central, Southeast Asia, Japan East, and more — relevant for data residency requirements and latency optimization.
Safety & Guardrails
Palo Alto Prisma AIRS + Zenity (GA)
Third-party guardrail integrations are now generally available. Register and manage runtime security from Palo Alto Networks Prisma AIRS and Zenity directly in Foundry to detect prompt injection, toxic content, malicious URLs, sensitive data leakage, and tool misuse alongside native guardrails.
| Integration | Detects |
| --- | --- |
| Palo Alto Prisma AIRS | Prompt injection, toxic content, malicious URLs, data leakage |
| Zenity | Prompt injection, tool misuse, data exfiltration |
Announced at NVIDIA GTC alongside the Foundry Agent Service GA.
Task Adherence as Native Guardrail
Task Adherence is now a native guardrail risk type in Foundry — block or annotate off-task tool calls without wiring directly to the standalone Task Adherence API.
Tool-Call & Tool-Response Guardrails
New intervention points in Foundry Guardrails for tool invocations and tool responses — detect and mitigate risks in what tools are being called and what they return. Reduces data leakage and unsafe tool behavior in agentic workflows.
Agent Mitigations & Guardrail Customization (GA)
Hardened agent mitigations and guardrail customization round out GA-level safety controls for production agents.
Speech, Audio & Avatars
Neural HD TTS Updates + MAI-voice-1
March brings updates to the Neural HD TTS stack including MAI-voice-1 integration — higher-fidelity, more expressive synthetic speech with improved quality across secondary locales.
Fast Transcription — 5-Hour Support
Fast Transcription now supports up to ~5-hour audio inputs — addressing top customer requests for long-form meeting and media transcription.
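Fast transcription takes a multipart request whose `definition` part carries the transcription options as JSON. The helper below builds that JSON; the endpoint path, api-version, and diarization field names in the commented request sketch are assumptions based on the documented fast transcription API and should be checked against the version you target.

```python
import json

def transcription_definition(locales=("en-US",), diarization=False) -> str:
    """Build the JSON 'definition' part for a fast-transcription request.
    Field names follow the documented fast transcription API; verify them
    against the api-version you use."""
    definition = {"locales": list(locales)}
    if diarization:
        definition["diarization"] = {"maxSpeakers": 4, "enabled": True}
    return json.dumps(definition)

# Sketch of the request itself (endpoint and api-version are assumptions):
# import requests
# resp = requests.post(
#     f"{endpoint}/speechtotext/transcriptions:transcribe?api-version=2024-11-15",
#     headers={"Ocp-Apim-Subscription-Key": key},
#     files={
#         "audio": open("meeting.wav", "rb"),
#         "definition": (None, transcription_definition(diarization=True), "application/json"),
#     },
# )
```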
Dynamic Vocabulary (GA)
Dynamic vocabulary for English is now GA — improving recognition of domain-specific terms in Teams transcription and related STT workloads. Custom dictionary support (Tier 2) is in public preview for finer pronunciation control.
Custom Photo & Video Avatars in NextGen
Bring custom photo and video avatars into the Foundry NextGen experience — supporting richer, branded AI presenters for enterprise scenarios.
Playground GAs — TTS, Avatar, STT & Speech Translation
Three playgrounds reached GA this month:
| Playground | Status | What It Does |
| --- | --- | --- |
| TTS Playground | GA | Audition voices and parameters before integrating |
| Avatar Playground | GA | Preview and tune avatar experiences in NextGen |
| STT & Speech Translation Playground | GA | Trial STT and translation models |
Platform
Priority Processing (Preview)
Priority Processing gives you a dedicated compute lane for latency-sensitive and business-critical AI workloads. When you need guaranteed low-latency inference — real-time agents, customer-facing chat, time-sensitive pipelines — Priority Processing routes your requests through reserved capacity instead of competing with standard traffic. Available for OpenAI models in Foundry with configurable priority tiers per deployment.
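The announcement doesn't show the request-level knob for selecting the priority lane, so the sketch below borrows the `service_tier` parameter from the OpenAI API as an assumed analog; confirm the actual Foundry parameter name before relying on it.

```python
def with_priority(request_kwargs: dict, tier: str = "priority") -> dict:
    """Return a copy of request kwargs with an assumed per-request tier
    selector attached. 'service_tier' mirrors the OpenAI API parameter;
    the Foundry-specific name may differ."""
    return {**request_kwargs, "service_tier": tier}

# Usage sketch against an OpenAI-compatible client:
# response = openai_client.responses.create(
#     **with_priority({"model": deployment_name, "input": "Hello"}),
# )
```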
Tracing (GA)
Tracing is finalized for GA — UX polish, data model refinements, and sort/filter capabilities so you can reliably inspect agent traces in production. New OTel semantics for AI workloads (memory, state, planning) improve interoperability across tooling.
End-to-End Private Networking
Foundry Agent Service now supports Standard Setup with private networking — BYO VNet, no public egress, container/subnet injection. Extended to tool connectivity: MCP servers, Azure AI Search indexes, and Fabric data agents all operate over private network paths. Managed VNET logging adds firewall/NSG/flow logs for visibility into isolated environments.
Foundry Control Plane ARM API
A consolidated Foundry Control Plane ARM API gives enterprises a unified way to manage agents, models, and tools via ARM. Public preview of FCP support for Azure Functions and App Service lets you govern function-hosted agents centrally.
Fine-Tuning CLI
A new CLI for configuring, submitting, and monitoring fine-tuning jobs — a faster, code-first way to iterate on custom models. Cost estimation based on token projections helps plan spend before running large training jobs.
Platform Updates Rollup
- Eval results ↔ agent trace linking — evaluation results now connect to the underlying agent trace, closing a key observability gap for debugging.
- Local evals aligned with Evaluators catalog — run local evaluations with the same primitives as hosted runs, no hardcoded SDK logic.
- PII NextGen playgrounds — conversational and document PII detection with updated configuration panels exposing preview features.
- Notification center — tenant-level notifications (not just project-scoped), plus email delivery for critical eval, safety, and deployment alerts.
- Free Trial & PAYG — Free Trial as default sign-up path, in-app PAYG subscription creation, and in-app trial start to reduce friction.
- CMK for Azure AI Search — service-level customer-managed key configuration so admins set encryption defaults once, not per-index.
SDK & Language Changelog (March 2026)
March was the SDK GA month. The Foundry REST API went GA in February, and this month the SDKs followed — Python, JS/TS, and Java all shipped stable 2.0.0 releases targeting the v1 REST surface. .NET 2.0.0 shipped April 1. The azure-ai-agents dependency is gone across all languages; agents, evals, memory, and inference all live under the unified AIProjectClient.
Python
azure-ai-projects 2.0.0 (Mar 6) + 2.0.1 (Mar 12). First stable release — this is the one to pin for production.
Features:
- Dependency consolidation: azure-ai-projects now bundles openai and azure-identity as direct dependencies — pip install azure-ai-projects is the only install command you need. No more juggling three packages.
- New allow_preview boolean on the AIProjectClient constructor replaces per-method foundry_features — opt in once for all preview operations.
- Preview operations (hosted agents, workflow agents) use the same allow_preview flag; .beta sub-client methods imply it automatically.
Breaking changes from 2.0.0b4:
```python
# Before — per-method foundry_features
agent = project_client.agents.create_version(
    model="gpt-5",
    foundry_features=FoundryFeaturesOptInKeys.WORKFLOW_AGENTS_V1_PREVIEW,
)

# After — constructor-level allow_preview
project_client = AIProjectClient(
    endpoint=os.environ["AZURE_AI_PROJECT_ENDPOINT"],
    credential=credential,
    allow_preview=True,  # enables all preview features
)
agent = project_client.agents.create_version(model="gpt-5")
```
Other renames:
- TextResponseFormatConfiguration → TextResponseFormat
- CodeInterpreterContainerAuto → AutoCodeInterpreterToolParam (+ new network_policy property)
- ImageGenActionEnum → ImageGenAction
- Datetime fields across CronTrigger, RecurrenceTrigger, OneTimeTrigger, and ScheduleRun changed from str to datetime.datetime
Action: pip install azure-ai-projects==2.0.1 — pin to the stable release. If you were on 2.0.0b4, replace foundry_features with allow_preview=True on the client constructor.
Changelog
.NET
Azure.AI.Projects 2.0.0-beta.2 (Mar 12). The .NET SDK restructured packages — agents administration moved to Azure.AI.Projects.Agents, and Azure.AI.Projects.OpenAI was renamed to Azure.AI.Extensions.OpenAI. The OpenAI dependency was upgraded to 2.9.1.
.NET 2.0.0 GA shipped April 1 — the first .NET stable release on the v1 REST surface. Major renames:
- Insights → ProjectInsights
- Evaluations and memory moved to separate namespaces
- AIProjectClient.OpenAI → AIProjectClient.ProjectOpenAIClient
- AIProjectClient.Agents → AIProjectClient.AgentAdministrationClient

Action: Upgrade to Azure.AI.Projects 2.0.0 (GA, April 1) and review the breaking changes — significant property and namespace renames.
Changelog
JavaScript / TypeScript
@azure/ai-projects 2.0.0 (Mar 6) + 2.0.1 (Mar 13). First stable release for JS/TS.
Breaking changes from 2.0.0-beta.5:
- RedTeam.target changed from required to optional
- container_app removed from AgentKind; ContainerAppAgentDefinition removed
- project.connections.get and .getDefault: includeCredentials moved to the options bag
- project.beta.evaluators.listLatestVersions → project.beta.evaluators.list
Action: npm install @azure/ai-projects@2.0.1 — pin to stable. The beta → GA migration is mostly renames and options bag changes.
Changelog
Java
azure-ai-projects 2.0.0-beta.2 (Mar 4) → 2.0.0-beta.3 (Mar 19) → 2.0.0 (Mar 27). Three releases in March, culminating in the first Java GA. The beta releases iterated on breaking changes before locking the API surface.
Key breaking changes in 2.0.0:
- Method renames across all sub-clients for disambiguation (e.g., list() → listDeployments(), get() → getDeployment())
- Connection.getCredentials() → Connection.getCredential() (singular)
- FoundryFeaturesOptInKeys changed from ExpandableStringEnum to a standard Java enum
- DatasetsClient.createDatasetWithFolder() throws UncheckedIOException instead of checked IOException
- DatasetVersion.getDataUri() → getDataUrl()
Action: mvn dependency:resolve -Dartifact=com.azure:azure-ai-projects:2.0.0 — pin to stable. Review the full changelog for the method rename table.
Changelog
Deprecations
Plan your migrations now — these timelines are firm.
| Deprecation | Migration Target | Deadline |
| --- | --- | --- |
| PromptFlow (Azure AI Foundry + Azure ML) | Microsoft Framework Workflows | January 2027 |
| Import Data / Data Connections (Azure ML) | Fabric OneLake patterns | Effective now |
| Low-priority VMs (Azure ML) | Spot VMs | Effective now |
| Default internet access for new managed VNets | Explicit outbound configuration | Effective March 31, 2026 |
Action: If you're using PromptFlow in production, start planning your migration to Microsoft Framework Workflows. The January 2027 sunset gives you nine months.
Resources & Community
[alert type="info" heading="Forrester TEI Study: The Economics of Enterprise AI"]A new Forrester Total Economic Impact study found that organizations using Microsoft Foundry saw 20–30% developer time savings and a sub-6-month payback period. If you're building the business case for standardizing on Foundry, these are the numbers. Read the full study →[/alert]
Post Updated on April 10, 2026 at 12:33AM