Posts

[MS] How did we conclude that CcNamespace.dll was the ringleader of a group of DLLs that unloaded prematurely? - devamazonaws.blogspot.com

When I presented my study of a crash caused by a thread executing from an unloaded third-party DLL, someone asked how I concluded that CcNamespace.dll was the ringleader of the family of related DLLs. The list of recently-unloaded DLLs is recorded in a circular history.¹ So when you see a bunch of DLLs listed in a row, they were unloaded one after the other.²

00007ff9`6d7c0000 00007ff9`6d80a000 FabrikamContextMenu.dll
00007ff9`115e0000 00007ff9`1172f000 LitWareSync.dll
00007ff9`643d0000 00007ff9`64681000 CcNamespace.dll
00000000`55440000 00000000`5550b000 LibDB_CloudNs_3.dll
00000000`55860000 00000000`55998000 LibNet_CloudNs_3.dll
00000000`557f0000 00000000`5585b000 LibJson_CloudNs_3.dll
00000000`55510000 00000000`557e7000 LibUtils_CloudNs_3.dll
00000000`561a0000 00000000`56238000 MSVCP100.dll
00000000`56240000 00000000`56312000 MSVCR100.dll
00007ff9`85130000 00007ff9`85167000 EhStorShell.dll
00007ff9`3cac0000 00007ff9`3cb61000 wpdshext.dll
00007ff9`78a00000 ...
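To make the "circular history" idea concrete, here is a minimal sketch of a fixed-size history in which neighboring entries were recorded one after the other. The class name `UnloadHistory`, the capacity, and the module names are hypothetical; this illustrates the data structure only and makes no claim about how Windows stores or orders its unloaded-module list.

```python
from collections import deque


class UnloadHistory:
    """Fixed-size circular history of unloaded module names (illustrative only)."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest entry once it is full.
        self._entries = deque(maxlen=capacity)

    def record(self, name):
        # Each unload is appended, so neighbors in the history were
        # recorded consecutively.
        self._entries.append(name)

    def entries(self):
        # Oldest surviving entry first, most recent last.
        return list(self._entries)


# Hypothetical module names; the history only has room for four of them.
history = UnloadHistory(capacity=4)
for name in ["a.dll", "b.dll", "c.dll", "d.dll", "e.dll"]:
    history.record(name)

print(history.entries())
```

Because the oldest entry is overwritten once the history is full, only the most recent unloads survive, and a run of adjacent entries corresponds to a run of consecutive unloads.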

[MS] Enabling MLflow OpenAI Autolog on PySpark Workers - devamazonaws.blogspot.com

Context In a recent engagement, the team built an LLM-based contract intelligence pipeline on Azure Databricks. The goal was to extract entitlements from a large corpus of inconsistently formatted service-contract PDFs (what is covered, on which equipment, and under which terms) so downstream systems can tell what is in scope and what is billable. Rule-based or template-based extraction was not a realistic option given the variability in layout and wording across contracts, which made an LLM a good fit: it can absorb that variability, reason about context across a document, and emit structured output in a single pass. To parallelize the extraction, the pipeline fans those per-document LLM calls out across Spark workers. Per-call visibility into token spend, latency, and prompt/response quality becomes essential to keep cost and output quality from drifting unnoticed. The natural tool for that is mlflow.openai.autolog(). The catch: getting it to reliably emit traces in this setup ta...

Amazon SageMaker HyperPod now supports AMI versioning and auto-patching - devamazonaws.blogspot.com

Amazon SageMaker HyperPod now gives you visibility into the Amazon Machine Image (AMI) versions running across your clusters and automatically applies security patches without disrupting your workloads. SageMaker HyperPod is purpose-built infrastructure for training and deploying foundation models at scale. Cluster administrators previously had limited insight into which AMI versions were running, making drift hard to detect and security patching a manual, reactive process that was difficult to run on long multi-day training jobs and that risked changing bundled software in the AMI such as NVIDIA drivers or CUDA. These new capabilities on HyperPod help you keep clusters secure and consistent while removing the operational burden of manual patching. With AMI versioning, you can see the exact AMI version on every instance group and node in the semantic versioning (major.minor.patch) format, quickly detect version drift, and roll back to a previous version—including the prior NVIDIA dri...
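As a rough illustration of what detecting version drift over major.minor.patch versions involves, the sketch below compares per-node version strings. The node names, version values, and the "most common version is the baseline" rule are all assumptions made for the example; it does not call any HyperPod API.

```python
from collections import Counter


def parse_version(text):
    """Split 'major.minor.patch' into a tuple of ints so versions compare numerically."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def find_drift(node_versions):
    """Return the nodes whose version differs from the most common version."""
    counts = Counter(node_versions.values())
    baseline, _ = counts.most_common(1)[0]
    return {node: ver for node, ver in node_versions.items() if ver != baseline}


# Hypothetical cluster inventory: node name -> AMI version string.
nodes = {"node-1": "1.4.2", "node-2": "1.4.2", "node-3": "1.3.9"}

drifted = find_drift(nodes)
oldest = min(nodes.values(), key=parse_version)

print(drifted)
print(oldest)
```

Parsing into integer tuples matters because plain string comparison would rank "1.10.0" below "1.9.0".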

Amazon SageMaker Unified Studio now supports Terraform for provisioning - devamazonaws.blogspot.com

Amazon SageMaker Unified Studio now supports Terraform for provisioning. Customers can use the open-source terraform-aws-sagemaker-unified-studio module to deploy a SageMaker Unified Studio domain through version-controlled templates. With this launch, platform teams can bring SageMaker Unified Studio into their existing infrastructure-as-code pipelines, maintaining consistency across development, staging, and production accounts. Amazon SageMaker Unified Studio is a unified development environment where data teams can build end-to-end data and AI workflows using familiar tools—from data integration and analytics to machine learning and generative AI—all governed by a shared catalog. Administrators provision domains to give their organization a single, managed workspace with built-in access control, data governance, and cross-service connectivity. With this launch, the Terraform module handles the infrastructure of SageMaker Unified Studio domain with provisioned IAM roles. Sub-...

Amazon ECS now provides real-time deployment observability in the AWS Management Console - devamazonaws.blogspot.com

Amazon Elastic Container Service (Amazon ECS) now provides real-time deployment observability in the Amazon ECS Console. With this launch, customers can track deployment progress, monitor deployment health, and diagnose failures directly from the console. This helps them understand exactly what is happening during a deployment, identify issues as they occur, and reduce the time it takes to troubleshoot and resolve deployment failures. The enhanced deployment observability introduces a live deployment timeline that shows each phase, service events, and task launch and termination progress with automatic refresh. You can monitor deployment health in real time using circuit breaker status with live task failure proximity and threshold tracking, deployment alarm state, and health checks at both the container and load-balancer level. To diagnose deployment failures faster, you can view failed tasks directly in the deployment timeline with diagnostic context and deep links to related services suc...

[MS] Upcoming Change: NTLM Removal in Git (libcurl) – Impact to Azure DevOps Server Customers - devamazonaws.blogspot.com

Overview In September 2026, NTLM support will be removed from libcurl, which is used by Git for HTTP(S) operations. As a result, Git operations over HTTPS against Azure DevOps Server (on-premises) will stop working for customers who rely on NTLM authentication. This change is part of a broader industry move toward more secure authentication mechanisms. Many environments may be affected even if they believe they are using Kerberos. This is because Negotiate (SPNEGO) authentication can silently fall back to NTLM when Kerberos is not properly configured, leading to unintentional dependency on NTLM. If your environment currently depends on NTLM authentication, you will need to transition to a supported alternative before it is removed. Based on current guidance, customers should move to Kerberos authentication wherever possible and avoid continued reliance on NTLM, as it is deprecated and will not be supported going forward. While older Git client versions may temporarily continue to...

Amazon ECS now supports configurable deployment circuit breaker settings - devamazonaws.blogspot.com

Amazon Elastic Container Service (Amazon ECS) now gives you more control over when a service deployment is considered failed and automatically rolled back. You can now customize deployment circuit breaker settings to match your application's startup behavior, deployment needs, and tolerance for task failures, so rollback works the way you need across different applications and environments. The ECS deployment circuit breaker automatically detects failed deployments and rolls them back to the last successful deployment once a failure threshold is reached. With this launch, you can set the deployment circuit breaker threshold using either a fixed task failure count or a percentage of your service's desired task count, and choose how failures are counted using either a consecutive model, where the counter resets when a healthy task starts, or a cumulative model, where failures keep adding up throughout the deployment. For example, you can set lower thresholds for faster rollbac...
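The difference between the consecutive and cumulative counting models can be sketched with a small simulation. This is an illustrative model only, not the ECS API: the function names, the `mode` strings, and the rounding rule for percentage thresholds (round up, minimum of 1) are assumptions.

```python
import math


def resolve_threshold(desired_count, fixed=None, percent=None):
    """Turn a fixed count or a percentage of desired tasks into a failure threshold.

    Rounding up with a floor of 1 is an assumption made for this sketch.
    """
    if (fixed is None) == (percent is None):
        raise ValueError("specify exactly one of fixed or percent")
    if fixed is not None:
        return fixed
    return max(1, math.ceil(desired_count * percent / 100))


def breaker_trips(task_results, threshold, mode):
    """Return True if the failure counter reaches the threshold.

    task_results: iterable of booleans, True = healthy task start, False = failure.
    mode: "consecutive" resets the counter on a healthy start;
          "cumulative" keeps adding failures for the whole deployment.
    """
    if mode not in ("consecutive", "cumulative"):
        raise ValueError("mode must be 'consecutive' or 'cumulative'")
    failures = 0
    for healthy in task_results:
        if healthy:
            if mode == "consecutive":
                failures = 0
        else:
            failures += 1
            if failures >= threshold:
                return True
    return False


# Hypothetical deployment: two failures, one healthy start, two more failures.
results = [False, False, True, False, False]
threshold = resolve_threshold(desired_count=10, percent=25)

print(threshold)
print(breaker_trips(results, threshold, "consecutive"))
print(breaker_trips(results, threshold, "cumulative"))
```

With the same sequence of task outcomes, the consecutive model never reaches the threshold because the healthy start resets the counter, while the cumulative model does, which is why the choice of counting model changes how quickly a rollback is triggered.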