Enterprise AI Agent Environment Design Notes Part 1: Comparing the Three Major Clouds and Designing Your Architecture

First Published:
Last Updated:

This article is Part 1 of a three-part series on designing enterprise AI agent environments.

Disclaimer: This article reflects the author's personal research and synthesis of publicly available information as of 2025–2026. The architectural recommendations presented here represent the author's own generalized perspective and are not definitive solutions. Every organization has different internal circumstances, use cases, and existing system configurations, so the approaches described here may not be the right fit for every environment. Cloud service specifications, pricing, and features change frequently, and the author's interpretations may contain errors. When evaluating these options for your organization, always refer to each cloud provider's official documentation and make decisions based on your specific situation. If you spot any errors, please feel free to point them out.

1. Introduction — Why Enterprise AI Agent Environments, and Why Now?

Between late 2025 and 2026, the way organizations use AI shifted dramatically. After the initial phases of "experimenting with ChatGPT" and "running a RAG proof of concept," companies are now moving toward building and operating AI agents as core production infrastructure.

RAG chatbots that search internal documents, automated customer support, workflow automation — these aren't one-and-done projects. They become internal infrastructure used daily by hundreds or thousands of employees. At that scale, simply calling an LLM API is no longer enough. You need end-to-end system design covering authentication, access control, audit logging, cost management, and operational monitoring.

But when organizations try to move from experimentation to full-scale deployment, internal systems teams quickly face a cascade of decisions:
  • Cloud selection: Should the primary platform be Azure, AWS, or Google Cloud?
  • Access control: How do you carry over internal document ACLs (access control lists) to AI agents?
  • Agent deployment: When should you use local agents vs. cloud-hosted agents?
  • Multi-cloud: Should you commit to one provider, or combine multiple clouds?
  • Avoiding lock-in: How do you preserve the ability to migrate in the future?

This series takes organizations that use Microsoft 365 (MS365) as their internal foundation as the primary audience, and provides a comprehensive comparison of the three major platforms: Microsoft Foundry (formerly Azure AI Foundry), Amazon Bedrock AgentCore, and Google Cloud Vertex AI. Rather than a simple feature checklist, this is a practical design reference that goes deep on architectural principles, differences in ACL implementation depth, TCO estimates, and strategies for reducing vendor lock-in.

1.1 Key Conclusions Across All Three Parts (Preview)

Here are the conclusions upfront. The details will unfold throughout the series.
DimensionConclusion
SharePoint ACL integrationAzure is the most practical choice today. Foundry Path A (full inheritance) / Path B (Entra SG only) provide platform-level ACL support. Google Cloud can handle it with additional configuration, though ACL sync has latency (via Gemini Enterprise). AWS has no ACL mechanism.
LLM choice flexibilityAll three providers are well-stocked. Azure offers 11,000+ models; AWS has Claude/Llama/Nova and more; GCP has 200+ (Gemini/Claude/Llama).
Cost efficiencyFor high-frequency inference, GCP (Gemini 2.5 Flash-Lite from ~$0.10/million tokens) is the clear winner. Azure pay-as-you-go is solid for RAG infrastructure.
MultimodalGCP is the most comprehensive (Gemini Live API, Veo, Imagen, Chirp).
GovernanceAzure (Purview/Entra/Defender integration) is the strongest fit for MS365 environments.
Architecture recommendationAzure-led with GCP as a complement (for multimodal and high-frequency inference) tends to be the best-balanced approach.

2. Local AI Agents vs. Cloud-Hosted AI Agents — Getting the Full Picture

Before diving into AI agent environment design, it's worth stepping back to ask: what kinds of AI agents exist, and how should you choose between them? Skipping this step often leads to misconceptions like "can't we just roll out a local tool to all employees?" or "can't a cloud service handle everything?"

2.1 Why You Need Both

Local agents like Claude Code and GitHub Copilot, and cloud-hosted agents like Amazon Bedrock AgentCore, Microsoft Foundry, and Vertex AI Agent Builder are not competing alternatives — they're complementary.

The emergence of MCP (Model Context Protocol) servers and Skills has rapidly expanded what local agents can do. They can now post to Slack, query databases, and call APIs directly from your local machine. But that doesn't make cloud-hosted agents obsolete. The reason is fundamental: the two serve different user groups and follow different operational models.

2.2 Where Local Agents Excel

Local agents are excellent personal productivity tools for technical users who want to maximize their individual output.
  • Developer productivity: Code editing, terminal operations, direct filesystem access
  • Low latency: Immediate access to local resources with no network round-trips
  • Extensibility via MCP/Skills: Connections to Slack, databases, APIs, and more from the local environment
  • Privacy: Data stays local, making them well-suited for sensitive work

2.3 Why Cloud-Hosted Agents Are Indispensable

Delivering AI benefits to an entire organization requires cloud-hosted agents. The table below shows why cloud-hosted agents have the edge for each organizational concern.
DimensionWhy cloud-hosted agents have the advantage
Multi-user managementSetting up and managing local agents for hundreds or thousands of employees is operationally prohibitive. Cloud-hosted agents can be provisioned centrally.
Access control and auditingCentralized logging of who accessed what, RBAC, and compliance requirements are far more efficiently managed server-side.
Deployment to non-technical usersSales, legal, and HR teams won't realistically use terminal-based tools. They need a UI accessible from a browser or Microsoft Teams.
Shared knowledge baseRAG pipelines (embedding and searching internal documents) maintain better data consistency when managed centrally on the server side.
Always-on and event-drivenAutomations like "classify incoming emails" or "answer questions posted in Slack" can't depend on a local PC being turned on.
ScalabilityOrchestrating multiple agents and managing complex workflows requires the scaling capabilities of the cloud.

2.4 A Three-Tier Agent Deployment Model by User Type

Putting this together, a rational design for an organization's AI agent environment is a three-tier structure:
Three-tier agent deployment model
Three-tier agent deployment model

The key insight is not to give everyone the same tool, but to choose the right interface for each user tier. Give developers the flexibility of local agents. Give general staff the accessibility of cloud-hosted interfaces. Give background processes an event-driven automation foundation. This differentiation is what determines how much value AI actually delivers across the organization.

3. Designing the Architecture for an Enterprise AI Agent Deployment

Now that we have the big picture, let's get into the specifics of architecture design. The question is: where do you start, and in what order do you build? I'll break this down into three steps.

3.1 Where Internal Resources Live and How to Access Them

First, let's establish the baseline: where are the internal resources your AI agents will use, and how will agents access them? Most organizations will recognize the following patterns.
Internal ResourceTypical StorageHow AI Agents Access It
Proposals, contracts, policiesS3 / SharePoint / GCSRAG (embedding → search engine/vector DB → semantic search)
FAQsDirectly loaded into search engine/vector DBSemantic search
Internal wikiConfluence, etc.Via API or crawl → RAG
Business data (customers, sales, etc.)RDS / DynamoDB / BigQueryMCP Server → SQL/API
Email and chat historyExisting SaaSAPI integration (access control required)

The important thing here is not to try to handle everything with a single approach. Documents go through RAG. Structured data goes through SQL/API. Real-time communication data goes through API integration. Choose the access method to match the nature of the resource.

3.2 A Phased Deployment Approach

Rather than building everything at once, the standard approach is to progress through three steps incrementally. Trying to build a "full stack" solution in the PoC phase doesn't just slow down delivery — it makes it nearly impossible to isolate what's working from what isn't.
Step 1 — Build the RAG Foundation
The right place to start is a RAG pipeline for internal documents. Chunk internal documents, embed them as vectors, and store them in a search engine (Azure AI Search, Amazon Kendra, Vertex AI Search, etc.) or vector database (Amazon OpenSearch Serverless, etc.). This becomes the shared knowledge layer for all agents.

The standard practice is to start with content that's easy to measure impact from and has lower sensitivity — FAQs and internal manuals are ideal first targets.
Step 2 — Build Two Access Paths
Once the RAG foundation is running, build two access paths to it.
  • Cloud-hosted UI: Use Bedrock AgentCore, Microsoft Foundry, or Vertex AI Agent Builder to create a chat interface for general staff connected to the Step 1 RAG foundation.
  • MCP Server: For developers and power users. Establish a path for local agents to access the same RAG foundation and APIs.

Providing multiple access paths to the same underlying knowledge base lets you offer the right interface to each user tier while keeping the data consistent across all of them.
Step 3 — Centralize Permission Management
Finally, get permissions in order. Use IAM, Entra ID, Workforce Identity Federation, or equivalent systems to apply the same access policies to agent-mediated access as to direct access. Ensuring that AI agents can't violate existing access policies — "contracts are legal-only," "payroll data is HR-only" — is a prerequisite for rolling out to the broader organization.

Permission control is one of the central themes of this series, and will be covered in depth in Part 2.

3.3 Decision Criteria for Common Design Questions

Here are the points where early-stage deployments tend to get stuck, along with guidance on how to think through them.
Decision pointGuidance
Bedrock vs. Foundry vs. Vertex AI vs. self-hostedAlign with your existing cloud infrastructure. In MS365 environments, prioritize Azure.
RAG vs. fine-tuningStart with RAG. Only consider fine-tuning if accuracy in a specific domain proves insufficient.
Bulk document ingestion vs. phased ingestionPhased. Start with FAQs, then policies, then proposals — expand the scope incrementally while measuring impact.
One all-purpose agent vs. a fleet of specialized agentsSpecialized agents. Separating "contract review," "FAQ answering," and "proposal drafting support" gives you better accuracy and cleaner permission management.

The last point deserves emphasis. An all-purpose agent may seem convenient at first glance, but it creates complex permission management and bloated prompts. Separating agents by use case tends to be more manageable over the long run.

4. Overview of the Three Major Cloud AI Agent Platforms

Now for the main event. Let's look at what each of Azure, AWS, and Google Cloud offers in terms of AI agent platforms. As you read, think not just about what each platform can do, but about which type of organization it's best suited for.

4.1 Microsoft Foundry (Azure)

Formerly Azure AI Studio → Azure AI Foundry → Microsoft Foundry (renamed at Ignite in November 2025). Positioned as an "AI app and agent factory," it stands alongside Microsoft 365 and Microsoft Fabric as the third pillar of Microsoft's enterprise platform.
Core Components
Microsoft Foundry is built around five core components. Foundry IQ and the Foundry Control Plane are particularly notable as enterprise differentiators.
ComponentRole
Foundry Agent ServiceManaged runtime for agents. Also supports OSS frameworks like LangGraph and CrewAI.
Foundry Models (Model Router)Unified endpoint for 11,000+ models. Supports automatic cost/performance routing.
Foundry IQNext-generation RAG knowledge system (Public Preview). Backed by Azure AI Search, with connectors for OneLake, SharePoint, Azure Blob Storage, and more. Via OneLake shortcuts, it can also reference data in S3 and Snowflake.
Foundry ToolsAgent integration for Azure AI Services (speech, video, image, document, text).
Foundry Control PlaneRBAC, VNET isolation, Purview integration, tracing, and evaluation capabilities.
Available LLMs
ProviderKey Models
AnthropicClaude Opus 4.6, Claude Sonnet 4.6, Claude Haiku 4.5
MetaLlama 4 Scout, Llama 4 Maverick, Llama 3.1-405B
Mistral AIMistral Large 3, Mistral Small, Codestral
DeepSeekDeepSeek-R1, DeepSeek-V3
OpenAIGPT-5.4, GPT-4o, o3, o4-mini
OthersCohere, Stability AI, NVIDIA, Hugging Face, etc.
Key Strengths in MS365 Environments
Foundry's biggest differentiator is its native integration with the MS365 ecosystem. Where other clouds require configuration to connect, Azure starts out connected.
  • Entra ID, SharePoint, Teams, and Microsoft Fabric are all part of the same ecosystem
  • Automatic recognition of SharePoint ACLs
  • Integration with Microsoft Copilot Studio
  • Workflow automation via Power Automate / Logic Apps (1,400+ connectors)

4.2 Amazon Bedrock AgentCore (AWS)

GA'd in October 2025. This is a managed platform that unifies the building, deployment, and operation of agents. True to AWS's character, it follows a "building blocks" design philosophy — you combine AgentCore components with related Bedrock capabilities to assemble what you need.
Core Components
ComponentRole
AgentCore RuntimeManaged execution environment for agents. Supports long-running sessions up to 8 hours.
AgentCore IdentityFederated authentication with external IdPs (including Entra ID). JWT validation.
AgentCore GatewayMCP-compatible tool connection gateway. Managed hosting with IAM/OAuth authentication.
AgentCore MemoryShort-term and long-term memory management for agents.
AgentCore PolicyPolicy controls including content filtering, topic restrictions, and PII detection.
AgentCore EvaluationsAgent quality evaluation (helpfulness, tool selection accuracy, hallucination rate, etc.).
Knowledge BasesRAG pipelines from S3, SharePoint, and other sources. Vector DB integration.

※ Knowledge Bases is a separate Bedrock capability from AgentCore, but it's commonly used together when building agents.
Available LLMs
ProviderKey Models
AnthropicClaude Opus 4.6, Claude Sonnet 4.6, Claude Haiku 4.5
MetaLlama 4 Scout, Llama 4 Maverick
Mistral AIMistral Large 3
AmazonAmazon Nova Premier, Nova Pro, Nova Lite, Nova Micro
AI21 LabsJamba 1.5 Large, Jamba 1.5 Mini
CohereCommand R+
Notable Points
A major advantage of Bedrock AgentCore is framework flexibility. You can bring in any OSS framework you like — LangGraph, CrewAI, OpenAI Agents SDK, and so on. AgentCore Identity also supports OIDC federation with external IdPs like Entra ID and Okta.

However, SharePoint ACL is not supported at the platform level. This is a significant constraint when building internal document RAG in an MS365 environment. Details are covered in Part 2.

4.3 Vertex AI Agent Builder / ADK (Google Cloud)

Vertex AI Agent Builder is an integrated platform for building, deploying, and managing enterprise AI agents. The Agent Development Kit (ADK) serves as the core framework, and notably, it's published as open source.
Core Components
ComponentRole
Agent Development Kit (ADK)OSS agent framework. Supports Python, Java, Go, and TypeScript.
Agent EngineProduction-grade managed runtime. Automatic scaling, short-term and long-term memory.
Vertex AI SearchFully managed search engine. Connectors for SharePoint, Confluence, Slack, and more.
Model Garden200+ models (Gemini, Claude, Llama, etc.)
Model ArmorBlocks prompt injection attacks.
Agent IdentityRegisters agents as Cloud IAM principals. Enforces the principle of least privilege.
Gemini Enterprise (formerly Google Agentspace)Integrated enterprise search and agent platform.
Available LLMs
ProviderKey Models
GoogleGemini 3.1 Pro (Preview), Gemini 3 Flash (Preview), Gemini 2.5 Pro/Flash, Imagen, Veo
AnthropicClaude Opus 4.6, Claude Sonnet 4.6, Claude Haiku 4.5
MetaLlama 4 Maverick, Llama 4 Scout
Mistral AICodestral
OthersAI21 Labs, Gemma, PaliGemma, etc.
Google Cloud's Unique Strengths
Where Google Cloud differentiates itself most from the other two is in multimodal capability and cost efficiency.
CapabilityDescription
Google Search GroundingGrounds responses in real-time web search results. Effective for reducing hallucinations. A unique capability not available on other clouds.
Web Grounding for EnterpriseFor regulated industries (finance, healthcare, public sector). VPC-SC integration and log control.
MultimodalThe broadest range: Gemini 3.1 Pro / 3 Flash / 2.5 Pro / 2.5 Flash (up to 1M token context), Gemini Live API (bidirectional audio/video), Veo (video generation), Imagen (image generation), and Chirp (speech).
BigQuery integrationTight coupling between large-scale data analytics and AI agents.
VPC Service ControlsFree data exfiltration protection (equivalent to Azure/AWS Private Endpoints, plus additional safeguards).
A2A ProtocolAdvocates for an open standard for agent-to-agent communication (now transferred to the Linux Foundation).

4.4 Side-by-Side Comparison of All Three Platforms

Here's a consolidated comparison table of the three platforms. Use it to check where each platform is strong or weak relative to your organization's requirements.
Evaluation AxisMicrosoft FoundryAWS Bedrock AgentCoreGCP Vertex AI
Model count11,000+Dozens (covers major models)200+
Agent frameworkSemantic Kernel + OSSAny OSS supportedADK (OSS)
SharePoint integration◎ Native△ Connector (no ACL)○ Via Gemini Enterprise
Entra ID integration◎ Native○ AgentCore Identity○ Workforce Identity Federation
Multimodal○ GPT-5.4○ Nova◎ Gemini (most comprehensive)
Network isolationVNETVPCVPC-SC (free)
Teams integration✕ Custom implementation✕ Custom implementation
Pricing modelPay-as-you-go + ProvisionedPay-as-you-go + Provisioned ThroughputPay-as-you-go + CUD (up to 55% off)

How to read this table: Having more ◎ marks doesn't make a platform "the best." What matters is whether it aligns with your organization's requirements. If you're on MS365 and SharePoint permission control is critical, Azure is the natural choice. If multimodal capability or cost optimization is the priority, GCP stands out. If integrating with existing AWS infrastructure or leveraging Claude is key, AWS is the right call. Each platform has its own home turf.

4.5 Sidebar — "Can't we just stand up Azure AD inside an AWS/GCP VPC?"

A natural follow-up question when looking at the table above is: "If Entra ID integration is the decisive factor, can't we just deploy Azure AD inside an AWS or GCP VPC and sync the Entra ID content there? Wouldn't that close the gap?" The short answer is no, and it's worth clarifying why — because the premise of the question is technically incorrect:
  1. Entra ID is a global SaaS IDP and cannot be installed in any VPC (not even an Azure VNET). What can be placed inside an AWS/GCP VPC is a classical AD DS — self-hosted Windows Server AD, AWS Managed Microsoft AD, or Google Cloud Managed Service for Microsoft AD. Microsoft Entra Domain Services (the managed AD DS offering) is Azure-only.
  2. SharePoint ACLs do not live in AD DS. They live in Entra ID + SharePoint Online, together with Microsoft 365 security groups and Purview sensitivity labels. A VPC-resident AD DS — no matter how aggressively synced — will not carry any of that metadata.
  3. The identity-layer affinity is already solved without a VPC AD. AWS Bedrock AgentCore Identity and GCP Workforce Identity Federation already federate directly with Entra ID over OIDC/SAML. The gap between Foundry and the other two is at the connector / ingestion layer (how SharePoint ACLs are propagated into the retrieval index), which a VPC AD cannot influence.

In other words, the ◎/○/△ marks in the "SharePoint integration" and "Entra ID integration" rows above reflect architectural realities that won't change by moving a directory service into a VPC. Part 2 revisits this in more operational detail.

5. Part 1 Wrap-Up — What to Read Next

In Part 1, we covered the overall landscape of enterprise AI agent environments, the three-tier agent deployment model, a phased deployment approach, and a comparison of the three major cloud platforms.

By now, you should have a solid sense of what each cloud excels at. But the most important and most challenging question — how to carry over internal document access control (ACL) to AI agents — hasn't been addressed yet.

Part 2: Implementing SharePoint ACL and Access Control tackles that question head-on. We'll go deep on the differences in ACL support across clouds, the details of Microsoft Foundry's Path A and Path B, and strategies for handling mixed-license environments — all at the implementation level.

6. Related Articles in This Series


This article reflects the author's personal research and synthesis of publicly available information as of 2025–2026, and represents a generalized architectural perspective. Every organization has different internal circumstances and use cases, so the content here does not apply universally. The author's interpretations and assumptions may contain errors. When evaluating for your own organization, always consult each provider's official documentation and make decisions based on your specific context.

References:
Tech Blog with curated related content

Written by Hidekazu Konishi