AWS Zero-Trust Network Architecture Guide - Segmentation, Inspection, and Identity-Aware Access with VPC, Network Firewall, VPC Lattice, and Verified Access

Zero trust is easy to state and hard to build. The principle — "never trust, always verify" — fits on a slide, but turning it into a running AWS architecture means combining services that most teams operate in isolation: account and VPC segmentation, traffic inspection, identity-aware service-to-service connectivity, and identity-aware human access to internal applications, all watched by a detection layer. The single sentence that organizes this whole guide is blunt: network reachability is not permission. Being able to send a packet to a service must not, by itself, authorize the request.

This is a Level 400 implementation guide, not a service tour or a selection matrix. It defines one named reference architecture — a multi-account zero-trust network — and walks a single access request through every layer that must independently say "yes" before traffic reaches data. The deep per-service mechanics and the "which connectivity option should I pick" decisions live in existing guides, and this article hands off to them rather than rewriting them. What you get here is the assembly: how the pieces fit, how one request flows through them, how the layers fail, and how you diagnose which layer denied a request.

For the per-service depth that this guide deliberately omits, see the AWS VPC Lattice Complete Guide for service-to-service networking mechanics, the AWS PrivateLink and VPC Endpoints Complete Guide for private service exposure, the AWS VPC Connectivity Decision Guide for choosing among connectivity models, and the IAM Policy Evaluation Logic Step by Step for how layered policies are evaluated. The AWS Networking Glossary defines every term used here.

Scope and honesty notes. This is an architecture guide, not a cost guide: inspection, Lattice, and Verified Access all have meaningful cost characteristics, but pricing changes frequently and is per-Region, so this article describes cost qualitatively and links to official pricing. It is also not an attack playbook — it describes controls and how to verify them, not how to bypass them. Crucially, passing one control is not the same as being safe. Zero trust is layered precisely because any single layer can be misconfigured; the architecture's value is that several independent layers must each authorize a request. All quantitative limits mentioned were verified against AWS documentation at the time of writing; always confirm current quotas in the Service Quotas console for your account and Region.

1. Introduction: Reachability Is Not Permission

The perimeter model assumes that anything inside the network is trusted. Zero trust discards that assumption. AWS frames zero trust architecture (ZTA) in its Prescriptive Guidance around a small set of principles: verify and authenticate every principal (ideally on each request, with strong MFA and contextual signals), grant least privilege, segment the network into small authorized flows (micro-segmentation), monitor continuously, automate, and, as the linchpin, require each resource access to be explicitly authorized at an enforcement point that considers both identity and context.

A common misreading is that zero trust means "identity only — throw away the network." AWS is explicit that this is not the position. Identity-centric controls and network-centric controls are complementary; you apply each where it adds the most value and combine them. Network controls still do essential work: they shrink the blast radius of a compromised credential, give you a place to inspect traffic for threats, and contain lateral movement. Identity controls ensure that being on the network is never sufficient. The interesting, distinctly Level 400 part of an AWS zero-trust design is the intersection — the places where reachability and identity are evaluated together for the same request.

That intersection is where this guide spends most of its time. Two AWS services sit exactly on it:
  • Amazon VPC Lattice evaluates, for service-to-service calls, both whether the caller's VPC is associated with the service network (reachability) and whether the caller's IAM principal is allowed by an auth policy (identity), using SigV4-signed requests.
  • AWS Verified Access evaluates, for human access to an internal application, both the network path it brokers (VPN-less, no broad routing) and a Cedar policy against trust context from identity and device providers (identity and posture).

Around that intersection sit the supporting layers: account and VPC segmentation (Organizations, OUs, SCPs, VPCs, subnets, security groups), inspection (AWS Network Firewall for east-west and egress traffic), and detection (Amazon GuardDuty and AWS Security Hub). The rest of the article builds these up, then traces one request through them.

Selection questions — "Network Firewall or third-party appliance?", "Transit Gateway or Cloud WAN?", "PrivateLink or Lattice?" — are answered in the existing decision guides linked above. Here, the choices are made so we can show the implementation.

2. The Reference Architecture at a Glance

The architecture this guide implements is a multi-account zero-trust network: a small set of purpose-built AWS accounts under AWS Organizations, a central inspection path, identity-aware connectivity for services and humans, and an organization-wide detection plane.
[Figure: Zero-trust network reference architecture on AWS]
Each component has one job:
  • Management account (AWS Organizations). Holds Service Control Policies (SCPs) that set non-negotiable guardrails for every member account — the outermost authorization layer. No workloads run here.
  • Identity / shared-services account. Hosts AWS IAM Identity Center (workforce identities and permission sets), the AWS Verified Access instance and its trust providers, and the Amazon VPC Lattice service network that is shared to workload accounts through AWS Resource Access Manager (RAM).
  • Network / inspection account. Owns the central inspection VPC running AWS Network Firewall, the AWS Transit Gateway that stitches VPCs together, and the egress path (NAT gateways, internet gateway). All cross-VPC and internet-bound traffic is routed through here for inspection.
  • Security account. The delegated administrator for Amazon GuardDuty and AWS Security Hub across the organization — the detection and posture plane.
  • Workload accounts. Per-environment or per-application accounts, each with a segmented VPC (separate subnets for web, application, and data tiers; security groups acting as micro-segmentation; NACLs as coarse subnet guards). Workloads expose and consume services through the Lattice service network and are reachable by humans only through Verified Access.

The flows that matter:
  1. Human-to-application access goes through Verified Access (no VPN, no broad network route), which evaluates a Cedar policy against identity and device trust context before brokering the connection to the application endpoint.
  2. Service-to-service (east-west) calls go through the VPC Lattice service network, which checks VPC association (reachability) and an IAM auth policy (identity) with SigV4.
  3. East-west VPC-to-VPC and egress (internet-bound) traffic is routed via Transit Gateway through the inspection VPC, where Network Firewall performs stateful inspection and egress filtering.
  4. Everything is observed by GuardDuty (threat detection) and Security Hub (posture and findings aggregation).

The account separation is itself a control: it creates hard IAM and billing boundaries, lets SCPs express organization-wide guardrails, and keeps the detection plane out of reach of workload-account compromise. The multi-account mechanics are covered in AWS Multi-Account Operational Patterns and identity onboarding in the AWS IAM Identity Center Setup Guide; this guide focuses on how the network and identity layers interlock.

It helps to see the control layers on one axis before diving into each. The table below lists every layer a request may pass through, what it checks, and where in the article it is implemented.
| Layer | What it checks | Grants or bounds | Where |
|---|---|---|---|
| Service Control Policy (SCP) | Maximum allowed actions for the account | Bounds (never grants) | Sections 3.1, 7 |
| VPC / subnet routing and association | Whether a network path exists at all | Grants reachability | Sections 3, 5 |
| Security group | Instance/ENI-level allowed flows (stateful) | Grants reachability | Section 3.2 |
| Network ACL | Subnet-level allow/deny (stateless) | Bounds reachability | Section 3.2 |
| AWS Network Firewall | Stateful inspection, egress allowlist, optional TLS DPI | Bounds (inspect/drop) | Section 4 |
| Identity-based IAM policy | What the calling principal may do | Grants identity | Sections 5, 7 |
| VPC Lattice auth policy | Which principal may invoke a service (resource-based) | Grants identity | Sections 5, 7 |
| Verified Access Cedar policy | User and device trust context for app access | Grants identity | Sections 6, 7 |
| GuardDuty / Security Hub | Anomalies, threats, posture (after the fact) | Detects | Section 8 |

A request must be allowed by every gate that applies to it, and any gate can deny. The remaining sections implement each row.
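The "every applicable gate must allow, any gate can deny" rule can be sketched as a small model. This is an illustrative simplification, not how AWS evaluates requests internally: the gate names, the request fields (kind, vpc_associated, principal_allowed, cedar_permit, scp_denied) and the ordering are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Gate:
    name: str
    applies: Callable[[dict], bool]  # is this gate on the request's path?
    allows: Callable[[dict], bool]   # does this gate allow the request?


def first_denying_gate(gates: List[Gate], request: dict) -> Optional[str]:
    """Return the first applicable gate that denies, or None if every gate allows."""
    for gate in gates:
        if gate.applies(request) and not gate.allows(request):
            return gate.name
    return None


# Hypothetical gates mirroring a few rows of the table above.
GATES = [
    Gate("scp", lambda r: True, lambda r: not r.get("scp_denied", False)),
    Gate("vpc_association", lambda r: r["kind"] == "service",
         lambda r: r.get("vpc_associated", False)),
    Gate("lattice_auth_policy", lambda r: r["kind"] == "service",
         lambda r: r.get("principal_allowed", False)),
    Gate("verified_access_cedar", lambda r: r["kind"] == "human",
         lambda r: r.get("cedar_permit", False)),
]

# Reachable (VPC associated) but not authorized: reachability is not permission.
reachable_only = {"kind": "service", "vpc_associated": True, "principal_allowed": False}
fully_allowed = {"kind": "service", "vpc_associated": True, "principal_allowed": True}
```

Returning the name of the denying gate, rather than a bare boolean, reflects the diagnostic habit the guide relies on later: the useful question is which layer said no.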

3. Segmentation and Account Strategy

Segmentation is the foundation. Before any inspection or identity check matters, the network has to be carved into zones small enough that a control between zones is meaningful. AWS gives you several nested segmentation tools, and zero trust uses all of them at different granularities.

3.1 Organizations and SCPs as the Outer Guardrail

The coarsest boundary is the AWS account, organized into Organizational Units (OUs). Service Control Policies attached to OUs define the maximum set of actions any principal in those accounts can perform: they never grant access, only bound it. In a zero-trust landing zone, SCPs typically deny things that should never happen anywhere: disabling GuardDuty or Security Hub, using Regions you do not operate in, removing VPC Flow Logs, or detaching the inspection routing. Because an SCP bounds everything account-level IAM can grant, it is the first gate in the layered authorization chain covered in Section 7. The full evaluation order (explicit denies first, then SCPs, resource-based policies, identity-based policies, permissions boundaries, and session policies) is detailed in IAM Policy Evaluation Logic Step by Step; the point here is architectural: SCPs are how the organization expresses "this is never allowed, regardless of what a workload-account admin configures."
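A guardrail of this kind might look like the sketch below, which builds an SCP document in Python and prints it as JSON. The Region list, the statement IDs, and the short NotAction exemption list for global services are assumptions for illustration; a production Region-deny SCP usually exempts a longer list of global services, so treat this as a starting shape rather than a finished policy.

```python
import json

# Assumption: the Regions this organization actually operates in.
ALLOWED_REGIONS = ["eu-west-1", "us-east-1"]

scp = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Keep the detection plane and flow logs from being switched off.
            "Sid": "ProtectDetectionPlane",
            "Effect": "Deny",
            "Action": [
                "guardduty:DeleteDetector",
                "securityhub:DisableSecurityHub",
                "ec2:DeleteFlowLogs",
            ],
            "Resource": "*",
        },
        {
            # Deny everything outside the allowed Regions, except a few
            # global services (abbreviated, illustrative exemption list).
            "Sid": "DenyUnusedRegions",
            "Effect": "Deny",
            "NotAction": ["iam:*", "organizations:*", "sts:*", "support:*"],
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {"aws:RequestedRegion": ALLOWED_REGIONS}
            },
        },
    ],
}


def is_deny_only(policy: dict) -> bool:
    """An SCP used purely as a guardrail should contain only Deny statements."""
    return all(stmt["Effect"] == "Deny" for stmt in policy["Statement"])


print(json.dumps(scp, indent=2))
```

Keeping the policy deny-only makes the "bounds, never grants" property visible in review: nothing in this document can widen what a workload account is allowed to do.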

3.2 VPC Segmentation and Micro-Segmentation

Inside an account, the VPC is divided into subnets per tier (for example, web, application, and data), and routing between tiers is constrained. The finest-grained control is the security group, and in a zero-trust design security groups are used as micro-segmentation: an application-tier security group allows inbound only from the web-tier security group (by referencing the security group ID, not a CIDR), and the data-tier security group allows inbound only from the application tier. Referencing security group IDs rather than IP ranges means the policy follows the workload as instances and tasks scale, and it expresses intent ("the app tier may talk to the database") rather than topology.

Network ACLs add a coarse, stateless guard at the subnet boundary — useful as a backstop (for example, denying a CIDR outright) but too blunt for fine policy. AWS Prescriptive Guidance lists security groups and PrivateLink among the AWS-specific mechanisms for micro-segmentation; the principle is to create workload boundaries and authorize specific flows between them rather than allowing broad lateral reachability.

A concrete micro-segmentation matrix for a three-tier workload makes the intent legible and reviewable:
| Tier (security group) | Allowed inbound source | Allowed port | Purpose |
|---|---|---|---|
| Web (sg-web) | Verified Access endpoint / ALB only | 443 | Receive brokered user traffic |
| App (sg-app) | sg-web (by group ID) | 8080 | Serve the web tier only |
| Data (sg-data) | sg-app (by group ID) | 5432 | Accept connections from the app tier only |

Because each rule references the upstream security group by ID rather than a CIDR, the policy is independent of how many instances or tasks each tier runs and survives auto scaling unchanged. A reviewer can read the matrix and confirm that the data tier is unreachable from the web tier directly, which is precisely the lateral-movement path zero trust is built to remove. Expressed as a CLI rule (with sg-data and sg-app standing in for real security group IDs), the data-tier ingress is simply:
aws ec2 authorize-security-group-ingress \
  --group-id sg-data \
  --protocol tcp --port 5432 \
  --source-group sg-app
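The review described above can also be automated. The following sketch encodes the three-tier matrix as data and checks that the data tier is reachable only through the app tier. The group names come from the matrix; the "verified-access" source label and the helper functions are hypothetical names introduced for the example, and the model ignores NACLs and routing.

```python
from collections import deque

# Ingress rules from the matrix: destination group -> {(source group, port)}.
INGRESS = {
    "sg-web": {("verified-access", 443)},
    "sg-app": {("sg-web", 8080)},
    "sg-data": {("sg-app", 5432)},
}


def can_connect(src: str, dst: str) -> bool:
    """True if dst has an ingress rule that names src as its source group."""
    return any(source == src for source, _port in INGRESS.get(dst, set()))


def hops_to(src: str, dst: str):
    """Breadth-first search: fewest allowed hops from src to dst, or None."""
    queue = deque([(src, 0)])
    seen = {src}
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for candidate in INGRESS:
            if candidate not in seen and can_connect(node, candidate):
                seen.add(candidate)
                queue.append((candidate, dist + 1))
    return None
```

With these rules, can_connect("sg-web", "sg-data") is False while hops_to("sg-web", "sg-data") is 2: the only way from web to data is through the app tier, which is the property a reviewer wants to hold after every rule change.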

3.3 Central Inspection and Transit Gateway Routing

Segmentation determines where traffic may go; inspection determines what is allowed to pass once it gets there. To inspect east-west and egress traffic in one place, the architecture routes inter-VPC and internet-bound traffic through a central inspection VPC using AWS Transit Gateway. Workload VPCs attach to the Transit Gateway; the Transit Gateway route tables send traffic destined for other VPCs or the internet to the inspection VPC, where AWS Network Firewall sits in the path, and only then onward.

The inspection VPC has a deliberate subnet layout. Each Availability Zone in the inspection VPC carries a dedicated firewall subnet (holding the Network Firewall endpoint for that AZ), a Transit Gateway attachment subnet, and — for egress — a public subnet with a NAT gateway routing to an internet gateway. Workload traffic arrives from the Transit Gateway, is routed to the firewall endpoint in the same AZ, and only after inspection continues either back to the Transit Gateway (for east-west to another VPC) or out through NAT to the internet (for egress). The route tables encode the hairpin: the Transit Gateway subnet route table sends traffic to the firewall endpoint, and the firewall subnet route table sends inspected traffic onward. Getting these route tables and the AZ alignment right is what makes inspection both complete (no path bypasses the firewall) and symmetric.

One detail is decisive for stateful inspection: symmetric routing. A stateful firewall must see both directions of a flow. With Transit Gateway, you enable appliance mode on the inspection VPC attachment so that Transit Gateway keeps both directions of a flow pinned to the same Availability Zone and the same firewall endpoint. Without appliance mode, return traffic can take a different AZ path, the stateful engine never sees the matching half of the flow, and connections fail intermittently — one of the most common and most confusing zero-trust networking failures (see Section 9). The decision between a centralized inspection VPC and per-VPC distributed firewalls, and between Transit Gateway and AWS Cloud WAN, belongs to the AWS VPC Connectivity Decision Guide; here we adopt the centralized model because it concentrates inspection and egress in one auditable account.
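Why appliance mode matters can be shown with a toy model. This is a sketch under loud assumptions, not Transit Gateway's real algorithm: the AZ names are made up, "stay in the source AZ" stands in for the default behavior, and a SHA-256 hash over a direction-independent flow key stands in for the flow hash that appliance mode uses to pin a flow to one endpoint.

```python
import hashlib

AZS = ["az-a", "az-b"]  # hypothetical AZs that hold firewall endpoints


def az_without_appliance_mode(src_az: str) -> str:
    """Toy default: traffic is kept in the AZ where it entered the TGW."""
    return src_az


def az_with_appliance_mode(src_ip: str, dst_ip: str,
                           src_port: int, dst_port: int, proto: str) -> str:
    """Toy appliance mode: hash a direction-independent flow key to one AZ."""
    # Sorting the two endpoints makes the key identical for both directions.
    endpoints = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{proto}|{endpoints}".encode()
    digest = hashlib.sha256(key).digest()
    return AZS[digest[0] % len(AZS)]


# A client in az-a calls a server in az-b.
forward_default = az_without_appliance_mode("az-a")  # request path
return_default = az_without_appliance_mode("az-b")   # response path

forward_pinned = az_with_appliance_mode("10.0.1.5", "10.1.2.9", 44321, 443, "tcp")
return_pinned = az_with_appliance_mode("10.1.2.9", "10.0.1.5", 443, 44321, "tcp")
```

In the default case the request and the response cross different AZs, so two different firewall endpoints each see half a flow. In the pinned case both directions resolve to the same AZ, which is what the stateful engine needs.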

4. East-West and Egress Inspection with AWS Network Firewall

AWS Network Firewall is a managed, stateful network firewall and intrusion prevention service for Amazon VPC. It is the inspection muscle of the architecture: it filters and inspects traffic at the VPC perimeter, including traffic to and from internet gateways, NAT gateways, VPN, and AWS Direct Connect. This section goes a layer deeper than a service overview, because the rule model and deployment model both shape the architecture.
[Figure: East-west and egress inspection traffic path with AWS Network Firewall]

4.1 The Two Engines: Stateless and Stateful

Network Firewall evaluates traffic with two engines. The stateless engine inspects packets individually against 5-tuple criteria (a fast first pass that can pass, drop, or forward to the stateful engine). The stateful engine performs connection-aware inspection and, depending on the rules, deep packet inspection (DPI) of the payload, using Suricata-compatible rules. A practical best practice from the documentation: configure the stateless default action to forward to the stateful rule groups in both directions, so that stateful rules using flow state (such as flow: established) evaluate correctly for bidirectional traffic.

4.2 Stateful Rule Groups and Suricata Compatibility

Stateful rule groups use Suricata-compatible intrusion prevention (IPS) specifications. You can write raw Suricata-compatible rule strings, or provide simpler specifications (a standard "5-tuple" rule group or a domain-list rule group) that Network Firewall translates into Suricata rules for you. Two properties matter at the architecture level:
  • Rule ordering is fixed at creation. Each stateful rule group has a RuleOrder setting (default action order or strict order) inside StatefulRuleOptions, and you cannot change it after the rule group is created. Choosing strict order up front gives you predictable, sequential evaluation — important when you want an explicit "allow listed traffic, then drop everything else" posture.
  • Suricata version. Network Firewall upgraded its stateful engine from Suricata 6.0.9 to 7.0 in November 2024. That upgrade changed rule syntax in ways that affect existing rules — for example, PCRE1 was replaced by PCRE2, a sticky buffer keyword (such as tls.sni or dns.query) must be immediately followed by its content modifier, and range keywords like itype now require a min:max format. If you are porting older Suricata rules, validate them against 7.0 syntax.

Supported rule actions are pass, drop, reject, and alert. There are also documented limitations to design around: Network Firewall does not support Suricata features such as Lua scripting, datasets, IP reputation, file extraction, and thresholding, and it does not inspect certain protocols (for example SCTP, IKEv2, and IP-in-IP). Knowing these boundaries up front prevents writing rules that silently never match.

A raw Suricata-compatible rule that alerts on outbound TLS to an unexpected SNI, for east-west or egress paths, looks like this — note the sticky-buffer-then-content ordering that Suricata 7.0 requires:
alert tls $HOME_NET any -> $EXTERNAL_NET any (tls.sni; content:"known-bad.example"; msg:"Outbound TLS to flagged SNI"; sid:1000001; rev:1;)
In a zero-trust posture you generally prefer an allowlist expressed as a domain-list rule group (Section 4.3) with a default drop, and reserve hand-written alert/drop signatures for specific known-bad patterns, so that the "deny by default, allow what is named" stance is the structure rather than an afterthought.

4.3 Egress Filtering with Domain Lists

A core zero-trust behavior is controlling egress: workloads should reach only an explicit allowlist of destinations, so that a compromised host cannot exfiltrate to or call back an arbitrary internet endpoint. Network Firewall implements this cleanly with domain-list stateful rule groups — an allowlist of fully qualified domain names (FQDNs) that outbound HTTP and TLS traffic may reach. For HTTP the host header is matched; for TLS the Server Name Indication (SNI) is matched. A minimal egress rule group:
{
  "RulesSource": {
    "RulesSourceList": {
      "TargetTypes": ["HTTP_HOST", "TLS_SNI"],
      "Targets": [
        ".amazonaws.com",
        ".internal.example.com"
      ],
      "GeneratedRulesType": "ALLOWLIST"
    }
  }
}
Pair an ALLOWLIST domain rule group with a stateful default drop so that anything not on the list is denied. This is the network-layer complement to least-privilege identity: even a principal with valid credentials, running on a compromised host, cannot reach destinations the egress policy does not permit.
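The matching behavior of such an allowlist can be approximated in a few lines. This sketch assumes the documented domain-list convention that a target with a leading dot matches the domain itself and all of its subdomains, while a target without one matches only that exact name; the function names are hypothetical, and the real firewall matches on the HTTP Host header or TLS SNI rather than on a string you pass in.

```python
def domain_matches(target: str, hostname: str) -> bool:
    """Match one allowlist target against a requested hostname."""
    hostname = hostname.lower().rstrip(".")
    target = target.lower()
    if target.startswith("."):
        # ".example.com" covers example.com and every subdomain of it.
        return hostname == target[1:] or hostname.endswith(target)
    return hostname == target


def egress_action(targets, hostname: str) -> str:
    """Allowlist with a default drop: pass only what is explicitly named."""
    return "pass" if any(domain_matches(t, hostname) for t in targets) else "drop"


ALLOWLIST = [".amazonaws.com", ".internal.example.com"]
```

Note that the suffix check includes the dot, so a lookalike such as evilamazonaws.com does not match .amazonaws.com. That kind of edge case is worth a unit test in whatever tooling generates your domain lists.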

4.4 TLS Inspection

Domain allowlisting on SNI sees the requested name but not the encrypted payload. When you need to inspect the contents of TLS-encrypted flows (for example, to run IPS signatures against decrypted traffic), Network Firewall supports a TLS inspection configuration. It decrypts inbound and/or outbound SSL/TLS traffic using certificates managed in AWS Certificate Manager (ACM), inspects the decrypted traffic against the firewall policy's stateful rules, and re-encrypts before forwarding. You define a scope (protocol, source and destination address ranges, and port ranges) that selects which traffic to decrypt; Network Firewall automatically creates mirrored rules so both directions of a scoped flow are decrypted. Optional settings include using a customer-managed AWS KMS key instead of the default AWS-owned key, and enabling certificate-revocation status checking for outbound server certificates.

TLS inspection is powerful but operationally heavy (certificate trust, performance, privacy and compliance considerations), so most teams apply it to a scoped subset of flows rather than universally. The architectural point is that the capability exists and is scoped, not all-or-nothing.

4.5 Deployment Model and Logging

In this architecture Network Firewall is deployed in the centralized model: one firewall in the inspection VPC, with Transit Gateway routing all inter-VPC and egress traffic through it. (The distributed model, a firewall per VPC, is the alternative when you want VPC-local inspection without a central hub.) Network Firewall produces alert, flow, and TLS logs that you can send to Amazon CloudWatch Logs, Amazon S3, or Amazon Data Firehose; these logs are the primary evidence when diagnosing "why was this connection dropped" (Section 9). A representative firewall policy, with a stateful default drop and the egress allowlist attached, is the kind of artifact you keep in version control. Two details matter with strict order: each rule group reference carries a Priority, and the default action here is drop established rather than drop strict, so the TCP handshake that precedes the HTTP Host or TLS SNI match is not dropped before the allowlist can evaluate it (REGION and ACCOUNT are placeholders):
aws network-firewall create-firewall-policy \
  --firewall-policy-name zt-egress-policy \
  --firewall-policy '{
    "StatelessDefaultActions": ["aws:forward_to_sfe"],
    "StatelessFragmentDefaultActions": ["aws:forward_to_sfe"],
    "StatefulDefaultActions": ["aws:drop_established", "aws:alert_established"],
    "StatefulEngineOptions": {"RuleOrder": "STRICT_ORDER"},
    "StatefulRuleGroupReferences": [
      {"ResourceArn": "arn:aws:network-firewall:REGION:ACCOUNT:stateful-rulegroup/zt-egress-allowlist", "Priority": 100}
    ]
  }'

5. Identity-Aware Service Access with Amazon VPC Lattice

Network Firewall controls what traffic may pass; it does not know who is calling. For service-to-service (east-west) calls, the architecture adds identity to reachability using Amazon VPC Lattice. Lattice is an application-layer networking service that connects, secures, and monitors service-to-service communication across VPCs and accounts without the consumer and provider networks needing to be routable to each other. This guide uses it specifically for the intersection property; the full mechanics (service networks, target groups, listeners, health, DNS) are in the AWS VPC Lattice Complete Guide.

5.1 Service Networks and Two-Dimensional Access

A Lattice service network is a logical boundary that groups services and the VPCs allowed to reach them. Access has two independent dimensions:
  • Reachability (network): a VPC must be associated with the service network to send traffic to its services. If your VPC is not associated, you cannot reach the service at all — association is the network-layer gate.
  • Identity (auth): even with association, the caller's request is authorized by an auth policy.

This two-dimensional model is exactly "reachability is not permission" made concrete: association grants a path; the auth policy decides whether a specific principal may use it.

5.2 Auth Policies and SigV4

A VPC Lattice auth policy is an IAM resource-based policy (a JSON IAM policy document) attached to a service network or to an individual service. You enable it by setting the resource's auth type to AWS_IAM (the alternative, NONE, disables the auth policy and lets any client in an associated VPC reach the service). With AWS_IAM, callers sign requests with Signature Version 4 (SigV4), and Lattice authorizes the request.
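For readers who have not looked inside SigV4, the signing-key derivation is a short HMAC chain. The sketch below uses only the standard library and follows the published SigV4 key-derivation steps; it assumes vpc-lattice-svcs as the service name for Lattice data-plane calls and uses a made-up secret. It deliberately omits building the canonical request and string to sign, which an AWS SDK normally handles for you.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sigv4_signing_key(secret_key: str, date_stamp: str,
                      region: str, service: str) -> bytes:
    """Derive the SigV4 signing key: date -> region -> service -> aws4_request."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(signing_key: bytes, string_to_sign: str) -> str:
    """Final signature: hex HMAC-SHA256 of the string to sign."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"),
                    hashlib.sha256).hexdigest()


# Illustrative values only; never hard-code real credentials.
lattice_key = sigv4_signing_key("EXAMPLE-SECRET", "20250101",
                                "us-east-1", "vpc-lattice-svcs")
```

Because the key is scoped to a date, Region, and service, a signature produced for another service or Region does not validate for a Lattice call, which is part of what makes the signed request a usable identity assertion.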

Two rules from the documentation shape the design:
  • One auth policy per resource. You can attach at most one auth policy to a service network and one to each service. The service-network policy is the broad default; per-service policies add more restrictive control where needed.
  • Both sides must allow. Auth policies are different from IAM identity-based policies. For authorization to succeed, both the auth policy (resource-based) and the caller's identity-based policy must contain explicit Allow statements, and an explicit Deny in either one wins. This is stricter than ordinary same-account IAM evaluation, where an Allow in either the identity-based policy or the resource-based policy is enough: here the allows intersect rather than union. It is why service-to-service authorization in Lattice is genuinely zero-trust: the calling principal's own policy must permit the call, and the target's auth policy must permit the principal.

Enabling IAM auth and attaching a policy is a two-step CLI sequence:
# 1) Create the service network with IAM auth enabled
aws vpc-lattice create-service-network \
  --name zt-service-network \
  --auth-type AWS_IAM

# 2) Attach an auth policy (resource-based) to the service network
aws vpc-lattice put-auth-policy \
  --resource-identifier sn-0123456789abcdef0 \
  --policy file://service-network-auth-policy.json
A least-privilege auth policy names the allowed principal explicitly and can further constrain the request using VPC Lattice condition keys (for example, restricting by the source VPC, the authenticated principal, or request attributes). Lattice supports IAM policy actions (vpc-lattice: for control-plane operations and vpc-lattice-svcs:Invoke for data-plane calls, which is what an auth policy governs), policy resources by ARN, condition keys, and ABAC via tags, so you can write policies such as "only the orders-service role, calling from the production VPC, may invoke the payments service":
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"AWS": "arn:aws:iam::ACCOUNT:role/orders-service"},
      "Action": "vpc-lattice-svcs:Invoke",
      "Resource": "*",
      "Condition": {
        "StringEquals": {"vpc-lattice-svcs:SourceVpc": "vpc-0prod1234"}
      }
    }
  ]
}
If you do not attach an auth policy after enabling AWS_IAM, all traffic receives an access-denied error — a fail-closed default that matches zero-trust intent. For purely one-directional private exposure of a single service or resource where you do not need application-layer identity (for example, exposing a database privately by endpoint), AWS PrivateLink is the lighter primitive; that trade-off is covered in the AWS PrivateLink and VPC Endpoints Complete Guide.

5.3 Two Policy Scopes and Cross-Account Sharing

Auth policies come at two scopes, and using both is the idiomatic least-privilege pattern. The service-network auth policy is the broad guardrail for every service in the network — for example, "only principals in this organization, calling from associated VPCs." A per-service auth policy then adds the tighter, service-specific rule — "only the orders-service role may invoke payments." Both are evaluated, so the effective permission is the intersection of the service-network policy, the service policy, and the caller's identity-based policy: a request must be allowed by all of them and denied by none.
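The intersection can be made concrete with a simplified evaluator. This is a sketch, not the IAM engine: statements carry only an effect, a principal, and an optional source-VPC condition; the caller's identity-based policy is modeled with the same Statement type for brevity (real identity policies have no Principal element); and the role ARNs use the documentation placeholder account 111122223333.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Statement:
    effect: str                       # "Allow" or "Deny"
    principal: str                    # role ARN, or "*" for any principal
    source_vpc: Optional[str] = None  # None means no SourceVpc condition


def policy_decision(policy: List[Statement], principal: str, source_vpc: str) -> str:
    """Evaluate one policy: explicit Deny wins, otherwise Allow or implicit deny."""
    decision = "ImplicitDeny"
    for stmt in policy:
        if stmt.principal not in ("*", principal):
            continue
        if stmt.source_vpc is not None and stmt.source_vpc != source_vpc:
            continue
        if stmt.effect == "Deny":
            return "ExplicitDeny"
        decision = "Allow"
    return decision


def lattice_allows(vpc_associated: bool, policies: List[List[Statement]],
                   principal: str, source_vpc: str) -> bool:
    """Reachability first, then every policy scope must independently allow."""
    if not vpc_associated:
        return False
    return all(policy_decision(p, principal, source_vpc) == "Allow" for p in policies)


ORDERS = "arn:aws:iam::111122223333:role/orders-service"
REPORTING = "arn:aws:iam::111122223333:role/reporting"   # hypothetical second role
PROD_VPC = "vpc-0prod1234"

NETWORK_POLICY = [Statement("Allow", "*")]               # broad guardrail
SERVICE_POLICY = [Statement("Allow", ORDERS, PROD_VPC)]  # payments: orders only
CALLER_IDENTITY_POLICY = [Statement("Allow", "*")]       # caller may invoke
ALL_SCOPES = [NETWORK_POLICY, SERVICE_POLICY, CALLER_IDENTITY_POLICY]
```

With these policies the orders-service role calling from the production VPC is allowed; the reporting role, a call from another VPC, an unassociated VPC, or a caller whose own identity policy is empty are each denied by a different scope.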

VPC Lattice integrates with IAM beyond the auth policy itself. It supports identity-based policies (with the vpc-lattice: action prefix for control-plane actions such as vpc-lattice:PutAuthPolicy), resource-based auth policies (the vpc-lattice-svcs: data-plane actions used in the examples above), policy condition keys, ABAC via tags, and temporary credentials. Condition keys let an auth policy constrain by request context — the source VPC, the authenticated principal, the request method or path — so you can write rules far more specific than "this role may call this service."

Because the service network is typically owned by the shared-services account but consumed by workload accounts, you share it across accounts with AWS Resource Access Manager (RAM). The service-network owner shares the network (and the right to associate VPCs and services) to the workload accounts or to the whole organization; the auth policy still governs who may actually invoke each service. This separation — one team owns and shares the connectivity fabric, each service owner controls its own auth policy — is what lets a large organization run identity-aware east-west connectivity without a central bottleneck.

6. Identity-Aware Application Access with AWS Verified Access

Lattice secures service-to-service calls. For human access to internal applications, the architecture replaces the VPN with AWS Verified Access. Verified Access provides secure, VPN-less access to corporate applications: each request is evaluated against a policy using identity and device trust context before access is granted, so users connect to a specific application rather than to the network.
[Figure: One access request evaluated across the zero-trust authorization layers]

6.1 Trust Providers: Identity and Device

A Verified Access trust provider sends information about users and devices — the trust context — to Verified Access. There are two categories:
  • User-identity trust providers. An OpenID Connect (OIDC)-compatible identity provider that manages user identities. AWS IAM Identity Center or a third-party IdP can serve this role, supplying attributes such as email or group membership.
  • Device-management (device-based) trust providers. Systems that manage endpoints and report device posture. Verified Access supports CrowdStrike, Jamf, and JumpCloud as device trust providers, supplying signals such as whether a device is managed and compliant. Device-based providers require adding a redirect URI built from the Verified Access endpoint's DeviceValidationDomain to the provider's allowlist.

You can combine a user-identity provider with a device-management provider so that a policy considers both who the user is and whether their device is healthy — the converged context that AWS's zero-trust authorization principle calls for. The two categories supply different signals:
| Trust provider category | Examples | Signals supplied | Typical policy use |
|---|---|---|---|
| User identity (OIDC) | IAM Identity Center, third-party OIDC IdP | Email, group membership, scopes | "Is this the right user, in the right group?" |
| Device management | CrowdStrike, Jamf, JumpCloud | Device risk score, managed/posture status | "Is this a healthy, managed device?" |

The request flow ties them together. When a user opens an internal application through Verified Access, Verified Access redirects the browser to the configured OIDC identity provider to authenticate; the IdP returns identity claims, and (if a device provider is configured) the device's posture is supplied as part of the trust context. Verified Access evaluates the Cedar policy against that combined context for the request, and only on an explicit permit does it broker the connection to the endpoint's target (the ALB, network interface, RDS instance, or CIDR). The user never receives a network route into the VPC — they reach exactly one application, mediated per request.

6.2 Groups, Endpoints, and Cedar Policies

Verified Access is organized as an instance (which the trust providers attach to), groups (which hold a shared access policy), and endpoints (each representing one application). An endpoint can target several application types: a load balancer (Application Load Balancer or Network Load Balancer), a network interface, an RDS database, or a CIDR range — so Verified Access now fronts both HTTP applications and non-HTTP resources, not only web apps.

Access is governed by Verified Access policies written in Cedar, AWS's policy language. Cedar policies are evaluated against the trust data from the configured providers. A few properties make this a strong zero-trust enforcement point:
  • Fail-closed by default. When you create a group or endpoint, defining the policy is optional — but until a policy exists, all access requests are blocked. Access requires an explicit policy that permits it.
  • Per-request, context-aware. Each request is evaluated against the policy with fresh identity and device context, not just at session start.

A Cedar policy can express requirements such as "permit access only for members of a specific IAM Identity Center group, with a verified email, on a low-risk device (a Jamf risk score of LOW)":
permit(principal, action, resource)
when {
    context.idc.groups has "c242c5b0-6081-1845-6fa8-6e0d9513c107" &&
    context.idc.user.email.verified == true &&
    context.jamf.risk == "LOW"
};
Because users reach a specific application endpoint through Verified Access rather than gaining a network route, a credential that is valid but used from an unmanaged device, or by a user outside the permitted group, is denied at the application boundary even though the network path Verified Access brokers exists. Verified Access also logs each access decision, which (with the other layer logs) makes the "which layer denied this" question in Section 9 answerable.
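
To make the fail-closed and per-request behavior concrete, here is a minimal Python sketch that mimics the decision logic of the policy above. It is not Cedar and it does not call any Verified Access API: the `evaluate` function, the `example_policy` helper, and the dictionary shape used for the trust context are all assumptions made purely for illustration.

```python
from typing import Callable, Optional

# Hypothetical stand-in for the trust context assembled for each request.
TrustContext = dict

# Group ID copied from the Cedar policy above.
PERMITTED_GROUP = "c242c5b0-6081-1845-6fa8-6e0d9513c107"


def example_policy(context: TrustContext) -> bool:
    """Mirror the three conditions of the Cedar policy above."""
    idc = context.get("idc", {})
    jamf = context.get("jamf", {})
    return (
        PERMITTED_GROUP in idc.get("groups", {})
        and idc.get("user", {}).get("email", {}).get("verified") is True
        and jamf.get("risk") == "LOW"
    )


def evaluate(policy: Optional[Callable[[TrustContext], bool]],
             context: TrustContext) -> str:
    """Return "ALLOW" only on an explicit permit; everything else is "DENY"."""
    if policy is None:  # no policy attached: fail closed
        return "DENY"
    return "ALLOW" if policy(context) else "DENY"
```

Calling `evaluate` once per request, with a freshly built context each time, is the point of the sketch: a device whose risk score changes between two requests is allowed on the first and denied on the second, and a missing policy or a missing signal never results in access.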

7. Layered Authorization: SCP, IAM, Resource, Lattice, and Verified Access Policies

The architecture's defining property is that one access request is evaluated by several independent layers, and every layer must allow it while any layer can deny it. This section assembles the layers into a single ordered picture for one request; the detailed IAM evaluation algorithm (how Deny overrides Allow, how SCPs, identity policies, resource policies, permission boundaries, and session policies combine) is documented in IAM Policy Evaluation Logic Step by Step and is not re-derived here.

For a service-to-service call through Lattice, the layers in play are, in order of where they gate the request:
  1. Organization guardrail (SCP). Bounds what any principal in the account may do at all. If an SCP denies the action, nothing downstream can re-grant it.
  2. Network reachability. The caller's VPC must be associated with the Lattice service network; the path through Transit Gateway and the inspection VPC must be permitted by route tables, security groups, NACLs, and Network Firewall rules.
  3. Caller identity policy (identity-based IAM). The calling principal's own IAM policy must Allow the vpc-lattice-svcs action.
  4. Resource auth policy (resource-based IAM). The Lattice auth policy on the service network or service must Allow that principal.
  5. Result. Only if SCP does not deny, the network path exists, and both the identity-based and resource-based policies allow, does the request succeed.
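
As an illustration of layers 3 and 4, the two policy documents might look roughly like the following, written here as Python dictionaries that serialize to IAM policy JSON. This is a sketch under stated assumptions: every ARN, account ID, service ID, and VPC ID is a placeholder, and the choice of a source-VPC condition is only one way a resource auth policy could be scoped. Confirm the exact condition keys and resource formats against the current VPC Lattice documentation before using them.

```python
import json

# All ARNs, account IDs, and resource IDs below are placeholders for illustration.
PAYMENTS_SERVICE_ARN = (
    "arn:aws:vpc-lattice:us-east-1:111122223333:service/svc-0123456789abcdef0"
)
ORDERS_ROLE_ARN = "arn:aws:iam::444455556666:role/orders-service"
PROD_VPC_ID = "vpc-0123456789abcdef0"

# Layer 3: identity-based policy attached to the caller's role.
identity_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "vpc-lattice-svcs:Invoke",
        "Resource": f"{PAYMENTS_SERVICE_ARN}/*",
    }],
}

# Layer 4: resource auth policy attached to the Lattice service.
auth_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": ORDERS_ROLE_ARN},
        "Action": "vpc-lattice-svcs:Invoke",
        "Resource": "*",
        # Assumed scoping: only calls that originate from the production VPC.
        "Condition": {"StringEquals": {"vpc-lattice-svcs:SourceVpc": PROD_VPC_ID}},
    }],
}

if __name__ == "__main__":
    print(json.dumps({"identity": identity_policy, "auth": auth_policy}, indent=2))
```

Neither document is sufficient alone: the identity policy lets the role attempt the call, and the auth policy lets the service accept it from that role.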

For human access through Verified Access, the chain is analogous but the final gate is the Cedar policy evaluated against identity and device trust context, in addition to any IAM that governs the underlying application's own AWS calls.

This is the concrete meaning of "reachability is not permission." A principal can have a network path (association, routing, open security group) and still be denied because an identity-based policy, a resource auth policy, an SCP, or a Cedar policy says no. Conversely, a principal can be named in an auth policy and still be unreachable because no network association exists. Designing for least privilege means keeping each of these layers as tight as its purpose requires, and avoiding the anti-patterns — wildcard principals, NONE auth types left in place, overly broad security groups — catalogued in IAM Anti-Patterns.

A useful discipline is to write down, for each sensitive flow, the table of layers and what each one checks. It turns an abstract "zero trust" goal into a verifiable checklist and makes the failure-mode analysis in Section 9 mechanical rather than guesswork. For the concrete orders-service to payments call, the worked table reads:
| Gate | Configured to check | Pass condition |
|---|---|---|
| SCP | Org guardrail (e.g. Region, no disabling of detection) | Action not denied at org level |
| VPC association | orders VPC associated with the Lattice service network | Association exists |
| Network path / SG | Route and security groups permit the call | Path reachable |
| Identity-based IAM | orders-service role may call vpc-lattice-svcs:Invoke | Explicit Allow |
| Lattice auth policy | payments service allows orders-service from the prod VPC | Explicit Allow, no Deny |

The request succeeds only when every row passes; any single failing row denies it, and reading the row's own log (Section 9) tells you which one acted.
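
The worked table can also be expressed as data, which is one way to keep the checklist verifiable. The following Python sketch is an illustrative model only: the `Gate` class, the `first_denying_gate` function, and the request field names (`scp_denies`, `vpc_associated`, and so on) are hypothetical and do not correspond to any AWS API. A missing field is treated as a failed gate, matching the fail-closed behavior of the real layers.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Gate:
    name: str
    check: Callable[[dict], bool]  # returns True when the gate passes


def first_denying_gate(gates: Sequence[Gate], request: dict) -> Optional[str]:
    """Walk the gates in order; return the first that fails, or None if all pass."""
    for gate in gates:
        if not gate.check(request):
            return gate.name
    return None


# Rows of the orders-service to payments table; field names are hypothetical.
ORDERS_TO_PAYMENTS = [
    Gate("SCP", lambda r: not r.get("scp_denies", False)),
    Gate("VPC association", lambda r: r.get("vpc_associated", False)),
    Gate("Network path / SG", lambda r: r.get("path_reachable", False)),
    Gate("Identity-based IAM", lambda r: r.get("identity_allow", False)),
    Gate("Lattice auth policy",
         lambda r: r.get("auth_allow", False) and not r.get("auth_deny", False)),
]
```

A request that is reachable but lacks an auth-policy Allow stops at "Lattice auth policy", while a principal named in the auth policy without a VPC association stops at "VPC association": the two halves of "reachability is not permission".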

8. Detection and Response

Prevention layers fail closed, but they are not self-aware; you still need to see what is happening and respond. Detection is the continuous-monitoring principle of zero trust, and in this architecture it is provided by Amazon GuardDuty and AWS Security Hub, operated from the security account as the organization's delegated administrator. Deep treatment of multi-account log aggregation, audit, and query belongs to the centralized logging and audit architecture (No.12 in this series); here we place detection in the zero-trust picture.

8.1 Amazon GuardDuty

Amazon GuardDuty is a threat-detection service that continuously analyzes AWS data sources using threat intelligence and machine learning. Its foundational data sources — AWS CloudTrail management events, VPC Flow Logs, and Route 53 Resolver DNS query logs — are analyzed automatically when GuardDuty is enabled, with no additional configuration. From these it detects activity such as communication with known-malicious domains or IPs, anomalous API usage, credential compromise, and reconnaissance. GuardDuty also offers Extended Threat Detection for correlating multi-stage attack sequences, and optional protection plans focused on specific resources (for example S3, EKS, RDS, Lambda, and runtime monitoring). In a zero-trust network, GuardDuty is what notices that a workload — even one inside its segment with valid credentials — has started behaving like a compromised host, which is precisely the case the preventive layers are designed to contain but cannot by themselves reveal.

8.2 AWS Security Hub

AWS Security Hub aggregates security findings across accounts and Regions and runs automated security checks against best-practice standards, giving you a consolidated view of posture and a prioritized list of issues. GuardDuty findings flow into Security Hub alongside checks from other services, and findings can be routed through Amazon EventBridge to trigger automated response (notification, ticketing, or remediation) — the automation-and-orchestration principle of zero trust. Operating both services from a delegated administrator account keeps the detection plane independent of the workload accounts it watches, so a workload-account compromise does not also blind or disable detection. The deeper mechanics of multi-account log aggregation, audit trails, and querying are the subject of the Centralized Logging and Audit Architecture on AWS guide.
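
As a sketch of the EventBridge routing step, the rule below selects imported Security Hub findings with a HIGH or CRITICAL severity label. The pattern is written as a Python dictionary that serializes to the JSON an EventBridge rule expects. The `matches` function is a deliberately simplified local matcher for reasoning about the pattern, not EventBridge's actual matching engine, and the severity threshold is an assumption. Verify the event format against the current Security Hub documentation for your configuration.

```python
import json

# Event pattern for an EventBridge rule; the severity threshold is an assumed choice.
finding_pattern = {
    "source": ["aws.securityhub"],
    "detail-type": ["Security Hub Findings - Imported"],
    "detail": {"findings": {"Severity": {"Label": ["HIGH", "CRITICAL"]}}},
}


def matches(pattern: dict, event) -> bool:
    """Simplified subset of pattern matching: exact values and nested fields only."""
    if isinstance(event, list):  # arrays match when any element matches
        return any(matches(pattern, item) for item in event)
    if not isinstance(event, dict):
        return False
    for key, expected in pattern.items():
        actual = event.get(key)
        if isinstance(expected, dict):
            if not matches(expected, actual):
                return False
        else:  # a list of accepted values
            values = actual if isinstance(actual, list) else [actual]
            if not any(v in expected for v in values):
                return False
    return True


if __name__ == "__main__":
    print(json.dumps(finding_pattern, indent=2))
```

Keeping the pattern narrow (source, detail type, severity) means low-severity findings still land in Security Hub for review but do not trigger automated response.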

9. Failure Modes and Diagnostics

Layered architectures move the hard problem from "can a packet get through" to "which of several layers stopped it." The most common operational question in a zero-trust network is some form of "I can't reach the service — why?" Because reachability and identity are checked independently, the answer is almost always "one specific layer denied it," and the skill is isolating which one quickly. Work the layers in order, using each layer's own logs.

9.1 "Unreachable" — Which Layer Denied It?

A systematic isolation, outermost to innermost:
  • Routing and association. Does the path even exist? For Lattice, confirm the caller's VPC is associated with the service network. For inter-VPC and egress, check Transit Gateway route tables and that the inspection VPC route exists. VPC Flow Logs on the source and destination ENIs show whether packets are leaving and arriving.
  • Security groups and NACLs. Check security group rules (stateful; they can reference other security groups) and NACLs (stateless, evaluated in rule-number order). Because NACLs are stateless, a missing return-path rule is a classic asymmetric failure: the request leaves, but the response is dropped.
  • Firewall inspection. Check Network Firewall alert and flow logs. A drop against your flow tells you a stateful rule (or the default drop) matched. An egress failure to an external domain is usually the domain not being on the ALLOWLIST.
  • Identity and authorization. For Lattice, an access-denied despite a valid path means either the identity-based policy or the auth policy is missing an Allow, or an explicit Deny matched; VPC Lattice access logs record the authenticated request. For Verified Access, the access logs show the policy decision; a deny with a valid user usually means the Cedar policy condition (group, device posture) was not satisfied, or no policy is attached (fail-closed).
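
The stateless return-path failure mentioned in the list above is easy to reproduce in a toy model. The following Python sketch is an assumption-laden simplification: `NaclRule` and `nacl_allows` are hypothetical names, and the model considers only destination port, ignoring protocol and CIDR. It does reproduce the two properties that matter here: rules are evaluated in ascending rule-number order, and the first match wins, with an implicit deny at the end.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class NaclRule:
    number: int
    port_from: int
    port_to: int
    allow: bool


def nacl_allows(rules: Iterable[NaclRule], port: int) -> bool:
    """Evaluate rules in ascending rule-number order; the first match decides."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port_from <= port <= rule.port_to:
            return rule.allow
    return False  # implicit final deny


# Client subnet: outbound HTTPS is allowed, but inbound only allows 443.
outbound = [NaclRule(100, 443, 443, True)]
inbound_broken = [NaclRule(100, 443, 443, True)]
# Fix: also allow the ephemeral port range that response packets return to.
inbound_fixed = inbound_broken + [NaclRule(110, 1024, 65535, True)]

EPHEMERAL_PORT = 50000  # example source port chosen by the client
```

With `inbound_broken`, the request on port 443 leaves the subnet but the response addressed to the client's ephemeral port is dropped on the way back in. VPC Flow Logs show this as accepted outbound traffic paired with rejected inbound traffic.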

The discipline of "check the layer's own log, in order" turns a vague outage into a single identified gate. As a quick reference, common symptoms map to layers and logs:
| Symptom | Likely layer | Log / signal to check | Typical fix |
|---|---|---|---|
| No packets arrive at all | Routing / association | VPC Flow Logs; TGW route tables; Lattice association | Add association / route |
| Connection resets only across AZs | Asymmetric routing | Firewall flow logs (half-flows) | Enable TGW appliance mode |
| Egress to a domain fails, others work | Network Firewall egress | Firewall alert logs (drop) | Add FQDN to allowlist |
| Access-denied with a valid path (service) | Lattice auth / IAM | Lattice access logs | Fix auth policy or identity policy Allow |
| User denied on a healthy login (app) | Verified Access policy | Verified Access access logs | Fix Cedar condition or attach policy |
| Intermittent timeouts under load | Inspection capacity | Firewall CloudWatch metrics | Scale / multi-AZ inspection |

9.2 Asymmetric Routing and the Stateful Engine

The subtlest failure is asymmetric routing breaking stateful inspection. A stateful firewall must see both directions of a flow; if the forward path goes through one firewall endpoint (or AZ) and the return path through another, the engine sees half a conversation and drops it, producing intermittent, hard-to-reproduce failures. The fix in a Transit Gateway design is appliance mode on the inspection VPC attachment, which pins both directions of a flow to the same AZ and endpoint. If connections succeed within an AZ but fail across AZs, or fail only for long-lived flows, suspect symmetry first.
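
A toy simulation shows why symmetry matters. The Python sketch below is not a model of AWS Network Firewall internals: `StatefulEngine` and `round_trip` are hypothetical, the engine tracks only a source/destination pair rather than a full 5-tuple and TCP state, and the IP addresses are placeholders. It captures only the essential behavior, which is that return traffic passes only through an engine that saw the forward direction.

```python
class StatefulEngine:
    """Toy per-AZ stateful engine that remembers flows it has seen opened."""

    def __init__(self, az: str) -> None:
        self.az = az
        self.flows: set = set()

    def forward(self, src: str, dst: str) -> bool:
        """Client-to-server packet: record flow state and pass it."""
        self.flows.add((src, dst))
        return True

    def reverse(self, src: str, dst: str) -> bool:
        """Server-to-client packet: pass only if this engine saw the forward leg."""
        return (dst, src) in self.flows


def round_trip(forward_az: str, return_az: str) -> bool:
    """Send a request through one AZ's engine and the response through another."""
    engines = {az: StatefulEngine(az) for az in ("az-a", "az-b")}
    client, server = "10.0.1.10", "10.1.2.20"  # placeholder addresses
    engines[forward_az].forward(client, server)
    return engines[return_az].reverse(server, client)
```

`round_trip("az-a", "az-a")` models appliance mode, where both directions are pinned to the same AZ and the response passes. `round_trip("az-a", "az-b")` models the asymmetric case, where the second engine has no state for the flow and drops the response.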

9.3 Inspection Bottlenecks and Capacity

Because all inter-VPC and egress traffic funnels through the inspection VPC, the firewall is on the critical path for throughput. Network Firewall scales, but a centralized design concentrates load, so capacity planning and per-firewall throughput characteristics matter; watch the firewall's CloudWatch metrics for dropped or passed packet counts and latency, and design the inspection VPC across multiple Availability Zones. This is the trade-off of centralization: one auditable choke point for inspection and egress, at the cost of making that point a performance-sensitive component. (Cost characteristics of inspection and data processing are real but out of scope here; see the official pricing pages.)

9.4 Policy Misconfiguration

Finally, many "denied" cases are simply policy errors: an explicit Deny that wins over an intended Allow, an auth type left at NONE (so a service you thought was protected is open to any associated VPC), a wildcard that grants more than intended, or a Cedar condition that never evaluates true. Because each layer fails closed, a missing Allow denies; because IAM Deny is absolute, a stray deny denies everywhere. The remedy is the per-flow layer table from Section 7: verify each gate's policy against what it is supposed to check, and read the matching log to confirm which gate acted.
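
Some of these misconfigurations can be caught by a simple static check before deployment. The following Python sketch is a hypothetical linter, not an AWS tool: the function name `lint_lattice_service`, its inputs, and its message wording are assumptions, and it covers only three of the cases named above (auth type NONE, an unconditioned wildcard principal, and an explicit Deny worth reviewing).

```python
from typing import List, Optional


def lint_lattice_service(auth_type: str, auth_policy: Optional[dict]) -> List[str]:
    """Return human-readable findings for a service's auth configuration."""
    findings: List[str] = []
    if auth_type == "NONE":
        findings.append("auth type NONE: any associated VPC can call this service")
    for stmt in (auth_policy or {}).get("Statement", []):
        effect = stmt.get("Effect")
        principal = stmt.get("Principal")
        is_wildcard = principal == "*" or (
            isinstance(principal, dict) and principal.get("AWS") == "*"
        )
        if effect == "Allow" and is_wildcard and not stmt.get("Condition"):
            findings.append("wildcard principal in Allow with no Condition")
        if effect == "Deny":
            findings.append("explicit Deny present: it overrides any matching Allow")
    return findings
```

An empty result does not prove a policy is correct. It only means none of these specific patterns were found, so the per-flow layer table and the matching logs remain the real verification.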

10. Variations: How Far to Take Zero Trust

Zero trust is a direction of travel, not a binary state, and AWS guidance is explicit that it should be implemented incrementally, starting with high-value use cases. The architecture in this guide is the full picture; most organizations adopt it in stages, and choosing the stage is itself the variation.

A pragmatic progression:
  1. Front the highest-value internal application with Verified Access first. Replacing a VPN for one sensitive app delivers identity-and-device-aware access with no broad network route, and is self-contained.
  2. Add egress filtering with Network Firewall domain allowlists, which limits exfiltration and call-back paths organization-wide with relatively low application impact.
  3. Introduce VPC Lattice auth policies for service-to-service calls on the services that most need caller identity, expanding from NONE to AWS_IAM service by service.
  4. Tighten micro-segmentation (security-group-to-security-group rules) and add TLS inspection for scoped flows once the operational muscle exists.
  5. Make detection organization-wide with GuardDuty and Security Hub from a delegated administrator, if not already in place.

The trade-off at every step is operational complexity against blast-radius reduction. More layers mean more places to get a policy right and more logs to correlate; the layered logging discipline from Section 9 is what keeps that complexity manageable. The "how far" decision is also where you lean on the existing decision guides rather than re-deciding here: the connectivity model, the messaging and identity choices, and the per-service depth are all delegated. This guide's contribution is the shape of the assembled system and the flow of one request through it.

11. Frequently Asked Questions

Does being able to reach a service mean I'm authorized to use it?
No. That is the entire premise. Reachability (a route, a VPC association, an open security group) only creates a path. Authorization is decided separately: by IAM identity-based policies, by resource-based policies such as a VPC Lattice auth policy, by SCP guardrails, and for human access by a Verified Access Cedar policy. A request must pass both reachability and identity authorization; either one can independently deny it.

Where should I start with zero trust on AWS?
Incrementally, with a high-value use case. A common first step is fronting one sensitive internal application with AWS Verified Access (removing a VPN and adding identity-and-device-aware access), then adding egress filtering with Network Firewall, then service-to-service auth with VPC Lattice. AWS Prescriptive Guidance recommends progressive enhancement rather than a big-bang rollout.

Network Firewall, security groups, or NACLs — which does what?
They operate at different granularities and are complementary. Security groups are stateful, instance/ENI-level micro-segmentation (reference other security groups, not CIDRs, to express intent). NACLs are stateless, coarse subnet-level guards. AWS Network Firewall is managed, stateful inspection and IPS at the VPC/perimeter level, with Suricata-compatible rules, domain-list egress filtering, and optional TLS inspection. Zero trust uses all three, each where it fits.

VPC Lattice auth policy or Verified Access policy — which do I use?
VPC Lattice auth policies authorize service-to-service (machine-to-machine) calls using IAM and SigV4; both the caller's identity-based policy and the resource auth policy must allow. AWS Verified Access policies authorize human-to-application access using Cedar, evaluated against identity and device trust context. Use Lattice for east-west service calls and Verified Access for workforce access to internal apps; they meet the same "identity plus reachability" goal for different request types.

How do I tell which layer denied a request?
Work outside-in using each layer's own logs: VPC Flow Logs for reachability, Network Firewall alert/flow logs for inspection drops, VPC Lattice access logs for service authorization, and Verified Access access logs for application-access decisions. A per-flow table of "which gate checks what" (Section 7) makes the isolation mechanical. The most common subtle case is asymmetric routing breaking stateful inspection, fixed with Transit Gateway appliance mode.

Is identity enough on its own, or do I still need network controls?
You still need both. AWS's position is that identity-centric and network-centric controls are complementary, applied where each adds the most value. Network controls shrink blast radius, give you a place to inspect for threats, and contain lateral movement; identity controls ensure that network access is never sufficient by itself. The strongest designs evaluate both for the same request.

Is the architecture "secure" once it is configured?
No configuration makes a system safe by itself. Zero trust is layered specifically because any one layer can be misconfigured; the value is that several independent layers must each authorize a request, and that detection (GuardDuty, Security Hub) watches for the cases the preventive layers cannot reveal on their own. Treat the layers as defense in depth, verify each one, and monitor continuously.

12. Summary

A zero-trust network on AWS is not a product you enable; it is an architecture you assemble. This guide built one named reference architecture — a multi-account zero-trust network — from five interlocking parts: segmentation (Organizations, OUs, SCPs, VPCs, subnets, and security groups as micro-segmentation), inspection (AWS Network Firewall for east-west and egress traffic, with Suricata-compatible rules, domain-list egress filtering, and scoped TLS inspection, routed centrally through Transit Gateway with appliance mode), identity-aware service access (Amazon VPC Lattice service networks with IAM auth policies and SigV4, where both reachability and identity are checked), identity-aware application access (AWS Verified Access with user and device trust providers and Cedar policies, fail-closed and VPN-less), and detection (Amazon GuardDuty and AWS Security Hub from a delegated administrator). The thread tying them together is one sentence — network reachability is not permission — made concrete by following a single request through every layer that must independently authorize it, and by knowing which layer's log to read when one of them says no.

From here, the natural next steps in this series are the Secure Web Application Reference Architecture on AWS (the edge-to-data capstone that applies these layers to a public application) and the Centralized Logging and Audit Architecture on AWS (the detection-and-audit pair that deepens the logging and query plane this guide only touched). For the per-service depth deliberately delegated throughout, return to the AWS VPC Lattice Complete Guide, the AWS PrivateLink and VPC Endpoints Complete Guide, the AWS VPC Connectivity Decision Guide, and IAM Policy Evaluation Logic Step by Step.

Written by Hidekazu Konishi