Production readiness is more than a successful deployment.
Cloud platforms make it easy to ship quickly, but defaults and delivery pressure can leave unnecessary exposure in identity, networking, data storage, and monitoring.
Review identity paths before individual permissions
Confirm that human and workload identities use least privilege, privileged access is controlled, and long-lived credentials are avoided wherever possible. Review role trust relationships, service principals, federated identities, cross-account access, and the paths by which a lower-privileged identity could gain broader authority.
A permission can look harmless in isolation but become dangerous when combined with the ability to pass a role, modify a workload, read a secret, or change an automation pipeline.
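One way to make these combined paths visible is to treat "can act as" relationships as a graph and search it. The sketch below is only an illustration under stated assumptions: the identity names and the `CAN_BECOME` edges are hypothetical, and a real review would build the edges from the platform's actual role trust, role-passing, secret-access, and pipeline permissions.

```python
from collections import deque

# Hypothetical edges: "A can act as B" through role assumption, role passing,
# secret access, or pipeline modification. All names are illustrative only.
CAN_BECOME = {
    "dev-user": ["ci-pipeline"],
    "ci-pipeline": ["deploy-role"],
    "deploy-role": ["admin-role"],
    "read-only-user": [],
}


def escalation_path(graph, start, target):
    """Return the shortest chain of identities from start to target, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

In this made-up graph, no single edge grants administrative access, yet `escalation_path(CAN_BECOME, "dev-user", "admin-role")` finds a three-step chain. That is the pattern the review is looking for.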
Understand external and internal exposure
Map public endpoints, security groups, load balancers, storage permissions, administrative interfaces, and outbound network paths. Every exposed service should have a documented business reason, an accountable owner, and a defined authentication control.
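A simple inventory check can flag exposed services that are missing any of those three attributes. This is a minimal sketch, assuming the inventory is already available as structured records. The `ExposedService` type and its field names are hypothetical, not part of any platform API.

```python
from dataclasses import dataclass


@dataclass
class ExposedService:
    # Hypothetical inventory record; an empty string means "not documented".
    name: str
    business_reason: str = ""
    owner: str = ""
    auth_control: str = ""


def exposure_gaps(services):
    """Map each service name to the list of required fields left empty."""
    gaps = {}
    for svc in services:
        missing = [
            field
            for field in ("business_reason", "owner", "auth_control")
            if not getattr(svc, field).strip()
        ]
        if missing:
            gaps[svc.name] = missing
    return gaps
```

Any service that appears in the result has no recorded justification, owner, or authentication control and is a candidate for removal or follow-up.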
Internal reachability matters too. If an application is compromised, determine whether it can access instance metadata, management APIs, databases, or services that assume the network is trusted.
Validate secrets, data, and encryption boundaries
Review how secrets are created, distributed, rotated, and accessed by workloads. Check storage permissions, key policies, snapshot exposure, and whether sensitive data can move into lower-control environments through logs, exports, or backups.
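Rotation is one part of this review that is easy to check mechanically. The sketch below assumes the last rotation time of each secret has already been exported from the secret store into a dictionary. The function name, the secret names, and the age threshold are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone


def stale_secrets(last_rotated, max_age, now=None):
    """Return the names of secrets whose last rotation is older than max_age.

    last_rotated maps a secret name to a timezone-aware datetime.
    """
    now = now or datetime.now(timezone.utc)
    return sorted(
        name for name, rotated in last_rotated.items() if now - rotated > max_age
    )
```

For example, calling `stale_secrets(rotations, timedelta(days=90))` would list every secret that has gone more than 90 days without rotation. The right threshold depends on the organization's own policy.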
Validate logging and detection with scenarios
Ensure critical authentication, control-plane, network, and application events are retained and monitored. Then test whether the team could answer practical questions: who changed a role, which workload used a credential, what data was accessed, and whether the activity crossed accounts or regions.
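The first of those questions, who changed a role, can be rehearsed against exported log records. The following is a rough sketch, assuming events have been normalized into dictionaries with `time`, `actor`, `action`, and `resource` keys. That schema and the action names in `ROLE_CHANGE_ACTIONS` are hypothetical and would need to be mapped to the real control-plane log format.

```python
# Hypothetical action names that represent a change to a role.
ROLE_CHANGE_ACTIONS = {"role.policy_attached", "role.policy_detached", "role.trust_updated"}


def who_changed_role(events, role_name):
    """Return (time, actor, action) for every event that modified the role."""
    return [
        (e["time"], e["actor"], e["action"])
        for e in events
        if e.get("resource") == role_name and e.get("action") in ROLE_CHANGE_ACTIONS
    ]
```

If the retained events cannot populate even this simple result for a known test change, the logging gap is found during the exercise and not during an incident.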
Plan for failure
Backups, key management, recovery procedures, deployment rollback, and emergency access should be tested before they are needed. Resilience is a security control only when recovery data is isolated, restore procedures are exercised, and the recovered environment is verified before normal operations resume.
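Verifying a recovered environment can start with something as small as comparing restored data against digests recorded at backup time. This is a minimal sketch, assuming a manifest of SHA-256 digests was stored separately from the backup itself. The `verify_restore` name and the in-memory dictionaries are illustrative stand-ins for real files or objects.

```python
import hashlib


def verify_restore(manifest, restored):
    """Return names whose restored content is missing or fails its digest check.

    manifest maps a name to the expected SHA-256 hex digest.
    restored maps a name to the restored bytes.
    """
    failures = []
    for name, expected in manifest.items():
        data = restored.get(name)
        if data is None or hashlib.sha256(data).hexdigest() != expected:
            failures.append(name)
    return sorted(failures)
```

An empty result only shows that the restored bytes match what was backed up. It does not replace functional checks of the recovered services.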
