
Technical Blog

Architecting for the Real World: AWS Deep Dives, Production Scaling Patterns, and the AI-Driven Cloud.

The AWS Platform Engineering Roadmap

24 posts · 8 phases
AWS Networking Series

Seven-part series covering VPC fundamentals through enterprise-grade security architecture.

Demystifying AWS VPC: From Layman to Cloud Architect

Master the foundations of AWS networking: subnets, route tables, internet gateways (IGWs), and the 'Rule of 5', the five IP addresses AWS reserves in every subnet.

April 29, 2026 12 min read
Advanced VPC Concepts: Scaling & High Performance

Dive deep into Transit Gateway, PrivateLink, Global Accelerator, and cross-region peering for enterprise-scale workloads.

April 30, 2026 12 min read
NAT Gateway vs PrivateLink vs VPC Endpoints

Cost & Architecture Trade-offs. A comprehensive comparison of secure internet and service connectivity patterns in AWS.

April 30, 2026 12 min read
Transit Gateway vs VPC Peering — When to Use What

Mesh vs. Hub-and-Spoke. A deep dive into choosing the right connectivity strategy for enterprise-scale AWS environments.

May 1, 2026 25 min read
How DNS Works in AWS: Route 53 & Hybrid Failover

Mastering Route 53 Private Hosted Zones, Resolver Endpoints, and cross-account DNS resolution.

May 3, 2026 20 min read
Direct Connect vs Site-to-Site VPN vs Client VPN

Every enterprise AWS journey eventually reaches the hybrid connectivity question: how do your on-premises systems securely connect to AWS?

May 4, 2026 25 min read
AWS Network Firewall vs Security Groups vs NACLs

DevSecOps angle: layered defence in depth, extending Gateway Load Balancer patterns for enterprise security.

May 6, 2026 22 min read
AWS Security Series

IAM, GuardDuty, WAF, and the defence-in-depth model for regulated AWS environments.

AWS IAM Deep Dive — Roles, Policies, and Zero-Trust at Scale

Building secure, cost-optimised infrastructure. A masterclass on OIDC, Permissions Boundaries, and Zero-Trust identity.

May 7, 2026 25 min read
AWS Security in Depth: GuardDuty, Security Hub & WAF

Beyond prevention: A deep dive into continuous monitoring, automated threat response, and application-layer protection using native AWS security services.

May 8, 2026 25 min read
AWS Containers & Platform Engineering

ECS vs EKS, production cluster design, GitOps, autoscaling, and IaC at enterprise scale.

AWS ECS Fargate vs EKS — When I Used Both and How to Choose

Navigating the container landscape. Choosing the right abstraction level for your enterprise workloads based on scale, cost, and operational overhead.

May 15, 2026 22 min read
Production-Grade EKS Architecture: Multi-Env Setup, Node Groups & Isolation Strategy

The full EKS blueprint — three-tier node groups, namespace isolation with NetworkPolicies, RBAC, Helm values hierarchy, and zero-downtime upgrade patterns.

May 16, 2026 25 min read
Ingress vs ALB vs NLB in EKS — Real Traffic Routing Patterns Explained

The Rabobank hybrid pattern — Nginx+NLB for 33 microservices, ALB for ArgoCD, NLB passthrough for Amazon MQ — with full Terraform, cost comparison, and 7 production anti-patterns.

May 17, 2026 22 min read
Karpenter vs Cluster Autoscaler — How We Run Both in Production to Maximise EKS Cost Efficiency

Three-tier compute: Cluster Autoscaler for On-Demand baseline, Karpenter for Spot. Complete Terraform, Helm, conflict prevention, interruption handling, and the real cost levers from Rabobank.

May 18, 2026 30 min read
Terraform at Scale: Structuring IaC for Enterprise AWS Environments

Layered repos, S3 remote state with DynamoDB locking, opinionated modules, plan-on-PR CI/CD with approval gates, and daily drift detection from a production EKS platform.

May 19, 2026 28 min read
GitOps with ArgoCD + Helm on EKS: App of Apps, Sync Waves & Multi-Cluster Strategy

App of Apps bootstrap, ApplicationSet generators, Sync Waves for ordered rollout, multi-cluster management, RBAC with SSO, Image Updater, and Notifications. Full working YAML from a production EKS platform.

May 20, 2026 32 min read
AWS Observability

CloudWatch, X-Ray, OpenTelemetry, and the full observability stack for production EKS platforms.

AWS Observability Stack: CloudWatch, X-Ray, OpenTelemetry & What's Still Missing

The complete observability model for EKS — Container Insights, structured logging with Fluent Bit, distributed tracing with ADOT and X-Ray, Synthetics, and the honest gaps. With the incident that made this post necessary.

May 24, 2026 20 min read
FinOps & Cost Optimisation

Cloud cost ownership from a FinOps Certified Practitioner — tagging strategy, Spot compute, Savings Plans, networking optimisation, and the operating model that makes savings stick.

AWS FinOps in Practice: Cost Optimisation Strategies from a Certified Practitioner

Real-world cost optimisation across EKS, ECS, VPC, and S3 — Karpenter Spot, Fargate Spot, Savings Plans, Gateway Endpoints, gp3 migration, and the FinOps operating model that prevents cost regression.

May 26, 2026 20 min read
Advanced Architecture Patterns

Multi-region, high availability, and global-scale architecture patterns — the decisions that determine whether your platform survives its worst day.

Multi-Region High Availability on AWS: Active-Active vs Active-Passive Design

Route 53 failover routing, Aurora Global Database, DynamoDB Global Tables, TGW peering, Global Accelerator, chaos engineering with FIS, and the decision framework for choosing between Active-Passive and Active-Active.

May 27, 2026 22 min read
Multi-Account AWS Strategy: Landing Zones, Control Tower & Org-Level Networking

AWS Organizations and OU design, Service Control Policies, Control Tower Landing Zones, Account Factory for Terraform, centralised networking with TGW + RAM, and a centralised security account model for governance at scale.

May 28, 2026 24 min read
Production Deep Dives

Standalone posts on specific production problems, incidents, and the non-obvious solutions that came from operating real systems.

Stopping EKS Test Environments: Karpenter, Step Functions & the Race Condition Fix

How we coordinated a multi-phase shutdown of EKS worker nodes and Karpenter-provisioned nodes overnight using AWS Step Functions, Lambda, and tag-based node identification.

May 21, 2026 15 min read
When GuardDuty Fires on Your Own Engineer: Investigating a False Positive in EKS

kubectl debug creates a privileged pod indistinguishable from a container escape. The Spot node was already gone when investigation started. Here is the full attribution trail — and three changes so it never takes this long again.

May 29, 2026 12 min read
From Nginx Ingress to Kubernetes Gateway API: A Production Cutover Story

Removing Nginx Ingress entirely and serving all traffic through the Gateway API on EKS — path inventory, rewrite translation to filters, external/internal Gateway split, and the war stories: a healthy route silently sending login traffic to the wrong app, OIDC, and WAF.

June 17, 2026 14 min read
Real-World Case Studies

Full production platforms and enterprise migrations, end to end — the real constraints, trade-offs, and decisions behind systems running in regulated environments. Where every building block in this series converges.

Designing a Fraud Detection Platform on AWS EKS — A 33-Microservice Architecture Case Study

A real case study: building a bank's fraud platform on EKS — device risk plus behavioural AI/ML, Databricks integration over an S3 boundary, DORA compliance, dual autoscalers, a Fargate detour, and a 40-minute production cutover at the F5.

June 9, 2026 18 min read
Migrating On-Premises Applications to AWS: The 7Rs, Real Pitfalls & a Working Playbook

A datacentre-exit war story: applying the 7Rs per workload, lift-and-shift with MGN, rebuilding Jenkins, Amazon WorkSpaces, the hidden DNS dependency, replacing a Sophos firewall with Client VPN, and the honest lessons from doing it largely single-handed.

June 11, 2026 16 min read
Learning in Public: Generative AI for Cloud Engineers

Documenting a 24-week journey through Generative AI — from an infrastructure engineer's perspective.

GenAI Learning · Week 2
Week 2: Generative AI with Large Language Models — Fine-Tuning, PEFT, LoRA & Model Evaluation

Study notes from Week 2 of the AWS & DeepLearning.AI Generative AI with LLMs course — fine-tuning approaches, LoRA, PEFT, catastrophic forgetting, ROUGE, BLEU, and model evaluation benchmarks. Written from a cloud engineer's perspective.

June 2, 2026 15 min read