LIVE atabany.net

// INFRASTRUCTURE DOCUMENTATION

DevOps Proof

This page documents a production-grade Kubernetes cluster built from scratch on bare metal hardware — from flashing a raw OS image to a fully operational GitOps-driven infrastructure stack. Every design decision is explained, every tool is justified, and the architecture is diagrammed in full. This is the same stack I would discuss in any senior DevOps or platform engineering interview.

Talos OS · Bare Metal | Kubernetes v1.35 | ArgoCD GitOps | Helm · Terraform | Prometheus · Grafana · Loki | Ingress-NGINX · MetalLB · Cert-Manager | Sealed Secrets | Cloudflare Tunnel (planned)

// 01 · ARCHITECTURE

Full Stack Overview

The cluster runs on an old laptop with an Intel Core i7-7700 and a GTX 1050 Ti, repurposed as a dedicated Kubernetes node and completely isolated from the MS-01 production Unraid server. The OS is Talos Linux: an immutable, API-only OS with no shell, no SSH daemon, and no package manager. It boots directly into a minimal Linux system that runs little more than containerd, the kubelet, and the Talos API services. Every configuration change is an API call authenticated over mutual TLS, which enforces infrastructure-as-code discipline at the OS level.

On top of Talos, a single-node Kubernetes v1.35.2 cluster runs the full application stack. Flannel handles pod networking, MetalLB provides LoadBalancer IP assignment from the LAN range, and Ingress-NGINX routes external traffic by hostname. All workloads are deployed and reconciled continuously by ArgoCD, which watches the GitHub repository as the single source of truth. Nothing is ever applied manually.
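As a conceptual model of what "LoadBalancer IP assignment from the LAN range" means, the sketch below hands out addresses from a fixed pool to services that request one. It is not MetalLB's actual allocator, and the pool bounds are an assumption: only 192.168.0.200 appears on this page.

```python
import ipaddress


def build_pool(first: str, last: str) -> list:
    """Expand an inclusive first-last range into individual IPv4 addresses."""
    start = int(ipaddress.IPv4Address(first))
    end = int(ipaddress.IPv4Address(last))
    return [ipaddress.IPv4Address(i) for i in range(start, end + 1)]


def assign_ips(pool: list, services: list) -> dict:
    """Give each LoadBalancer service the next free address.

    A service maps to None when the pool is exhausted, which mirrors a
    Service whose external IP stays pending.
    """
    free = list(pool)
    return {name: (free.pop(0) if free else None) for name in services}


# Hypothetical pool bounds; the real range is whatever the cluster config reserves.
pool = build_pool("192.168.0.200", "192.168.0.202")
assignments = assign_ips(pool, ["ingress-nginx"])
print(assignments["ingress-nginx"])  # 192.168.0.200
```

In L2 mode the node then answers ARP requests for the assigned address, so other LAN devices reach the service on a normal IP and port.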

[Diagram: Full stack architecture, bare metal to GitOps. Layer labels: HARDWARE, OS, K8S, NETWORK, OBSERV., GITOPS.]
Components shown, in diagram order: Old Laptop (i7-7700 · GTX 1050Ti · SSD 1TB); Minisforum MS-01 (i9-12900H · 64GB · TerraMaster ~62TB); UniFi Network (UCG · Switch · AP · VLAN segmented); DU Telecom (CGNAT, no inbound ports, outbound-only); Talos OS v1.9.x (immutable, no SSH, API-only management); Unraid 7 (production media / IaC stack, untouched); Windows 11 PC (kubectl · talosctl · Terraform); K8s v1.35.2 (single control-plane node); Flannel CNI (pod networking); CoreDNS (cluster DNS); etcd (cluster state store); MetalLB (L2 LB IPs); Ingress-NGINX (hostname routing, reverse proxy); Cert-Manager (auto TLS, Let's Encrypt); Sealed Secrets (encrypted secrets in Git); Cloudflare Tunnel (planned; zero-trust public access, no open ports); kube-prometheus-stack (Prometheus · Alertmanager · kube-state); Grafana (dashboards, unified ops view); Loki (cluster-wide log aggregation); Homepage Dashboard (auto-discovers all K8s services); ArgoCD (GitOps controller, continuous sync); GitHub IaC repo (omaratabany/Home-Lab-Infra-as-code); Helm via ArgoCD (all infra deployed as charts); Terraform (Cloudflare · DNS).
Diagram notes: Every layer is declarative; config lives in Git and ArgoCD enforces the desired state continuously. Talos has no shell and no SSH; all node management is via signed API calls over mTLS.
Legend: Kubernetes / OS layer · Network / GitOps layer · Planned / in progress · Supporting infrastructure.
Diagram 1 of 3: Seven-layer stack from physical hardware to GitOps applications. Green nodes are Kubernetes-native components. Blue nodes are networking and GitOps tooling. Amber nodes are planned or in-progress additions.

// 02 · DESIGN DECISIONS

Why Each Tool Was Chosen

Every tool in the stack was chosen for a specific production reason, not to pad a skills list. The table below explains the decision behind each component.

TOOL | PROBLEM IT SOLVES | WHY NOT THE ALTERNATIVE
Talos OS | Need an OS that enforces IaC discipline at the node level. No shell access means no configuration drift and no manual fixes that never get documented. | Ubuntu with Ansible works, but SSH access is a footgun: a developer can SSH in, fix something manually, and the playbook no longer reflects reality. Talos makes that impossible.
ArgoCD | Need a continuous reconciliation loop that keeps cluster state aligned with Git. Manual kubectl applies break down as soon as more than one person or automation touches the cluster. | FluxCD is functionally equivalent. ArgoCD was chosen for its UI and its explicit Application CRD model, which makes the desired state auditable per application rather than just per directory.
MetalLB | Kubernetes Services of type LoadBalancer rely on a cloud provider to allocate an external IP. On bare metal there is no cloud to do that, so the external IP stays pending and services can only be reached from outside the cluster via NodePort. | Pure NodePort works but requires hardcoded high ports everywhere. MetalLB makes LoadBalancer services behave as they would on EKS or GKE: a real IP on the standard ports 80/443.
Ingress-NGINX | Without a reverse proxy, each service needs its own IP and port. Ingress-NGINX allows routing many services through one IP using hostname-based virtual hosting. | Traefik is a popular alternative with similar capability. Ingress-NGINX was chosen because it mirrors how most production clusters (including large Dutch enterprises) have historically handled ingress, making it more interview-relevant.
Cert-Manager | TLS certificates expire, and managing them manually is a production incident waiting to happen. Cert-Manager integrates with Let's Encrypt and renews automatically. | Manual certificate management is not acceptable in production. Cert-Manager is the industry standard for automated TLS in Kubernetes and handles both DNS-01 and HTTP-01 challenge types.
Sealed Secrets | Raw Kubernetes Secrets are base64-encoded, not encrypted. Committing them to Git means anyone with repo access can decode them instantly. | SOPS with age keys is equally valid (already used for Terraform secrets in the IaC repo). Sealed Secrets was added to demonstrate the Kubernetes-native pattern used at many enterprises.
kube-prometheus-stack | Need cluster-wide metrics for CPU, memory, pod health, and deployment status. Flying blind without observability is not a production posture. | The full stack (Prometheus Operator + Grafana + kube-state-metrics + Alertmanager) ships as a single Helm chart, pre-wired with the dashboards and scrape configs needed. Installing the components separately would take significantly more configuration time.
Loki | Logs from every pod need to be aggregatable without SSHing into nodes. Loki aggregates logs cluster-wide and makes them queryable in Grafana alongside metrics. | Elasticsearch (ELK stack) handles this but is significantly heavier on resources for a homelab node. Loki's compressed log storage and tight Grafana integration make it the right fit here.
Cloudflare Tunnel (planned) | DU Telecom in Dubai uses CGNAT, so there is no public inbound IP on the home connection and standard port forwarding is impossible. The cluster cannot be publicly accessible without a solution that works outbound-only. | ngrok and similar tools work but are not production-grade for a persistent homelab. Cloudflare Tunnel is already proven on the MS-01 (running onetwork.cc for Plex and other services), so the pattern is known and trusted.

// 03 · GITOPS PIPELINE

From Git Push to Running Pods

The GitOps pipeline enforces a single invariant: Git is the only write path to the cluster. No human runs kubectl apply directly against production. Every change, whether a new application, a configuration update, or an infrastructure addition, is made by editing YAML in the repository and merging a pull request. ArgoCD polls the repository every three minutes, compares the desired state in Git with the live cluster state, and automatically syncs away any difference it finds.

For application deployments, GitHub Actions builds the container image, pushes it to GHCR, and then commits the updated image tag back into the manifest in Git. ArgoCD detects this commit and triggers a sync. The prune flag removes resources that are no longer in Git. The selfHeal flag reverts any manual change made directly to the cluster — enforcing that Git always wins. This is the same reconciliation loop used by teams at ING, Booking.com, and most other Dutch technology companies that have adopted GitOps.
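The interaction between automated sync, prune, and selfHeal can be modelled in a few lines. This is a simplified sketch of the decision logic, not ArgoCD's implementation: the function name, the dictionary-based state, and the resource names are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    verb: str  # "create", "update", or "delete"
    name: str  # resource identifier, e.g. "deployment/homepage"


def plan_sync(desired: dict, live: dict, git_changed: bool,
              prune: bool = True, self_heal: bool = True) -> list:
    """Return the actions needed to make `live` match `desired`.

    Without self_heal, a sync only runs when Git has a new commit, so
    manual edits to the cluster are left in place until the next commit.
    """
    if not git_changed and not self_heal:
        return []
    actions = []
    for name, spec in desired.items():
        if name not in live:
            actions.append(Action("create", name))
        elif live[name] != spec:
            actions.append(Action("update", name))
    if prune:
        # Anything running in the cluster but absent from Git is removed.
        actions.extend(Action("delete", name) for name in live if name not in desired)
    return actions


# Illustrative state only; these resource names are not taken from the repo.
desired = {"deployment/homepage": {"image": "homepage:v2"},
           "service/homepage": {"port": 80}}
live = {"deployment/homepage": {"image": "homepage:v1"},
        "configmap/old": {"key": "value"}}

for action in plan_sync(desired, live, git_changed=True):
    print(action.verb, action.name)
```

With `git_changed=False` and `self_heal=False` the same drift produces no actions, which is the case selfHeal exists to close.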

[Diagram: GitOps pipeline, from git push to running pods.]
Flow: Developer (git push to main branch) → GitHub (repository triggers Actions) → CI / GitHub Actions: validate + build (lint · test · docker build) → GHCR: container push (image:sha pushed) → CI commits manifest: tag bump in Git (deployment.yaml image updated) → git push of the manifest update triggers the ArgoCD poll → ArgoCD, 3 min poll: detects drift (desired vs actual state diff) → ArgoCD sync: kubectl apply (declarative · prune · self-heal) → K8s API server: schedules pods (pulls image from GHCR) → Result: pods running (new image live on cluster).
Side boxes: Prometheus (metrics scraped, pod health visible in Grafana); Self heal (manual drift reverted, ArgoCD re-applies desired state); Key principle (Git is the only write path).
Diagram notes: No manual kubectl applies in the production flow; every change is a Git commit, reviewed and merged. ArgoCD prune:true removes orphaned resources; selfHeal:true reverts manual cluster edits.
Diagram 2 of 3: End-to-end GitOps pipeline from developer commit to pod running on cluster. The bottom-left box highlights the key principle: Git is the only write path. Self-heal means manual cluster modifications are automatically reverted.
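The "CI commits manifest" step of the pipeline amounts to rewriting one image reference and committing the result. A minimal sketch of that rewrite follows; the repository name `ghcr.io/example/app` and the helper name are hypothetical, and the real workflow may use a different tool for the same job.

```python
import re


def bump_image_tag(manifest: str, repository: str, new_tag: str) -> str:
    """Replace the tag on every `image: <repository>[:tag]` line in a manifest.

    The lookahead stops the pattern from matching a longer repository
    name that merely starts with `repository`.
    """
    pattern = re.compile(
        r"(image:\s*" + re.escape(repository) + r")(:[\w.\-]+)?(?=\s|$)"
    )
    updated, count = pattern.subn(lambda m: f"{m.group(1)}:{new_tag}", manifest)
    if count == 0:
        raise ValueError(f"no image line for {repository!r} found")
    return updated


# Hypothetical manifest fragment and tags, for illustration only.
manifest = (
    "spec:\n"
    "  containers:\n"
    "    - name: app\n"
    "      image: ghcr.io/example/app:abc123\n"
)
print(bump_image_tag(manifest, "ghcr.io/example/app", "def456"))
```

Pinning the tag to the commit SHA, as the diagram's `image:sha` suggests, means every deployed image can be traced back to one commit.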
App of Apps pattern
A single root ArgoCD Application watches k8s/apps/ in the repo. Every file in that directory is an ArgoCD Application or ApplicationSet pointing at a Helm chart. Adding a new service is one YAML file and a git push. ArgoCD bootstraps everything else.
ApplicationSet for infrastructure
All six infrastructure tools (MetalLB, Ingress-NGINX, Cert-Manager, Sealed Secrets, kube-prometheus-stack, Loki) are declared in a single infrastructure.yaml ApplicationSet using a list generator. One file, six deployed Helm releases.
Helm via ArgoCD (no local Helm needed)
ArgoCD has Helm built in. The operator machine (Windows PC) does not need Helm installed. Chart versions are pinned in Git. Upgrades are a version bump commit, not a command to remember to run.
Terraform for Cloudflare layer
DNS records, tunnel routes, Zero Trust access policies, and Worker deployments for this CV page are all managed by Terraform with state stored in Cloudflare R2. The cloudflare provider handles everything that lives outside the Kubernetes cluster.
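The "one file, six Helm releases" idea behind the ApplicationSet list generator is template expansion: one template, rendered once per list element. The sketch below mimics that substitution in plain Python. It is not ArgoCD code, the template is cut down to two fields, and the namespaces are assumptions; only the six tool names come from this page.

```python
def render(node, params: dict):
    """Recursively substitute {{key}} placeholders in a nested template."""
    if isinstance(node, dict):
        return {key: render(value, params) for key, value in node.items()}
    if isinstance(node, list):
        return [render(value, params) for value in node]
    if isinstance(node, str):
        for key, value in params.items():
            node = node.replace("{{" + key + "}}", value)
    return node


def expand_list_generator(elements: list, template: dict) -> list:
    """Produce one rendered Application per element, like a list generator."""
    return [render(template, element) for element in elements]


# Tool names are from this page; the namespaces are hypothetical.
elements = [
    {"name": "metallb", "namespace": "metallb-system"},
    {"name": "ingress-nginx", "namespace": "ingress-nginx"},
    {"name": "cert-manager", "namespace": "cert-manager"},
    {"name": "sealed-secrets", "namespace": "kube-system"},
    {"name": "kube-prometheus-stack", "namespace": "monitoring"},
    {"name": "loki-stack", "namespace": "monitoring"},
]
template = {
    "metadata": {"name": "{{name}}"},
    "spec": {"destination": {"namespace": "{{namespace}}"}},
}

apps = expand_list_generator(elements, template)
print(len(apps), [app["metadata"]["name"] for app in apps])
```

Adding a seventh tool is then one more element in the list, which matches the "single file addition and a git push" workflow.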

// 04 · TRAFFIC EXPOSURE

How Services Are Accessed: LAN Today, Public Tomorrow

Currently, all cluster services are accessible from devices on the local network only. Ingress-NGINX is assigned 192.168.0.200 via MetalLB L2 advertisement. The access path in use today goes through the node itself: hosts file entries on the operator PC map each *.homelab hostname to the node IP, 192.168.0.134 (a hosts file cannot hold wildcards, so every hostname needs its own line). Requests hit the NodePort on :31837, which forwards to Ingress-NGINX, which in turn routes by hostname to the appropriate pod.
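The LAN path has two lookups: the hosts file turns a hostname into the node IP, and the ingress controller turns the Host header into a backend. The sketch below models both. The node IP and NodePort are the ones shown on this page; the specific hostnames and backend service names are assumptions. Because hosts files have no wildcard support, each hostname gets its own line.

```python
NODE_IP = "192.168.0.134"  # Talos node address shown in the traffic diagram
NODE_PORT = 31837          # NodePort in front of Ingress-NGINX

# Hypothetical hostname -> backend service mapping.
RULES = {
    "grafana.homelab": "grafana",
    "argocd.homelab": "argocd-server",
    "homepage.homelab": "homepage",
}


def hosts_entries(ip: str, hostnames) -> list:
    """Build one hosts-file line per hostname (no wildcards allowed)."""
    return [f"{ip} {name}" for name in hostnames]


def route(host_header: str, rules: dict):
    """Pick a backend by exact hostname match, ignoring any :port suffix."""
    hostname = host_header.split(":", 1)[0].lower()
    return rules.get(hostname)


for line in hosts_entries(NODE_IP, RULES):
    print(line)

print(route(f"grafana.homelab:{NODE_PORT}", RULES))  # grafana
```

A request for a hostname with no matching rule returns `None` here; a real ingress controller would answer it from its default backend instead.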

The production upgrade path is Cloudflare Tunnel — the same pattern already running on the MS-01 homelab server behind onetwork.cc. A single cloudflared pod deployed by ArgoCD establishes an outbound-only encrypted tunnel to Cloudflare. Cloudflare Access sits in front of sensitive services (ArgoCD, Grafana, Kubernetes Dashboard) and requires identity verification before any traffic reaches the cluster. No ports are opened on the home router. This is the correct architecture for a residential connection behind DU Telecom CGNAT.

[Diagram: Traffic exposure, current LAN path vs planned Cloudflare Tunnel path.]
Current, LAN only: Browser (PC on LAN, :31837) → hosts file DNS (*.homelab → 192.168.0.134) → NodePort :31837 (Talos node · 192.168.0.134) → Ingress-NGINX (routes by hostname) → Service Pod (Grafana · ArgoCD · Homepage · etc). Limitation: LAN only, requires hosts file entry per device, no mobile/remote access.
Planned, public zero-trust access (Cloudflare Tunnel): Internet (any device) → Cloudflare WAF (DDoS · bot · rate limit, free tier) → Cloudflare Access (identity gate: only you; OAuth · OTP · hardware key) → CF Tunnel (outbound-only, no open ports, works behind CGNAT (DU)) → cloudflared pod → Ingress-NGINX (same services, same routing rules; deployed via ArgoCD as a Helm chart). Benefit: no firewall rules, no open ports on home router, CGNAT-safe, Cloudflare handles TLS, access from any device globally.
Diagram notes: Same pattern already proven on MS-01 (Plex · Overseerr · Jellyfin behind CF Tunnel on onetwork.cc). The Talos cluster will use an identical tunnel config: one cloudflared pod deployed by ArgoCD.
Diagram 3 of 3: Current LAN access path (top) vs planned Cloudflare Tunnel path (bottom). The tunnel approach requires zero open inbound ports and adds identity-gated access via Cloudflare Access, meaning even if someone knows the URL, they cannot reach the service without authenticated identity.

// 05 · VERIFIABLE EVIDENCE

What Can Be Verified

The following items are directly verifiable by anyone reviewing this profile. This is not a theoretical understanding — the infrastructure is running.

01
IaC repository — public on GitHub
github.com/omaratabany/Home-Lab-Infra-as-code
Contains Terraform modules (Cloudflare DNS, Zero Trust, Tunnel), Ansible roles, Docker Compose stacks, and the full k8s/ directory with ArgoCD Applications, ApplicationSets, Helm values, and ingress manifests. Every tool on this page is represented in the repository as code that can be reviewed, diffed, and applied.
02
Live Grafana panels on the CV site
The CV at atabany.net embeds live Prometheus/Grafana panels from the MS-01 homelab server (scroll to Live observability on the homepage). The panels show CPU, memory, disk I/O, and network metrics as real time series. The data path (Prometheus scraping Node Exporter, Grafana querying Prometheus internally, Cloudflare Tunnel exposing only Grafana) is the same pattern being extended to the Kubernetes cluster.
03
Talos bare metal installation — documented process
The cluster runs on a laptop with a 1TB NVMe drive, with Talos installed directly on the disk and no underlying OS. The installation involved diagnosing and resolving I/O errors during initial disk partitioning, identifying disk assignment conflicts between the NVMe and a USB drive, and resolving PodSecurity namespace policy violations that blocked the MetalLB speaker. These are the real problems that occur in production bare metal deployments.
04
ArgoCD managing seven active applications
cert-manager, ingress-nginx, sealed-secrets, metallb, kube-prometheus-stack, loki-stack, and homepage are all deployed and reconciled by ArgoCD from the GitHub repository. The ApplicationSet pattern deploys all infrastructure tools from a single YAML file. Adding any new tool to the cluster is a single file addition and a git push.
05
Prometheus stack with custom unified dashboard
kube-prometheus-stack deploys Prometheus Operator, Grafana, kube-state-metrics, and Alertmanager. A custom unified operations dashboard (exported as importable JSON) consolidates critical signals from across the default dashboards into a single view: node status, CPU/memory usage, failing pods, pod restarts by namespace, ArgoCD sync health, and firing alerts.
06
Zero-trust remote access on production MS-01 (already proven)
The MS-01 homelab server has been running Cloudflare Tunnel since 2022 under onetwork.cc. Plex, Overseerr, Jellyseerr, and Grafana are all exposed publicly via Cloudflare Tunnel with Cloudflare Access policies, working correctly behind DU Telecom CGNAT with no open inbound ports. This is the proven pattern that will be replicated for the Kubernetes cluster.
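The signals listed for the unified operations dashboard in item 05 map onto PromQL queries along the following lines. This is a sketch, not the dashboard's actual JSON: the Prometheus base URL is hypothetical, the exact expressions in the real dashboard may differ, and the ArgoCD query assumes ArgoCD's metrics endpoint is being scraped. The metric names come from kube-state-metrics, ArgoCD, and Prometheus itself.

```python
from urllib.parse import urlencode

PROMETHEUS_URL = "http://prometheus.homelab"  # hypothetical address

# One instant query per dashboard signal (illustrative expressions).
SIGNALS = {
    "nodes_ready": 'sum(kube_node_status_condition{condition="Ready",status="true"})',
    "failing_pods": 'sum(kube_pod_status_phase{phase="Failed"})',
    "pod_restarts_by_namespace":
        "sum by (namespace) (increase(kube_pod_container_status_restarts_total[1h]))",
    "argocd_out_of_sync": 'count(argocd_app_info{sync_status="OutOfSync"})',
    "firing_alerts": 'count(ALERTS{alertstate="firing"})',
}


def query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


for name, promql in SIGNALS.items():
    print(name, query_url(PROMETHEUS_URL, promql))
```

Fetching any of these URLs returns a JSON document whose `data.result` array holds the current value, which is the same data a Grafana stat panel would display.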