Deploy a full-stack AI task manager using OpenAI embeddings, semantic search, pgvector, FastAPI, React, and PostgreSQL on Render's free tier in under an hour.
Technical Musings
Building a Scalable Kubernetes Application on DigitalOcean
A hands-on Flask demonstration project that shows real-time pod and node assignments during scaling events.
Deploying an E-commerce Platform on Amazon EKS with GitOps
Steps to deploy microservices on an Amazon EKS cluster using Terraform, ingress configuration, and Argo CD for GitOps.
Setting Up CI/CD Pipelines with GitHub Actions and Argo CD
Building automated CI/CD pipelines using GitHub Actions for continuous integration and Argo CD for continuous deployment.
Terraform Best Practices for AWS Infrastructure
Best practices for provisioning AWS infrastructure with Terraform, focusing on modular design, remote backends, and state locking.
Containerizing Microservices with Docker
Guide to containerizing Golang, Java, and Python microservices using Docker with Distroless images and multi-stage builds.
Building Resilient Systems Through Strategic Testing and Chaos Engineering
Overview of testing methodologies and Chaos Engineering to validate system resilience.
Introduction to High Availability Concepts
A beginner’s guide to high availability concepts, covering failover, redundancy, and monitoring to build reliable systems with minimal downtime.
Service Level Management and Error Budgeting
A guide to using SLOs, SLIs, SLAs, and error budgets to manage system reliability.
Agile Methodologies in System Operations: Enhancing Collaboration and Efficiency
An exploration of how Agile principles can improve collaboration and efficiency in system operations and infrastructure management.
Incident Management and Escalation Handling: Keeping Systems Reliable
A guide to structured incident management and escalation for SRE and DevOps teams, with a focus on reliability and best practices.
Monitoring CPU Usage with Python for System Reliability
A Python script that monitors CPU usage and sends email alerts when usage exceeds a threshold, helping keep systems healthy.
Managing Infrastructure with Ansible
A step-by-step guide to setting up an Ansible project for managing infrastructure.
Automating a Simple CI/CD Pipeline with GitHub Actions
A step-by-step guide to setting up a CI/CD pipeline for a Node.js app using GitHub Actions and Docker.
A Simple AWS Setup with Terraform
A step-by-step guide to deploying a VPC, EC2, and S3 on AWS using Terraform.
From Command Line to Observability: The Evolution of System Introspection
A follow-up to my earlier analysis of the ls *c command on Medium.
Understanding File Systems in Modern Infrastructure: Beyond Symlinks and Hardlinks
A follow-up to my earlier exploration of symlinks and hardlinks on Medium.
SRE Foundations to Production: Securing Postgres on K8s (Part 7)
Secure Postgres with Secrets, resolve DB hiccups, and confirm USE and RED metrics with Prometheus in a Flask app on Kubernetes.
SRE Foundations to Production: Advanced Monitoring Setup (Part 6)
Run two Flask instances with PostgreSQL, monitor USE/RED metrics via Prometheus and Grafana, and load test with Locust to stress-test reliability.
SRE Foundations to Production: Scaling and Load Testing (Part 5)
Scale your Flask app with Locust load testing, monitor CPU and memory with Prometheus, and run multiple instances for production-ready reliability.
SRE Foundations to Production: SQLite, Loki, and Dashboards (Part 4)
Enhance your Flask app’s observability with SQLite persistence, Loki log aggregation, and advanced Grafana dashboards in this SRE-focused guide.
SRE Foundations to Production: Alerting and EC2 Deployment (Part 3)
Enhance your Flask app with Prometheus alerting, Alertmanager notifications, and deploy it to AWS EC2 for a production-like setup.
SRE Foundations to Production: Grafana Metrics (Part 2)
Build on your Flask app’s monitoring by adding Grafana dashboards to visualize Prometheus metrics and enhance reliability.
SRE Foundations to Production: Monitorable Flask App Setup (Part 1)
Turn a Flask app into a monitorable system with tests, logs, Docker, and Prometheus for SRE reliability.