Saturday, November 1, 2025

Amazon DynamoDB Outage: A Lesson in Cloud Resilience and Recovery 🚀


On October 19-20, 2025, AWS experienced a major outage in the Northern Virginia (us-east-1) Region that impacted DynamoDB, EC2, Network Load Balancer, Lambda, and multiple other services used by millions of customers worldwide.


Here’s what went down:

🔹 A rare race condition in DynamoDB's automated DNS management left the DNS record for the regional endpoint (dynamodb.us-east-1.amazonaws.com) empty, so clients could no longer resolve it or connect to DynamoDB, triggering widespread failures.

🔹 This DNS failure cascaded into EC2 instance launch failures, Network Load Balancer disruptions, and delays across many dependent services.

🔹 Customers faced increased API errors, latency spikes, and service degradation for over 14 hours.



How AWS responded:

✔️ Engineers quickly identified the problem and manually repaired the DNS state, restoring DynamoDB connectivity by the early morning of October 20.

✔️ Engineers throttled requests, restarted critical subsystems, and gradually brought EC2 and networking systems back online.

✔️ Full recovery was achieved over the next several hours, with all services stable by late October 20.

✔️ AWS has disabled the faulty DNS automation and is enhancing testing and improving fail-safes to prevent similar incidents.


Why this matters:

Cloud infrastructure is incredibly complex, and even the best systems can harbor hidden bugs with significant impact. What counts is an unwavering commitment to transparency, rapid response, and continuous improvement.

Let’s use this event as a powerful learning moment for all of us in tech.


Thursday, August 1, 2024

DevOps Is Not Just an Operations Role: It's a Crucial, Technically Driven Discipline

In the evolving landscape of software development, DevOps has emerged as a critical methodology, yet it is often misunderstood. Some view it as merely an operations role, confined to managing infrastructure or deploying code. This perception is far from reality: DevOps is a highly technical discipline that requires a deep understanding of both development and operational processes.

The Core of DevOps: Bridging the Gap Between Development and Operations

DevOps is about breaking down the traditional silos between development and operations teams. It’s a culture and a set of practices that bring these teams together, facilitating continuous integration and continuous delivery (CI/CD). This collaboration ensures that software is built, tested, and deployed more efficiently and reliably.

But to achieve this, DevOps professionals need to possess a broad range of technical skills. They must understand coding and scripting, automation tools, cloud platforms, containerization, and monitoring systems. They should be comfortable working with CI/CD pipelines, infrastructure as code (IaC), and microservices architectures.

Sunday, June 20, 2021

Reset MacBook Touch Bar Strip

To reset your MacBook's Touch Bar strip, run the following commands from Terminal:

sudo pkill "TouchBarServer"
sudo killall "ControlStrip"

Thursday, June 19, 2014

Setting Network Priority

At work, most of us are connected to multiple networks, e.g. a wired LAN, WiFi, or some other connection.
In my case, WiFi had the top priority, so when I was connected to both LAN and WiFi, the WiFi connection was used even though it was slower than my LAN. I wanted to use WiFi only when my LAN was not working. Below are the steps to set the priority:

1. Open the Network Connections window: go to Start -> Run, type ncpa.cpl, and click OK.
2. Press the ALT key to show the menu bar, click Advanced, and then click Advanced Settings.
3. On the Adapters and Bindings tab, select a network connection and change its position in the order using the up/down arrow buttons.
4. After ordering the available network connections according to your preferences, click OK.
5. The computer will now follow this order of priority when detecting available connections.

Saturday, September 7, 2013

Automation Testing

Why Automation Testing Is Important: Over time, the number of features in any application grows very rapidly, while the number of test cases doesn't grow at the same speed unless you go with either Option 1 or Option 2 (if there are other options, please suggest them).
(Figure: Feature and Test Over Time)
Option 1: Add more testing resources and increase the test cycle time.
Option 2: Adopt automation testing to avoid the repetitive work.

I have seen instances where people feel that automation testing is killing jobs for manual test engineers. Automation testing is an important practice, but it can never replace manual testing: a test program can never be as intelligent as a human being.

If you are building an application where customer experience is key, I personally suggest testing each new feature manually the first time and using automation to avoid retesting old features again and again. In the software world there are two kinds of people: those who believe in 100% automation and 0% manual testing, and those who share my view.
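
The "test manually first, then automate" approach can be sketched as a tiny regression suite. The sketch below is written in POSIX sh, the same shell used for commands elsewhere on this blog, and everything in it is hypothetical: `slugify` stands in for an existing feature, and `CASES` and `run_regression` are illustrative names, not part of any testing framework. The point is that once an old feature's expected behavior has been written down as input/expected pairs, rechecking it after every release is one command instead of a manual pass.

```shell
#!/bin/sh
# Minimal regression-suite sketch (POSIX sh).
# HYPOTHETICAL: slugify stands in for an "old feature" that has already
# been tested manually once; CASES records its expected behavior.

# Feature under test: lowercase the text and turn runs of spaces into one dash.
slugify() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -s ' ' '-'
}

# One case per line, in the form: input|expected output (hypothetical data).
CASES='Hello World|hello-world
Automation  Testing|automation-testing
DevOps|devops'

# Re-run every recorded case; succeed only if all of them pass.
run_regression() {
  failures=0
  # Feed the loop from a here-document so it runs in the current shell
  # and the failures counter is still set after the loop ends.
  while IFS='|' read -r input expected; do
    actual=$(slugify "$input")
    if [ "$actual" = "$expected" ]; then
      echo "PASS: '$input'"
    else
      echo "FAIL: '$input' -> '$actual' (expected '$expected')"
      failures=$((failures + 1))
    fi
  done <<EOF
$CASES
EOF
  echo "$failures failure(s)"
  [ "$failures" -eq 0 ]
}

run_regression
```

Running the script prints one PASS or FAIL line per case, then a failure count, and exits non-zero if any case fails, so a CI job can flag a broken old feature automatically. New features still get their first test by hand; a case is added to `CASES` only once the expected behavior is understood.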

Thoughts?