DNS Outage Post-Mortem
February 4, 2013 | Kevin Meinert
On Monday, February 4th, at 3:01 PM PST, NetDNA’s CDN experienced a service-impacting DNS outage in which customer CDN DNS records did not return the correct IPs. NetDNA’s Operations Team identified the problem immediately and restored a backup on the DNS server within 15 minutes of the outage alert. The DNS records took several more minutes to replicate across the network.
A new, young and talented DevOps member of the team was working on a custom application to better manage traffic routing patterns for a particular customer. This application requires modifying DNS records. This maintenance should not have impacted any customers, yet the app contained a faulty SQL command that incorrectly modified customer records.
How we will prevent this in the future:
We are in the process of enhancing our automated and manual application testing procedures and adding tighter limits on engineers' SQL permissions, so that large numbers of records cannot be modified unintentionally. Custom applications will be approved and deployed in a production environment only after they have been thoroughly unit-tested and reviewed by senior team members.
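As an illustration of the kind of limit described above, here is a minimal sketch of a guard that rolls back any statement touching more rows than expected. It is not NetDNA's actual tooling: the `dns_records` table, the `guarded_update` helper, the `max_rows` threshold, and the use of SQLite as a stand-in database are all assumptions made for the example.

```python
import sqlite3


class TooManyRowsError(RuntimeError):
    """Raised when a statement modifies more rows than the allowed limit."""


def guarded_update(conn, sql, params=(), max_rows=10):
    """Run a data-modifying statement; roll back if it touches too many rows.

    max_rows is a hypothetical safety threshold chosen by the caller.
    """
    cur = conn.cursor()
    try:
        cur.execute(sql, params)  # sqlite3 opens a transaction implicitly here
        if cur.rowcount > max_rows:
            raise TooManyRowsError(
                f"statement modified {cur.rowcount} rows, limit is {max_rows}"
            )
        conn.commit()
        return cur.rowcount
    except Exception:
        conn.rollback()  # undo the change, including the oversized update
        raise


def make_demo_db():
    """Build an in-memory table with 50 records, 10 per hypothetical customer."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dns_records (id INTEGER PRIMARY KEY, customer TEXT, ip TEXT)"
    )
    conn.executemany(
        "INSERT INTO dns_records (customer, ip) VALUES (?, ?)",
        [(f"customer{i % 5}", "192.0.2.1") for i in range(50)],
    )
    conn.commit()
    return conn
```

With this guard, an update scoped to one customer (`WHERE customer = ?`) goes through, while the same update with the `WHERE` clause accidentally omitted raises `TooManyRowsError` and leaves every record unchanged. A guard like this complements, rather than replaces, database-level permissions and code review.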
We are terribly sorry about the downtime that this error caused you on Monday. This is not the performance we have led you to expect from NetDNA, and these are not the standards we have set for ourselves. We learn from every performance-impacting issue we experience, and we are constantly getting better, but I know you expect more, and so do we. We can and will do better.
We also keep our Status page at status.netdna.com updated at times like these, when there is CDN performance degradation or an outage. You can follow the status blog here: http://status.netdna.com, on Twitter or via RSS.
Thank you for your continued support as we grow our service and offering.
VP of Operations