Netflix is Analyzing Power Outages and Looking to Hire Experienced Engineers
Following a widespread power outage that took out Netflix, Instagram and Pinterest in late June, Netflix launched its own investigation into the possible causes, only to discover several shortcomings on its own end as well.
Several days after the outage, Amazon released its own explanation of why the incident occurred, revealing that it all started with a power failure that spread to the backup generators and created a domino effect that eventually knocked out its load-balancing hardware.
Netflix engineers Greg Orzell and Ariel Tseitlin revealed that the internet streaming media provider had its own problems with its load-balancing service. "This caused unhealthy instances to fail to deregister from the load-balancer which black-holed a large amount of traffic into the unavailable zone," the two said in an official statement.
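To make the failure mode the engineers describe more concrete, here is a minimal sketch in Python. It is a toy model built on assumptions, not Netflix's or Amazon's actual software: the `Instance` and `LoadBalancer` classes, the zone names and the round-robin routing are all hypothetical and exist only to show how requests are lost while unhealthy instances stay registered.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    # Hypothetical stand-in for a server sitting behind a load balancer.
    name: str
    zone: str
    healthy: bool = True


@dataclass
class LoadBalancer:
    # Toy model for illustration only; not a real load-balancer API.
    registered: list = field(default_factory=list)
    _next: int = 0

    def register(self, instance):
        self.registered.append(instance)

    def deregister_unhealthy(self):
        # Drop every instance that currently fails its health check.
        self.registered = [i for i in self.registered if i.healthy]

    def route(self):
        # Round-robin over registered instances. Returns the target
        # instance, or None when the request is lost ("black-holed").
        if not self.registered:
            return None
        target = self.registered[self._next % len(self.registered)]
        self._next += 1
        return target if target.healthy else None


def lost_requests(lb, n):
    # Count how many of n requests end up on an unhealthy instance.
    return sum(1 for _ in range(n) if lb.route() is None)


def build(zone_b_up):
    # Two instances in each of two assumed zones; zone-b may be down.
    lb = LoadBalancer()
    for k in range(2):
        lb.register(Instance(f"a{k}", "zone-a"))
    for k in range(2):
        lb.register(Instance(f"b{k}", "zone-b", healthy=zone_b_up))
    return lb


if __name__ == "__main__":
    lb = build(zone_b_up=False)
    print(lost_requests(lb, 8))  # 4: half the traffic goes to the dead zone
    lb.deregister_unhealthy()
    print(lost_requests(lb, 8))  # 0: only healthy instances remain
```

In this sketch, half of the requests vanish for as long as the instances in the failed zone remain registered, and none are lost once deregistration succeeds.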
Despite several power failures, Netflix stands by its decision to use cloud services. The company said that each failure has helped it become more resilient to potential outages and has also enabled it to optimize internal processes.
Greg Orzell and Ariel Tseitlin remain confident that the choice to move to cloud services was a sound one, in spite of the problems that may periodically occur. "While it's easy and common to blame the cloud for outages because it's outside of our control, we found that our overall availability over the past several years has steadily improved," the two concluded.
Moreover, Netflix is looking to improve its cloud management strategy over the coming years, so as to gradually eliminate any internal factors potentially contributing to outages. The company has announced that it is currently looking to hire experienced engineers for its Cloud Operations and Reliability Engineering teams.
Although they switched to cloud-based services in late 2010, Netflix has already put together an impressive team of engineers whose primary focus is the optimization of internal processes that affect their load-balancing capabilities.
Tseitlin tweeted earlier this month that Netflix is still "actively beefing up our reliability engineering team... always looking for good people". Interested individuals can submit their resumes on the company's website.