Ian's Blog

A RESTful Blog/Homepage.

Avoiding Large Catastrophes.

An article in Technology Research Network describes some research done by the Max Planck Institute on how to avoid large failures (meltdowns) of your infrastructure by intentionally taking some machines off the air before the entire thing fails. The article describes how this could apply to power grids and to the internet as a whole, but I think it could also apply to a company's site.

Most large sites are not just a simple webserver serving requests. They have a myriad of machines, all with specialized purposes and shared between different public-facing applications. We've seen on some of our stuff how a single query (no, I'm not telling which ;-) can bring down other, unrelated sites because they use a shared component right down at the bottom of the food chain.

I guess the trick is to find out which machines are actually 'generators' and which are 'transmitters', and to try shutting down some generators before they take down the transmitters further down the line.

Update: the actual paper is on arXiv.
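To make the generator/transmitter idea a bit more concrete, here is a toy sketch in Python. It is not the method from the paper, just my own simplification: a few front-end applications ('generators') put load on one shared component (the 'transmitter'), and we switch off the heaviest generators until the shared component is back within its capacity. The application names, the load numbers and the `shed_load` function are all made up for illustration.

```python
def shed_load(loads, capacity):
    """Pick generators to switch off so the shared component survives.

    loads:    dict mapping generator name -> load it puts on the shared component
    capacity: the most load the shared component can handle
    Returns (kept, shed): the generators left running and the ones switched off.
    """
    kept = dict(loads)
    shed = []
    # Walk the generators from heaviest to lightest, dropping one at a time
    # until the remaining load fits within the capacity.
    for name in sorted(loads, key=loads.get, reverse=True):
        if sum(kept.values()) <= capacity:
            break
        del kept[name]
        shed.append(name)
    return kept, shed


# Hypothetical numbers: three sites sharing one back-end component.
loads = {"search": 70, "reports": 50, "homepage": 20}
kept, shed = shed_load(loads, capacity=100)
print("kept:", kept)
print("shed:", shed)
```

Dropping the heaviest generator first is only one possible policy. A real site would probably care more about which applications matter most to the business than about which one is loudest.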
