Surviving Large Scale Internet Failures

K. Kant

Intel Corporation

 

Although the Internet has so far resisted large scale failures, such outages can happen. The main objective of this tutorial is to examine the impact of large scale failures on Internet infrastructure, consider appropriate metrics for routing robustness, and discuss techniques for improving that robustness.

 

The tutorial shall start with an overview of critical Internet infrastructure elements such as name resolution and routing, and discuss the consequences of large scale failures in these. The tutorial shall also provide an overview of some techniques for dealing with these failures. From then on, the tutorial shall focus primarily on inter-AS routing in the Internet and the robustness issues of the Border Gateway Protocol (BGP). The topics discussed in some detail include (a) routing structure in terms of provider-customer relationships and policies, (b) performance of inter-AS routing under both isolated and large scale failures, and (c) previous work on improving BGP for isolated failures.

 

We then turn to large scale failures: what a “large scale failure” means, what is important under such a failure, how BGP behaves, and what we can do about it. In particular, we demonstrate several techniques for improving BGP convergence delay and show the improvement via detailed simulation models. We also consider the critical issue of which metrics are appropriate for characterising inter-AS routing performance in the Internet, both for isolated and large scale failures. We show that convergence delay is not the right metric and present results for a few alternative metrics. Finally, we discuss a variety of open issues in improving the robustness of routing in the Internet.

 

The tutorial will give the attendee an overview of the impact of large scale failures on the Internet, a thorough understanding of how inter-domain routing is affected by them, and a sense of what we can do about it.

 

Here is the presentation.

 

NOTE: This tutorial is based on joint work with Prasant Mohapatra, UC/Davis, and Amit Sahoo, Cisco. For further information, please contact krishna.kant@intel.com