Surviving Large Scale Internet Failures
K. Kant
Intel Corporation
Although the Internet has so far resisted large-scale failures, such outages
can happen. The main objective of this tutorial is to examine the impact of
large-scale failures on Internet infrastructure, consider appropriate metrics
for routing robustness, and discuss techniques for improving it.
The tutorial will start with an overview of critical Internet infrastructure
elements such as name resolution and routing, and discuss the consequences of
large-scale failures in these elements. It will also survey some techniques for
dealing with such failures. From then on, the tutorial will focus primarily on
inter-AS routing in the Internet and the robustness issues of the Border Gateway
Protocol (BGP). The topics discussed in some detail include (a) routing
structure in terms of provider-customer relationships and policies, (b)
performance of inter-AS routing under both isolated and large-scale failures,
and (c) previous work on improving BGP for isolated failures.
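Provider-customer relationships and policies of the kind listed in (a) are
commonly modeled with the Gao-Rexford export rules: routes learned from a
customer are exported to every neighbor, while routes learned from a peer or a
provider are exported only to customers. A consequence is that valid AS paths
are "valley-free": zero or more customer-to-provider hops, at most one peer
hop, then zero or more provider-to-customer hops. The Python sketch below
illustrates that standard model only; it is an assumption that the tutorial
uses this exact model, and the names `Rel`, `may_export`, and `is_valley_free`
are hypothetical.

```python
from enum import Enum
from typing import Sequence


class Rel(Enum):
    """Business relationship of a neighbor AS, as seen from the local AS."""
    CUSTOMER = "customer"  # neighbor pays us for transit
    PEER = "peer"          # settlement-free peering
    PROVIDER = "provider"  # we pay the neighbor for transit


def may_export(learned_from: Rel, send_to: Rel) -> bool:
    """Gao-Rexford export rule (illustrative sketch).

    Customer-learned routes go to everyone; peer- or provider-learned
    routes go only to customers, so we never give free transit.
    """
    if learned_from is Rel.CUSTOMER:
        return True
    return send_to is Rel.CUSTOMER


def is_valley_free(hops: Sequence[Rel]) -> bool:
    """Check an AS path hop by hop, from source toward destination.

    hops[i] is the relationship of AS i+1 as seen from AS i:
    PROVIDER = uphill, PEER = flat, CUSTOMER = downhill.
    A valid path matches PROVIDER* PEER? CUSTOMER*.
    """
    descending = False
    for rel in hops:
        if rel is Rel.CUSTOMER:
            descending = True
        elif descending:
            # An uphill or peer hop after the path began descending is a valley.
            return False
        elif rel is Rel.PEER:
            # At most one peer hop; only downhill hops may follow it.
            descending = True
    return True
```

For example, `is_valley_free([Rel.PROVIDER, Rel.PEER, Rel.CUSTOMER])` is
`True`, while `is_valley_free([Rel.CUSTOMER, Rel.PROVIDER])` is `False`
because traffic would climb back up after descending. Policies like these
restrict which alternate paths BGP can use after a failure, which is one
reason routing structure matters for robustness.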
We then turn to large-scale failures: what a “large-scale failure” means, what
matters under such a failure, how BGP behaves, and what can be done about it.
In particular, we demonstrate several techniques for reducing BGP convergence
delay and show the improvement using detailed simulation models. We also
consider the critical issue of which metrics are appropriate for characterising
inter-AS routing performance in the Internet under both isolated and
large-scale failures. We show that convergence delay is not the right metric
and present results for a few alternative metrics. Finally, we discuss a
variety of open issues in improving the robustness of routing in the Internet.
The tutorial will give attendees an overview of the impact of large-scale failures on the Internet and a thorough understanding of how inter-domain routing is affected by them and what can be done about it.
NOTE: This tutorial is based on joint work with Prasant Mohapatra, UC Davis, and Amit Sahoo, Cisco. For further information, please contact krishna.kant@intel.com