EyeQ: Network Performance Isolation for the Datacenter
======================================================

What it is
----------

EyeQ is a distributed transport layer for your datacenter, which
provides predictable transmit _and_ receive bandwidth guarantees (over
a few ms) to VMs/services on a server.

Motivation: Why are we doing this?
----------------------------------

Multi-tenant environments are shared. Sharing raises concerns about
performance predictability for CPU, memory, disk, and network
bandwidth. Multi-tenant could mean multiple services like MapReduce /
Search / Storage sharing the same infrastructure (e.g. GitHub, Twitter,
Facebook, etc.), or different customers in the context of a public
cloud (e.g. Amazon AWS, Windows Azure, etc.).

It is important that tenants are isolated from each other so that the
network activity of one tenant does not adversely impact other tenants
sharing the same infrastructure. More importantly, our goal is to
provide each VM of a tenant with a bandwidth assurance tenants can
_understand_.

How it works
------------

EyeQ works end-to-end and uses rate limiters to allocate rates to flows
such that traffic classes meet their guarantees. These rate limiters
self-program (using congestion control) and adjust their rates
dynamically as flows come and go, so that they do not violate other
VMs' rate guarantees.

What's special
--------------

EyeQ requires no per-flow or per-tenant CoS queues in the network. You
don't need to touch configuration in hundreds of network devices as you
provision new VMs or services.

EyeQ is designed to operate at high speeds (10Gb/s and beyond). The
core components of EyeQ, the rate limiters and congestion detectors,
are optimised to incur low CPU overhead and low latency at high line
rates. These rate limiters work with multiqueue network devices and
outperform Linux's rate limiters (htb, tbf, hfsc, etc.).

EyeQ does not place trust in a VM.
It works irrespective of the VM's transport, be it TCP Reno, BIC,
CUBIC, etc. or UDP. You can now safely allow UDP traffic to operate on
your network.

The rate control in EyeQ is responsive to sudden traffic bursts and
converges 50 times faster than TCP (a few ms, instead of 100s of ms).

More links
----------

* Paper with full design and evaluation to appear in NSDI 2013
  http://www.stanford.edu/~jvimal/EyeQ-NSDI13.pdf
* Talk/slides at NSDI 2013
  https://www.usenix.org/conference/nsdi13/eyeq-practical-network-performance-isolation-edge
* Early workshop paper at HotCloud 2012
  https://www.usenix.org/system/files/conference/hotcloud12/hotcloud12-final38.pdf
* Talk and slides at HotCloud 2012
  https://www.usenix.org/conference/hotcloud12/eyeq-practical-network-performance-isolation-multi-tenant-cloud

People
------

Stanford University

* Vimalkumar Jeyakumar (or, just Vimal)
  http://www.stanford.edu/~jvimal
* Mohammad Alizadeh
  http://www.stanford.edu/~alizade
* Prof. David Mazieres
  http://www.scs.stanford.edu/~dm
* Prof. Balaji Prabhakar
  http://www.stanford.edu/~balaji

Collaborations: (Windows Azure)

* Changhoon Kim
* Albert Greenberg

Why the name EyeQ?
------------------

No one has asked me this, but this is just for the record. :-) EyeQ
stands for "An Eye for Quality." Moreover, I just realized that EyeQ
rhymes with IQ, which is usually used to denote Input Queued switches.
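
A toy sketch of self-adjusting rate limiters
--------------------------------------------

The "How it works" section describes rate limiters that adjust
themselves through congestion control so that a VM's guarantee is not
violated. The following is a minimal, hypothetical sketch of that idea
in Python. It is not EyeQ's actual algorithm or code: the function name
`converge_rates`, the `gain` parameter, and the proportional update rule
are all assumptions chosen only to illustrate how a receiver-side
feedback loop could pull the aggregate sending rate towards a receive
guarantee.

```python
from typing import List


def converge_rates(demands_mbps: List[float],
                   guarantee_mbps: float,
                   intervals: int = 50,
                   gain: float = 0.5) -> List[float]:
    """Toy feedback loop (an assumption for illustration, not EyeQ's control law).

    demands_mbps   -- how fast each sender would like to transmit
    guarantee_mbps -- receive bandwidth guarantee of the destination VM
    intervals      -- number of control intervals to simulate
    gain           -- damping factor in (0, 1]; larger reacts faster

    Returns the rate at which each sender ends up transmitting.
    """
    # Per-sender rate limit advertised by the receiver; start optimistic.
    rate = guarantee_mbps
    for _ in range(intervals):
        # Each sender transmits at its demand, capped by the rate limiter.
        arrival = sum(min(d, rate) for d in demands_mbps)
        if arrival > 0:
            # Move the limit towards the value that would make the
            # aggregate arrival rate equal to the guarantee.
            rate *= (1 - gain) + gain * guarantee_mbps / arrival
        # No single sender ever needs more than the whole guarantee.
        rate = min(rate, guarantee_mbps)
    return [min(d, rate) for d in demands_mbps]


if __name__ == "__main__":
    # Two heavy senders and one light sender share a 900 Mb/s guarantee.
    print(converge_rates([1000.0, 1000.0, 100.0], 900.0))
```

In this toy model the light sender keeps its full 100 Mb/s and the two
heavy senders settle at roughly 400 Mb/s each, so the aggregate matches
the 900 Mb/s guarantee. When total demand is below the guarantee, the
limit stays at its cap and no sender is throttled.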