Abstractions

At the heart of programmable data planes lies the question of which abstractions and programming interfaces to provide. We first review literature on low-level APIs, including OpenFlow and P4, and then discuss higher-level languages and compilers, including DevoFlow and the Frenetic framework. Particular focus is put on stateful abstractions. We then extend our review to literature on parser design and scheduling, in particular the question of whether universal packet scheduling algorithms exist.

Languages and Compilers

We start our survey with the seminal paper on OpenFlow, which introduced a standardized control-plane–data-plane API for managing flow table entries in data plane devices. We then discuss P4, its alternatives, and its use cases. We also review, among others, high-performance packet processing languages and the case for intermediate representations for programmable data planes. We then turn to higher-level languages and abstractions, covering programming languages such as Pyretic, systems such as Maple, and novel switch designs such as HARMLESS, which seamlessly adds SDN capability to legacy network gear.

Low-level APIs

OpenFlow: Enabling Innovation in Campus Networks Nick McKeown, Tom Anderson, Hari Balakrishnan, Guru Parulkar, Larry Peterson, Jennifer Rexford, Scott Shenker, Jonathan Turner — ACM SIGCOMM CCR 38,2 (2008) The OpenFlow whitepaper. The original idea in OpenFlow was to provide a way for researchers to run experimental protocols in the networks they use every day. OpenFlow is based on an Ethernet switch with an internal flow table and a standardized interface to add and remove flow entries. Besides letting researchers evaluate their ideas in real-world traffic settings, this also allows OpenFlow to serve as a useful campus component in large-scale testbeds.
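
The flow-table idea can be illustrated with a minimal Python sketch. It is not the OpenFlow wire protocol or any controller library: the class names, the action strings, and the priority field are assumptions made for illustration, and only the add/remove interface and the "unmatched packets go to the controller" behavior follow the description above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FlowEntry:
    match: Dict[str, object]  # header field -> required value; missing fields act as wildcards
    action: str               # illustrative action string, e.g. "output:2" or "drop"
    priority: int = 0         # assumption: higher value wins on overlapping matches


class FlowTable:
    """Hypothetical in-memory flow table with an add/remove/lookup interface."""

    def __init__(self) -> None:
        self.entries: List[FlowEntry] = []

    def add_flow(self, entry: FlowEntry) -> None:
        self.entries.append(entry)
        # Keep the highest priority first; the sort is stable for equal priorities.
        self.entries.sort(key=lambda e: e.priority, reverse=True)

    def remove_flow(self, match: Dict[str, object]) -> None:
        self.entries = [e for e in self.entries if e.match != match]

    def lookup(self, packet: Dict[str, object]) -> str:
        for entry in self.entries:
            if all(packet.get(f) == v for f, v in entry.match.items()):
                return entry.action
        # No entry matched: hand the packet to the controller.
        return "send_to_controller"


table = FlowTable()
table.add_flow(FlowEntry({"ip_dst": "10.0.0.2"}, "output:2", priority=10))
print(table.lookup({"ip_dst": "10.0.0.2", "tcp_dst": 80}))
```

A real switch would match in TCAM or hash tables rather than scanning a list; the linear scan only keeps the sketch short.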

packetC: Language for High Performance Packet Processing Ralph Duncan, Peder Jungck — IEEE HPCC '09 (2009) This paper describes packetC, which combines a parallel packet processing model with heterogeneous multiprocessor implementations. The parallel packet processing model uses coarse-grained SPMD parallelism to free users from thread management, and it requires the host system to locate protocol headers in the packet before a parallel copy of the program executes. The packetC language abstracts and encapsulates familiar packet processing data sets and operations into new aggregate data types and operators, e.g., for packets, databases and searchsets.

Protocol-oblivious Forwarding: Unleash the Power of SDN Through a Future-proof Forwarding Plane Haoyu Song — ACM HotSDN '13 (2013) As an alternative to P4, Protocol-Oblivious Forwarding (POF) is presented as a key enabler for highly flexible and programmable SDN. The goal is to remove any dependency on protocol-specific configurations in the forwarding elements and, going beyond P4's stateless design, to enhance the data path with new stateful instructions that support genuine software-defined networking behavior. A generic flow instruction set (FIS) is defined for this purpose, and both hardware-based and open-source software-based prototypes demonstrate the feasibility and advantages of POF.

P4: Programming protocol-independent packet processors Pat Bosshart, Dan Daly, Glen Gibb, Martin Izzard, Nick McKeown, Jennifer Rexford, Cole Schlesinger, Dan Talayco, Amin Vahdat, George Varghese, David Walker — ACM SIGCOMM CCR 44,3 (2014) This seminal paper introduces P4, a high-level language for programming protocol-independent packet processors. P4 has three goals: (1) reconfigurability, in that programmers can change the way switches process packets once they are deployed, (2) protocol independence, in that switches are not tied to any specific network protocols, and (3) target independence, in that programmers can describe packet-processing functionality independently of the specifics of the underlying hardware. The paper demonstrates P4 by showing how to configure a switch to add a new hierarchical label.

DC.P4: Programming the Forwarding Plane of a Data-center Switch Anirudh Sivaraman, Changhoon Kim, Ramkumar Krishnamoorthy, Advait Dixit, Mihai Budiu — ACM SOSR '15 (2015) The paper presents a case study that uses P4 to express the forwarding behavior of a datacenter switch's data plane. In the process, it identifies issues and strengths of P4. Some of the lessons learned have had, and continue to have, an impact on the evolution of the language, most notably the language-architecture separation implemented in newer versions of P4.

The Case for an Intermediate Representation for Programmable Data Planes Muhammad Shahbaz, Nick Feamster — ACM SOSR '15 (2015) The paper introduces NetASM, an intermediate representation for programmable data planes. NetASM is a device-independent language that is expressive enough to act as the target language for compilers for high-level languages, yet low-level enough to be efficiently assembled on various device architectures. It enables conventional compiler optimization techniques to significantly improve the performance and resource utilization of custom packet-processing pipelines on a variety of targets.

Improving SDN with InSPired Switches Roberto Bifulco, Julien Boite, Mathieu Bouet, Fabian Schneider — ACM SOSR '16 (2016) The paper proposes an API for programming the generation of packets in programmable switches, instead of forging network packets on the controller side. The InSP API allows a programmer to define in-switch packet generation operations, which include the specification of triggering conditions, packet contents, and forwarding actions.

PVPP: A Programmable Vector Packet Processor Sean Choi, Xiang Long, Muhammad Shahbaz, Skip Booth, Andy Keep, John Marshall, Changhoon Kim — ACM SOSR '17 (2017) PVPP is a data-plane program compiler from P4, a data plane DSL based on match-action tables, to the fd.io Vector Packet Processor (VPP) software switch, based on the packet processing node graph model. PVPP compiles a data plane program written in P4 to VPP's internal graph representation.

High-level Languages and Compilers

DevoFlow: scaling flow management for high-performance networks Andrew R. Curtis, Jeffrey C. Mogul, Jean Tourrilhes, Praveen Yalagandula, Puneet Sharma, Sujata Banerjee — ACM SIGCOMM '11 (2011) This paper is motivated by the observation that OpenFlow, in its original design, imposes great overheads by involving the switch’s control plane too often. To meet the needs of high-performance networks, the authors propose and evaluate DevoFlow, which provides less fine-grained visibility at significantly lower cost. In a case study, the authors show that DevoFlow can load-balance data center traffic as well as fine-grained solutions, but with far fewer flow table entries and far fewer control messages.

Composing Software Defined Networks Christopher Monsanto, Joshua Reich, Nate Foster, Jennifer Rexford, David Walker — USENIX NSDI '13 (2013) The paper introduces Pyretic, a novel programming language for writing composable SDN applications using a set of high level topology and packet-processing abstractions. Pyretic improves on Frenetic (an earlier incarnation of a similar language) by adding support for sequential composition, the use of topology abstractions to define what each module can see and do with the network, and an abstract packet model that introduces virtual fields into packets. Modular applications are written using the static policy language NetCore, which provides primitive actions, matching predicates, query policies, and policies.

Maple: simplifying SDN programming using algorithmic policies Andreas Voellmy, Junchang Wang, Y Richard Yang, Bryan Ford, Paul Hudak — ACM SIGCOMM '13 (2013) The paper presents Maple, a system that simplifies SDN programming by (1) allowing a programmer to use a standard programming language to design an arbitrary, centralized algorithm, to decide the behavior of an entire network, and (2) providing an abstraction that the programmer-defined, centralized policy runs on every packet entering a network, and hence is oblivious to the challenge of translating a high-level policy into sets of rules on distributed individual switches. To implement algorithmic policies efficiently, Maple includes not only a highly-efficient multicore scheduler, but more importantly a novel tracing runtime optimizer that can automatically record reusable policy decisions, offload work to switches when possible, and keep switch flow tables up-to-date by dynamically tracing the dependency of policy decisions on packet contents as well as the environment.

Languages for software-defined networks Nate Foster, Arjun Guha, Mark Reitblatt, Alec Story, Michael J. Freedman, Naga Praveen Katta, Christopher Monsanto, Joshua Reich, Jennifer Rexford, Cole Schlesinger, David Walker, Robert Harrison — IEEE Communications 51,2 (2013) An easily approachable survey on higher-level abstractions for creating and composing packet processing applications using the Frenetic framework.

NetKAT: Semantic Foundations for Networks Carolyn Jane Anderson, Nate Foster, Arjun Guha, Jean-Baptiste Jeannin, Dexter Kozen, Cole Schlesinger, David Walker — ACM POPL '14 (2014) The paper contributes just what is stated in the title, a new network programming language called NetKAT that is based on a solid mathematical foundation. Formerly, the design of network dataplane programming languages was largely ad hoc, driven more by the needs of applications and the capabilities of network hardware than by foundational principles. NetKAT solves this problem by (1) proposing primitives for filtering, modifying, and transmitting packets, operators for combining programs in parallel and in sequence, and a Kleene star operator for iteration, and (2) presenting a series of proofs that the language is sound and complete.
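
To make the listed primitives concrete, the following Python sketch models a policy as a function from a packet to a set of packets. It is a simplified reading of the idea and not the NetKAT implementation: packet histories and the dup primitive are omitted, negation is left out, and all function names are chosen here for illustration.

```python
from typing import Callable, FrozenSet, Tuple

# A packet is an immutable, sorted tuple of (field, value) pairs.
Packet = Tuple[Tuple[str, object], ...]
Policy = Callable[[Packet], FrozenSet[Packet]]


def pkt(**fields: object) -> Packet:
    return tuple(sorted(fields.items()))


def filter_eq(field: str, value: object) -> Policy:
    """Filtering: keep the packet only if field == value."""
    return lambda p: frozenset([p]) if dict(p).get(field) == value else frozenset()


def mod(field: str, value: object) -> Policy:
    """Modification: set field to value."""
    def run(p: Packet) -> FrozenSet[Packet]:
        d = dict(p)
        d[field] = value
        return frozenset([tuple(sorted(d.items()))])
    return run


def union(a: Policy, b: Policy) -> Policy:
    """Parallel composition: run both policies on a copy of the packet."""
    return lambda p: a(p) | b(p)


def seq(a: Policy, b: Policy) -> Policy:
    """Sequential composition: feed every output of a into b."""
    return lambda p: frozenset(q for m in a(p) for q in b(m))


def star(a: Policy) -> Policy:
    """Kleene star: zero or more iterations of a (terminates on finite packet spaces)."""
    def run(p: Packet) -> FrozenSet[Packet]:
        seen = {p}
        frontier = {p}
        while frontier:
            frontier = {q for m in frontier for q in a(m)} - seen
            seen |= frontier
        return frozenset(seen)
    return run


# Two links of a toy topology, 1 -> 2 and 2 -> 3, encoded as filter-then-modify.
hop = union(seq(filter_eq("sw", 1), mod("sw", 2)),
            seq(filter_eq("sw", 2), mod("sw", 3)))
print(sorted(star(hop)(pkt(sw=1))))
```

Iterating the one-hop policy with the star operator yields the set of switches reachable from switch 1, which is the kind of reachability question the equational theory is meant to answer.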

A Purely Functional Approach to Packet Processing Nicola Bonelli, Stefano Giordano, Gregorio Procissi, Luca Abeni — ACM/IEEE ANCS '14 (2014) The paper introduces PFQ-Lang, an extensible functional language to process, analyze and forward packets. It eases development through functional composition and makes it possible to exploit multi-queue NICs and multi-core architectures.

Reclaiming the Brain: Useful OpenFlow Functions in the Data Plane Liron Schiff, Michael Borokhovich, Stefan Schmid — ACM HotNets '14 (2014) Schiff et al. show that standard OpenFlow can be exploited to implement powerful functionality in the data plane, e.g., to reduce the number of interactions with the control plane or to render the network more robust. Example applications of such a SmartSouth include topology snapshot, anycast, blackhole detection and critical node detection.

Compiling Packet Programs to Reconfigurable Switches Lavanya Jose, Lisa Yan, George Varghese, Nick McKeown — USENIX NSDI '15 (2015) Seminal paper exploring the design of a compiler for programmable switching chips, in particular how to map logical lookup tables to physical tables while meeting data and control dependencies in the program. An Integer Linear Programming (ILP) approach and a greedy approach are presented to generate solutions optimized for latency, pipeline occupancy, or power consumption. The authors compile benchmarks from real production networks to two different programmable switch architectures: RMT and Intel’s FlexPipe.

VFP: A Virtual Switch Platform for Host SDN in the Public Cloud Daniel Firestone — USENIX NSDI '17 (2017) The paper presents the Virtual Filtering Platform (VFP), a programmable virtual switch that powers Microsoft Azure, a large public cloud. VFP includes support for multiple independent network controllers, policy based on connections rather than only on packets, efficient caching and classification algorithms for performance, and efficient offload of flow policy to programmable NICs. The paper presents the design of VFP and its API, its flow language and compiler used for flow processing, performance results, and experiences deploying and using VFP in Azure over several years.

P4FPGA: A Rapid Prototyping Framework for P4 Han Wang, Robert Soulé, Huynh Tu Dang, Ki Suh Lee, Vishal Shrivastav, Nate Foster, Hakim Weatherspoon — ACM SOSR '17 (2017) P4FPGA is a tool for developing and evaluating data plane applications. It is both an open-source compiler and runtime; the compiler in turn extends the P4.org reference compiler with a custom backend that generates FPGA code. By combining high-level programming abstractions offered by P4 with a flexible and powerful hardware target, P4FPGA may allow developers to rapidly prototype and deploy new data plane applications.

HARMLESS: Cost-Effective Transitioning to SDN for Small Enterprises Levente Csikor, László Toka, Márk Szalay, Gergely Pongrácz, Dimitrios P. Pezaros, Gábor Rétvári — IFIP Networking '18 (2018) The paper proposes HARMLESS, a new SDN switch design that seamlessly adds SDN capability to legacy network gear, by emulating the OpenFlow switch OS in a separate software switch component. This way, HARMLESS enables a quick and easy leap into SDN, combining the rapid innovation and upgrade cycles of software switches with the port density and cost-efficiency of hardware-based appliances into a fully dataplane-transparent and vendor-neutral solution. HARMLESS incurs an order of magnitude smaller initial expenditure for an SDN deployment than existing turnkey vendor SDN solutions while it yields matching, or even better data plane performance for smaller enterprises.

Dataplane equivalence and its applications Dragos Dumitrescu, Radu Stoenescu, Matei Popovici, Lorina Negreanu, Costin Raiciu — USENIX NSDI '19 (2019) The paper presents netdiff, an algorithm to check the equivalence of two network dataplanes. Such an algorithm can be useful to verify and compare the output of different dataplane compilers, to find new bugs in existing network dataplanes, or to check the equivalence of FIB updates in a production network. The evaluation shows that equivalence checking is an easy way to find bugs, scales well to relatively large programs, and uncovers subtle issues that are otherwise difficult to find.

Abstractions for Embedded State

While OpenFlow match/action table abstractions are stateless, there are many efforts toward devising a stateful data plane programming abstraction, e.g., based on finite state machines, for supporting more dynamic applications. We discuss such approaches as well as first workload characterizations of stateful networking applications. We also review literature on the challenge of consistent state migration and elastic scaling, and discuss security implications.

Workload Characterization of Stateful Networking Applications Javier Verdú, Mario Nemirovsky, Jorge García, Mateo Valero — IEEE ISHPC '08 (2008) This paper presents the first workload characterization of stateful networking applications. The analysis emphasizes the study of data cache behavior, but discusses branch prediction, instruction distribution, etc. Another important contribution is the study of the state categories of different networking applications.

OpenState: Programming Platform-independent Stateful Openflow Applications Inside the Switch Giuseppe Bianchi, Marco Bonola, Antonio Capone, Carmelo Cascone — ACM SIGCOMM CCR 44,2 (2014) The paper tackles the challenge of devising a stateful data plane programming abstraction (versus the stateless OpenFlow match/action table abstraction) that still delivers high performance and remains consistent with vendors' preference for closed platforms. The authors posit that a promising answer revolves around extended finite state machines as an extension (super-set) of the OpenFlow match/action abstraction. They turn the proposed abstraction into an actual table-based API and show how it can be supported by (mostly) reusing core primitives already implemented in OpenFlow devices.

Flow-level State Transition As a New Switch Primitive for SDN Masoud Moshref, Apoorv Bhargava, Adhip Gupta, Minlan Yu, Ramesh Govindan — ACM HotSDN '14 (2014) The paper proposes FAST (Flow-level State Transitions) as a new switch primitive for software-defined networks. With FAST, the controller simply preinstalls a state machine and switches can automatically record flow state transitions by matching incoming packets to installed filters. FAST can support a variety of dynamic applications, and can be readily implemented with commodity switch components and software switches.
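
The notion of a preinstalled per-flow state machine can be sketched in a few lines of Python. This is an illustration of the general idea only, not the FAST design or API: the class, the event names, and the toy firewall policy (admit inbound packets only after the flow has sent outbound traffic) are all assumptions.

```python
from typing import Dict, Hashable, Tuple


class FlowStateMachine:
    """Hypothetical switch-side table: per-flow state plus preinstalled transitions."""

    def __init__(self,
                 transitions: Dict[Tuple[str, str], str],
                 initial: str,
                 actions: Dict[str, str]) -> None:
        self.transitions = transitions   # (state, event) -> next state
        self.initial = initial           # state of a flow seen for the first time
        self.actions = actions           # state -> forwarding action
        self.flow_state: Dict[Hashable, str] = {}

    def process(self, flow_id: Hashable, event: str) -> str:
        state = self.flow_state.get(flow_id, self.initial)
        # Events without a matching filter leave the state unchanged.
        next_state = self.transitions.get((state, event), state)
        self.flow_state[flow_id] = next_state
        return self.actions[next_state]


# The "controller" installs the machine once; later packets are handled locally.
firewall = FlowStateMachine(
    transitions={("closed", "outbound"): "open"},
    initial="closed",
    actions={"closed": "drop", "open": "forward"},
)
print(firewall.process("10.0.0.1->8.8.8.8", "inbound"))   # flow not yet opened
print(firewall.process("10.0.0.1->8.8.8.8", "outbound"))  # opens the flow
print(firewall.process("10.0.0.1->8.8.8.8", "inbound"))   # now admitted
```

The point of the sketch is that no controller round trip is needed per state change: the transition table is fixed up front, and only the per-flow state mutates as packets arrive.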

SNAP: Stateful Network-Wide Abstractions for Packet Processing Mina Tahmasbi Arashloo, Yaron Koral, Michael Greenberg, Jennifer Rexford, David Walker — ACM SIGCOMM '16 (2016) SNAP offers a simpler "centralized" stateful programming model on top of the simple match-action paradigm offered by OpenFlow. SNAP programs are developed on a one-big-switch abstraction and may contain reads and writes to global, persistent arrays, allowing programmers to implement a broad range of stateful applications. The SNAP compiler then distributes, places, and optimizes access to these stateful arrays, discovering read/write dependencies and translating one-big-switch programs into an efficient internal representation based on a novel variant of binary decision diagrams.

Kinetic: Verifiable Dynamic Network Control Hyojoon Kim, Joshua Reich, Arpit Gupta, Muhammad Shahbaz, Nick Feamster, Russ Clark — USENIX NSDI '15 (2015) Kinetic provides a formal way to program the network control plane using finite state machines. The use of a formal language allows the system to verify the correctness of the control program according to user-specified temporal properties. The paper also reports on a user survey among students of Coursera's SDN course, who found the finite state machine abstraction of Kinetic to be intuitive and easier to verify than other high-level languages, such as Pyretic.

Packet Transactions: High-Level Programming for Line-Rate Switches Anirudh Sivaraman, Alvin Cheung, Mihai Budiu, Changhoon Kim, Mohammad Alizadeh, Hari Balakrishnan, George Varghese, Nick McKeown, Steve Licking — ACM SIGCOMM '16 (2016) This paper shows how to program data-plane algorithms in a high-level language and compile those programs into low-level microcode that can run on programmable line-rate switching chips. The key challenge is that many data-plane algorithms create and modify algorithmic state. To achieve line-rate programmability for stateful algorithms, the paper introduces the notion of a packet transaction: a sequential packet-processing code block that is atomic and isolated from other such code blocks. The idea is developed in Domino, a C-like imperative language to express data-plane algorithms, and many examples are shown that can be run at line rate with modest estimated chip-area overhead.
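
As a rough illustration of what a packet transaction looks like, the sketch below writes a flowlet-switching style algorithm as one sequential block over persistent state. It is written in Python rather than Domino, so it only mirrors the programming model: the constants, the field names, and the use of CRC32 as a hash are assumptions, and the atomicity that Domino's compiler must guarantee at line rate is trivially given here by single-threaded execution.

```python
import zlib

NUM_FLOWLETS = 1024   # size of the state arrays (assumed)
NUM_HOPS = 4          # number of candidate next hops (assumed)
GAP = 5               # idle time, in arbitrary ticks, that starts a new flowlet (assumed)

# Persistent switch state that survives across packets.
last_time = [0] * NUM_FLOWLETS
saved_hop = [0] * NUM_FLOWLETS


def flowlet_transaction(pkt: dict) -> dict:
    """One packet transaction: runs to completion before the next packet is processed."""
    key = f"{pkt['sport']}:{pkt['dport']}".encode()
    idx = zlib.crc32(key) % NUM_FLOWLETS
    # Candidate hop for a new flowlet; mixing in the arrival time varies the choice.
    new_hop = zlib.crc32(key + str(pkt["arrival"]).encode()) % NUM_HOPS
    if pkt["arrival"] - last_time[idx] > GAP:
        saved_hop[idx] = new_hop          # long enough gap: safe to re-route
    last_time[idx] = pkt["arrival"]
    pkt["next_hop"] = saved_hop[idx]
    return pkt
```

Each packet reads and writes `last_time` and `saved_hop` in a single step; the difficulty addressed by the paper is compiling such read-modify-write sequences onto hardware that must accept a new packet every clock cycle.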

Open Packet Processor: a programmable architecture for wire speed platform-independent stateful in-network processing Giuseppe Bianchi, Marco Bonola, Salvatore Pontarelli, Davide Sanvito, Antonio Capone, Carmelo Cascone — unpublished manuscript (2016) This paper aims at contributing to the debate on how to bring programmability of stateful packet processing tasks inside the network switches, while retaining platform independence. The proposed approach, named "Open Packet Processor" (OPP), shows the viability of eXtended Finite State Machines (XFSM) as low-level data plane programming abstraction. Platform independence is accomplished by decoupling the implementation of hardware primitives from their usage by an application formally described via an abstract XFSM.

Paving the Way for NFV: Simplifying Middlebox Modifications Using StateAlyzr Junaid Khalid, Aaron Gember-Jacobson, Roney Michael, Anubhavnidhi Abhashkumar, Aditya Akella — USENIX NSDI '16 (2016) Migrating or cloning internal state in elastically scalable Network Functions Virtualization (NFV) requires modifications to middlebox code to identify the needed state. The paper presents a framework-independent system, StateAlyzr, that embodies novel algorithms adapted from program analysis to provably and automatically identify all state that must be migrated or cloned to ensure consistent middlebox output in the face of redistribution. StateAlyzr reduces the man-hours required for code modification by nearly 20x.

Swing State: Consistent Updates for Stateful and Programmable Data Planes Shouxi Luo, Hongfang Yu, Laurent Vanbever — ACM SOSR '17 (2017) The paper presents Swing State, a general state-management framework and runtime system supporting consistent state migration in stateful data planes. The key insight is to perform state migration entirely within the data plane by piggybacking state updates on live traffic. To minimize the overhead, Swing State only migrates the states that cannot be safely reconstructed at the destination switch. A prototype of Swing State for P4 is also described.

A Survey on the Security of Stateful SDN Data Planes Tooska Dargahi, Alberto Caponi, Moreno Ambrosin, Giuseppe Bianchi, Mauro Conti — IEEE Comm. Surveys & Tutorials 19,3 (2017) The paper provides the reader with a background on stateful SDN data plane proposals, focusing on the security implications that data plane programmability brings about, identifies potential attack scenarios, and highlights possible vulnerabilities specific to stateful in-switch processing, including denial of service and saturation attacks.

Stateless Network Functions: Breaking the Tight Coupling of State and Processing Murad Kablan, Azzam Alsudais, Eric Keller, Franck Le — USENIX NSDI '17 (2017) The paper presents Stateless Network Functions, a new architecture for network functions virtualization, where the existing design of network functions is decomposed into a stateless processing component along with a data-store layer. The StatelessNF processing instances are architected around efficient pipelines utilizing DPDK for high performance network I/O, packaged as Docker containers for easy deployment, and a data store interface optimized based on the expected request patterns to efficiently access a RAMCloud-based data store. A network-wide orchestrator monitors the instances for load and failure, manages instances to scale and provide resilience, and leverages an OpenFlow-based network to direct traffic to instances.

Elastic Scaling of Stateful Network Functions Shinae Woo, Justine Sherry, Sangjin Han, Sue Moon, Sylvia Ratnasamy, Scott Shenker — USENIX NSDI '18 (2018) Elastic scaling is a central promise of NFV but has been hard to realize in practice, because most Network Functions (NFs) are stateful and this state needs to be shared across NF instances. The paper presents S6, building on the insight that a distributed shared state abstraction is well-suited to the NFV context. State is organized as a distributed shared object (DSO) space, extended with techniques designed to meet the need for elasticity and high performance in NFV workloads.

FlowBlaze: Stateful Packet Processing in Hardware Salvatore Pontarelli, Roberto Bifulco, Marco Bonola, Carmelo Cascone, Marco Spaziani, Valerio Bruschi, Davide Sanvito, Giuseppe Siracusano, Antonio Capone, Michio Honda, Felipe Huici, Giuseppe Bianchi — USENIX NSDI '19 (2019) FlowBlaze is an open abstraction for building stateful packet processing functions in hardware. The abstraction is based on Extended Finite State Machines and introduces the explicit definition of flow state, allowing FlowBlaze to leverage flow-level parallelism. The paper presents an implementation of FlowBlaze on a NetFPGA SmartNIC that achieves latency on the order of a few microseconds, consumes relatively little power, can hold per-flow state for hundreds of thousands of flows, and yields speeds of 40 Gbps.

Programmable Parsing and Scheduling

Design principles for packet parsers Glen Gibb, George Varghese, Mark Horowitz, Nick McKeown — ACM/IEEE ANCS '13 (2013) The paper presents an interesting view on parser design and the trade-offs between different designs, asking whether it is better to design one fast parser or several slow parsers, what are the costs of making the parser reconfigurable in the field, and what design decisions most impact power and area. The paper describes trade-offs in parser design, identifies design principles for switch and router architects, and describes a parser generator that outputs synthesizable Verilog that is available for download.

No Silver Bullet: Extending SDN to the Data Plane Anirudh Sivaraman, Keith Winstein, Suvinay Subramanian, Hari Balakrishnan — ACM HotNets '13 (2013) The authors argue that, instead of going with a universal scheduler that would handle all queuing strategies that may arise in a programmable switch, Software-Defined Networking must be extended to control the fast-path scheduling and queuing behavior of a switch. To this end, they propose adding a small FPGA to switches, and synthesize, place, and route hardware implementations for CoDel and RED.

Universal Packet Scheduling Radhika Mittal, Rachit Agarwal, Sylvia Ratnasamy, Scott Shenker — USENIX NSDI '16 (2016) The paper addresses a seemingly simple question: Is there a universal packet scheduling algorithm? It turns out that in general the answer is "no"; however, the authors manage to show that the classical Least Slack Time First (LSTF) scheduling algorithm comes closest to being universal and can closely replay a wide range of scheduling algorithms. LSTF is also evaluated as to whether it can meet various network-wide objectives in practice; the authors find that LSTF performs comparably to the state of the art for each performance metric.
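
A single-hop LSTF queue can be sketched as follows, assuming the usual formulation in which each packet carries a slack value that shrinks by the time it spends waiting. The class and method names are illustrative, and how the initial slack is chosen at the network edge (the part that lets LSTF replay other schedules) is outside this sketch.

```python
import heapq
import itertools


class LSTFQueue:
    """Illustrative Least Slack Time First queue for one output port."""

    def __init__(self) -> None:
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: FIFO among equal keys

    def enqueue(self, packet_id: str, slack: float, now: float) -> None:
        # Remaining slack at time t is slack - (t - now), so ordering by
        # slack + now always puts the packet with the least remaining slack first.
        heapq.heappush(self._heap, (slack + now, next(self._seq), packet_id, slack, now))

    def dequeue(self, now: float):
        _, _, packet_id, slack, arrival = heapq.heappop(self._heap)
        # The waiting time at this hop is charged against the packet's slack,
        # and the updated value travels with the packet to the next hop.
        return packet_id, slack - (now - arrival)


q = LSTFQueue()
q.enqueue("a", slack=10.0, now=0.0)
q.enqueue("b", slack=3.0, now=1.0)
print(q.dequeue(now=2.0))
```

Keying the heap on slack plus arrival time avoids having to decrement every queued packet's slack as time advances.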

Programmable Packet Scheduling at Line Rate Anirudh Sivaraman, Suvinay Subramanian, Mohammad Alizadeh, Sharad Chole, Shang-Tse Chuang, Anurag Agrawal, Hari Balakrishnan, Tom Edsall, Sachin Katti, Nick McKeown — ACM SIGCOMM '16 (2016) Similarly to the "Universal Packet Scheduling" paper, this paper presents another design for a programmable packet scheduler, which allows scheduling algorithms, potentially algorithms that are unknown today, to be programmed into a switch without requiring hardware redesign. The design uses the property that scheduling algorithms make two decisions, in what order to schedule packets and when to schedule them, and exploits that in many scheduling algorithms definitive decisions on these two questions can be made when packets are enqueued. The resultant design uses a single abstraction: the push-in first-out queue (PIFO), a priority queue that maintains the scheduling order or time.
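
The PIFO abstraction itself is small enough to sketch in Python: elements are pushed into an arbitrary position determined by a rank computed at enqueue time, and are always dequeued from the head. The class below is a software illustration with assumed names, not the hardware design from the paper.

```python
import bisect
import itertools


class PIFO:
    """Illustrative push-in first-out queue: sorted insert, dequeue from the head."""

    def __init__(self) -> None:
        self._items = []               # kept sorted by (rank, arrival order)
        self._seq = itertools.count()  # preserves FIFO order among equal ranks

    def push(self, rank: int, packet: str) -> None:
        bisect.insort(self._items, (rank, next(self._seq), packet))

    def pop(self) -> str:
        return self._items.pop(0)[2]

    def __len__(self) -> int:
        return len(self._items)


# The scheduling algorithm lives entirely in how the rank is computed at enqueue.
# Here the rank is a static priority (lower value is served first).
q = PIFO()
for rank, name in [(5, "x"), (1, "y"), (5, "z"), (3, "w")]:
    q.push(rank, name)
print([q.pop() for _ in range(len(q))])
```

Swapping the rank function, for example using a virtual finish time or a deadline instead of a static priority, changes the scheduling discipline without touching the queue.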

Approximating Fair Queueing on Reconfigurable Switches Naveen Kr. Sharma, Ming Liu, Kishore Atreya, Arvind Krishnamurthy — USENIX NSDI '18 (2018) The paper discusses how to leverage configurable per-packet processing and the ability to maintain mutable state inside switches to achieve fair bandwidth allocation across all traversing flows. The problem is that implementing fair queuing mechanisms in high-speed switches is expensive, since they require complex flow classification, buffer allocation, and scheduling on a per-packet basis. The proposed dequeuing scheduler, called the Rotating Strict Priority scheduler, simulates an ideal round-robin scheme where each active flow transmits a single bit of data in every round, which allows packets from multiple queues to be transmitted in approximately sorted order.

Fast, Scalable, and Programmable Packet Scheduler in Hardware Vishal Shrivastav — ACM SIGCOMM '19 (2019) The trend toward increasing link speeds, combined with the slowdown in the scaling of CPU speeds, leads to a situation where packet scheduling in software results in lower precision and higher CPU utilization. Offloading packet scheduling to hardware (e.g., the NIC) can overcome this drawback, but one would still like to retain the flexibility of software packet schedulers: packet scheduling in hardware should hence be programmable. Shrivastav proposes a generalization of the Push-In-First-Out (PIFO) primitive used by state-of-the-art hardware packet schedulers: Push-In-Extract-Out (PIEO) maintains an ordered list of elements, but allows dequeue from arbitrary positions in the list by supporting programmable predicate-based filtering at dequeue. PIEO supports most scheduling algorithms (work-conserving and non-work-conserving) and can be implemented scalably in hardware.
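
The difference from PIFO can be sketched in Python as a sorted list whose dequeue returns the smallest-ranked element that passes a predicate. The predicate used here, "the current time has reached the element's eligibility time", is an assumption chosen to illustrate non-work-conserving behavior, and the names are hypothetical; the hardware design in the paper is not modeled.

```python
import bisect
import itertools
from typing import Optional


class PIEO:
    """Illustrative push-in extract-out list with a time-based eligibility predicate."""

    def __init__(self) -> None:
        self._items = []               # sorted by (rank, arrival order)
        self._seq = itertools.count()

    def push(self, rank: int, eligible_at: float, packet: str) -> None:
        bisect.insort(self._items, (rank, next(self._seq), eligible_at, packet))

    def pop(self, now: float) -> Optional[str]:
        # Extract the smallest-ranked element whose predicate holds,
        # which may sit anywhere in the list, not only at the head.
        for i, (_, _, eligible_at, packet) in enumerate(self._items):
            if eligible_at <= now:
                del self._items[i]
                return packet
        return None  # nothing eligible: the link stays idle (non-work-conserving)


q = PIEO()
q.push(1, eligible_at=10.0, packet="a")   # best rank, but shaped until t=10
q.push(2, eligible_at=0.0, packet="b")
print(q.pop(now=5.0))
```

With every element always eligible, the structure degenerates to a PIFO, which is the sense in which PIEO generalizes it.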

SP-PIFO: Approximating Push-In First-Out Behaviors using Strict-Priority Queues Albert Gran Alcoz, Alexander Dietmueller, Laurent Vanbever — USENIX NSDI '20 (2020) Push-In First-Out (PIFO) queues are hardware primitives which enable programmable packet scheduling by allowing packets to be perfectly reordered at line rate. While promising, implementing PIFO queues in hardware and at scale is not easy: only hardware designs (not implementations) exist, and they can only support about 1000 flows. This paper introduces SP-PIFO, a programmable packet scheduler which closely approximates the behavior of PIFO queues using strict-priority queues, at line rate, at scale, and on existing devices.