Everybody Oversubscribes

Oversubscription in a network is a state where there is not enough bandwidth at one or more devices (or ports) in the network to carry traffic from all of the other ports (or devices) if they all use their full bandwidth at once. Sometimes it is defined more narrowly, considering only uplinks or expected traffic flows.

Oversubscription is poorly understood outside of network engineering circles, and it tends to be fairly controversial when ISP customers learn that their ISP’s network is oversubscribed. People love to hate their ISP, and oversubscription feels like one more scam ISPs pull to deliver a sub-par experience.

This belief is completely misplaced. Don’t get me wrong: your ISP sucks. I don’t know who they are, but I feel pretty confident in making that statement. They are almost certainly ripping you off somehow. They probably don’t manage their network very well, especially given all the money they are raking in. But oversubscription isn’t “bad” per se¹.

Even simple networks (simple as in “a single Ethernet switch with a couple of hosts connected”) are oversubscribed:

                   .--------.
                   | Host B |
                   .---+----.
                       | 1gbps
.--------.         .---+----.          .--------.
| Host A +---------+ Switch +----------+ Host C |
.--------.  1gbps  .---+----.  1gbps   .--------.
                       | 1gbps
                   .---+----.
                   | Host D |
                   .--------.

Description: A simple Ethernet network with four hosts connected to a switch

Assuming that the switch is a non-blocking crossbar, every port has a non-blocking line-rate (here, 1gbps) path to every other port. Even so, this simple scenario could suffer from up to 3:1 oversubscription: if hosts A, B, and C all try to send 1gbps to host D, they offer 3gbps to a 1gbps port, and the switch buffers for port D will fill up and overflow. There are only two ways to resolve this problem. The first is to build a full mesh, connecting every node in the network directly to every other node with no intermediate devices of any kind. The second is to do some traffic engineering: analyse our expected traffic flows, and upgrade the connections to the hosts that send (and/or receive) more traffic than the others.

In the real world, there will probably be a small number of hosts in this hypothetical network that expect to send and/or receive much more traffic than the others: a server, for example. If we add the constraint that we aren’t concerned with oversubscription amongst host-to-host links, only from hosts to our uplink or server(s), then in this simple example we can, practically speaking, eliminate our oversubscription: we just increase the bandwidth to our server/uplink, say by moving it to a 10gbps port.

That’s traffic engineering, and it is how oversubscription gets managed in real-world networks. In ISP terms, the “server” is probably the network uplink, which needs to support more bandwidth than the other ports. With that one constraint², we can at least attempt to design an “oversubscription-free” ISP network.
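Going back to the four-host switch, one rough way to quantify oversubscription toward a port is to divide the traffic that could be offered to it by the port’s capacity. The minimal Python sketch below assumes the worst case (every sender transmits at line rate at the same moment) and ignores buffering and protocol overhead; the function name `oversubscription_ratio` is hypothetical, not taken from any tool.

```python
def oversubscription_ratio(sender_port_gbps, receiver_port_gbps):
    """Worst-case offered load toward one port divided by that port's capacity.

    Assumption: every sender transmits at its full line rate simultaneously.
    A result above 1.0 means the receiving port is oversubscribed.
    """
    offered_gbps = sum(sender_port_gbps)
    return offered_gbps / receiver_port_gbps


# Hosts A, B, and C (1gbps each) all sending to host D on a 1gbps port.
print(oversubscription_ratio([1, 1, 1], 1))   # 3.0, i.e. 3:1

# The same three hosts sending to a server moved to a 10gbps port.
print(oversubscription_ratio([1, 1, 1], 10))  # 0.3, no oversubscription toward the server
```

Note that this only measures one direction: the 10gbps server could still overrun any single 1gbps host, which is exactly the host-to-host case the constraint above chooses to ignore.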

Topology-wise, consumer ISP networks all have basically the same problem to solve: connect your subscribers via one or more “last mile” technologies (fiber, cable modem, DSL, etc.) to “the Internet.” The exact details will of course vary, depending on technology decisions that ISPs make, but the basics are all the same:

1. Terminate and aggregate subscriber connections on some last-mile-specific device, like a CMTS (cable modem), DSLAM (DSL), etc.
2. Aggregate uplinks from those devices to your “core”
3. Connect your core to other ISPs

We can construct an “oversubscription-free” node design for point #1. In fact, given our constraint that we don’t care about subscriber-to-subscriber traffic, we already have. In ISP terms:

  .---------.
  | Uplink  |
  .----+----.
       | 100gbps
.------+-----.
|   CMTS     |
.-+-+-+----+-.
  | | |    | 1gbps per subscriber
  A B C .. N

Description: A cable modem termination system (CMTS) supporting 100 1gbps subscribers, uplinked at 100gbps.

This node supports up to 100 subscribers without oversubscription, as we have defined it: 100 * 1gbps of subscriber traffic fits exactly into the 100gbps uplink.

But we need more than 100 subscribers to make our ISP work. Let’s build for 10,000 subscribers³. At 100 subscribers per CMTS, that is 100 CMTS devices. Collapsing this as much as possible, you’d need a core router that could support 100 100gbps ports to connect to our CMTS devices. That’s technically achievable today, if economically unrealistic.
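A minimal Python sketch of that port count, assuming every CMTS looks like the one above (100 subscribers at 1gbps, 100gbps uplinks) and that each CMTS uplink lands on its own core-router port. The function and parameter names are illustrative only.

```python
import math


def core_ports_needed(subscribers, subscriber_gbps=1,
                      subscribers_per_cmts=100, uplink_gbps=100):
    """Core-router ports needed so that no CMTS uplink is oversubscribed.

    Assumption: one core port per CMTS uplink, and every CMTS is built
    like the example node (its uplinks carry all subscribers at line rate).
    """
    # Bandwidth one fully loaded CMTS must push toward the core.
    per_cmts_gbps = subscribers_per_cmts * subscriber_gbps
    # Uplinks per CMTS required to carry that at line rate.
    uplinks_per_cmts = math.ceil(per_cmts_gbps / uplink_gbps)
    # Number of CMTS devices needed to terminate all subscribers.
    cmts_count = math.ceil(subscribers / subscribers_per_cmts)
    return cmts_count * uplinks_per_cmts


print(core_ports_needed(10_000))  # 100 ports of 100gbps
print(core_ports_needed(20_000))  # 200 ports of 100gbps
```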

But our ISP’s upstream situation is much worse. Let’s say our small ISP has upstream connections to two other providers to reach the rest of the Internet. To be “oversubscription-free”, our provider would need to connect to each of its upstreams at the combined subscriber line rate from our example (10,000 * 1gbps = 10tbps). If we double our subscriber count (still tiny at 20,000 subscribers), we need 20tbps connections to each of our upstream providers. After all, it is conceivable that all of the ISP’s subscribers are trying to talk to something hosted at just one other ISP at the same time.
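The per-upstream requirement grows linearly with the subscriber count. The minimal Python sketch below assumes the worst case described above (everyone talking to something behind a single upstream at once) and 1gbps per subscriber; the subscriber counts in the loop are the ones used in the text and in footnote 4, and the function name is hypothetical.

```python
def upstream_capacity_tbps(subscribers, subscriber_gbps=1):
    """Capacity EACH upstream link needs to be "oversubscription-free", in tbps.

    Worst case from the text: every subscriber may be talking to something
    behind a single upstream at the same time, so every upstream link must
    carry the full aggregate subscriber line rate.
    """
    total_gbps = subscribers * subscriber_gbps
    return total_gbps / 1000  # 1tbps = 1000gbps


for subs in (10_000, 20_000, 100_000, 1_000_000):
    # At 1,000,000 subscribers this is 1,000tbps, i.e. 1 petabit per second.
    print(f"{subs:>9,} subscribers -> {upstream_capacity_tbps(subs):,.0f}tbps per upstream")
```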

As you scale this up to real-world subscriber counts and move deeper into the network (and the math only gets more ridiculous the further you go⁴), you end up back at this fundamental truth: the only way to completely avoid oversubscription is to directly connect every host in a network to every other host in the network without any intermediate network equipment. On a very small LAN, you can do that. On the public Internet, you cannot. There will be oversubscription somewhere. Likely, it will exist almost everywhere.
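To see why a full mesh stops being an option beyond a small LAN, count the links: with n hosts, a full mesh needs n(n-1)/2 point-to-point links, and every host needs n-1 ports. A short Python sketch (the function names are hypothetical):

```python
def full_mesh_links(hosts):
    """Point-to-point links needed to connect every host directly to every other host."""
    return hosts * (hosts - 1) // 2


def ports_per_host(hosts):
    """Ports each host needs in a full mesh: one per other host."""
    return hosts - 1


# 4 hosts matches the single-switch example; 10,000 matches the small ISP.
for n in (4, 100, 10_000):
    print(f"{n:>6,} hosts: {full_mesh_links(n):>12,} links, {ports_per_host(n):,} ports per host")
```

The four-host network needs only 6 links, but 10,000 hosts already need nearly 50 million, each host carrying 9,999 ports, which is why every real network puts switches and routers in the middle and accepts some oversubscription.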

While oversubscription is not bad or a sign of network mismanagement, port oversaturation is. That’s what responsible ISPs try to control: they monitor their uplinks, their core network, and their peering/transit connections, and upgrade them (or steer traffic to another, less congested location) as needed. If they do that correctly, you’ll never notice the oversubscription that exists in the network.
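As an illustration of the monitoring side, here is a hypothetical Python sketch of the decision step only: given a peak utilization figure for each link, flag the ones approaching saturation so they can be upgraded or have traffic steered away. The link names, capacities, measured peaks, and the 70% threshold are all made-up assumptions for illustration, not recommendations; how the measurements are collected is out of scope.

```python
from dataclasses import dataclass


@dataclass
class LinkSample:
    name: str
    capacity_gbps: float
    peak_gbps: float  # hypothetical busy-hour peak for this link


# Hypothetical threshold: flag links whose peak reaches 70% of capacity.
UPGRADE_THRESHOLD = 0.7


def links_needing_attention(samples, threshold=UPGRADE_THRESHOLD):
    """Return names of links whose peak utilization meets or exceeds the threshold."""
    return [s.name for s in samples if s.peak_gbps / s.capacity_gbps >= threshold]


# Illustrative numbers only.
samples = [
    LinkSample("cmts-1 uplink", 100, 42),
    LinkSample("core to transit A", 400, 310),
    LinkSample("core to transit B", 400, 150),
]
print(links_needing_attention(samples))  # ['core to transit A']
```

Every link here is oversubscribed in the sense defined above, yet only one is close to saturation, and that is the one worth acting on.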

¹ The level of oversubscription at your ISP might be a problem, though.
² Note that even here, this constraint is pretty silly. While you might not care about sending traffic to your neighbor who is connected to the same last-mile concentrator as you, if your ISP is big, there probably is something they provide connectivity to that you do care about talking to.
³ Still quite small for a consumer ISP.
⁴ Let’s say our small ISP connects to a “tier 1” provider at the required 10tbps. A moderate-sized municipal fiber network might have over 100,000 subscribers; they would need to connect at 100tbps. Big consumer ISPs have millions of subscribers. At 1 million 1gbps subscribers, they would need to connect to their upstreams at 1 petabit per second.