Infrastructure growth in the networking industry has outpaced our ability to adequately understand, manage, and diagnose issues across various elements in the system. Networks are now so complex that error-free configuration and operation, even with human intervention, is a daunting task. This challenge comes at a time when the increased availability of compute, storage, and network capacity enables large amounts of data to be gathered from various sources.
It follows that these same resources make it feasible to understand network behavior by having machines learn and infer from historical data.
The 2019 Cisco Visual Network Index (VNI) Forecast, which tracks global IP traffic growth for mobile and fixed networks, predicts that by 2022 there will be 4.8 billion Internet users and 28.5 billion networked devices and connections, with video making up 82 percent of IP traffic. The mobile sector is a major driver of this growth, especially with the 5G rollout: the forecast projects 5.7 billion mobile users and 12.3 billion mobile devices and connections, contributing 930 exabytes of data traffic.
Given this massive amount of growth across all verticals we serve, the need to understand and manage networks at such scales becomes critical.
To date, Cradlepoint has more than 1 million routers deployed across all markets, with the ability to manage and orchestrate our services in the AWS cloud. Our routers occupy a unique position in the network: they see both the uplink traffic from connected devices on the LAN and the downlink traffic arriving over the WAN from the core network. This lets us apply advanced machine learning techniques to the data these devices continuously observe and give customers valuable insights.
Our small and medium enterprise customers use routers with a cellular WAN interface as a backup when the primary wired connection fails. Increasingly, cellular WAN is used as a primary connection where it proves more reliable and offers better economies of scale than cable or fiber broadband. In such cases, however, the predominant concern is one of monitoring pooled data plan usage, as LTE data plans tend to be capped.
Cradlepoint previously provided customers with a visual dashboard showing their cellular data plan utilization, with user-configurable alert thresholds and a forecast that indicated the trend using simple linear extrapolation (Fig. 1).
Fig. 1: This figure shows the original linear trend forecast based on simple extrapolation.
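For illustration, simple linear extrapolation of this kind can be sketched as a least-squares line fitted to cumulative usage and extended to the end of the billing cycle. This is a minimal sketch, not the dashboard's actual code: the function name `linear_forecast`, the daily granularity, and the sample values are all assumptions.

```python
def linear_forecast(cumulative_usage, cycle_days):
    """Fit a least-squares line to cumulative daily usage and extend it
    to the end of the billing cycle (hypothetical helper)."""
    n = len(cumulative_usage)
    if n < 2:
        raise ValueError("need at least two observations to fit a line")

    days = range(1, n + 1)  # day 1 is the first day of the cycle
    mean_x = sum(days) / n
    mean_y = sum(cumulative_usage) / n

    # Ordinary least-squares slope and intercept.
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(days, cumulative_usage))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Projected cumulative usage for each remaining day of the cycle.
    return [intercept + slope * day for day in range(n + 1, cycle_days + 1)]


# Illustrative values only: cumulative GB used on days 1 to 4 of a 6-day cycle.
projection = linear_forecast([2.0, 4.0, 6.0, 8.0], cycle_days=6)
```

A single straight line like this assumes the same usage rate every day, so it cannot represent a weekday versus weekend rhythm.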
However, this scheme did not account for real-life user behavior, such as:
a. Periodic patterns of usage due to seasonality, or
b. Long-term trends, or
c. Anomalous patterns of usage
To address this, Cradlepoint’s network intelligence initiative starts with a statistical machine learning inference scheme based on Holt-Winters triple exponential smoothing, a time series forecasting technique that can account for:
1. A linear trend
2. Periodic variations due to seasonal factors
3. Residual variations
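The three components above map onto the level, trend, and seasonal terms of the Holt-Winters recursions. The following is a minimal sketch of the additive form in plain Python, not Cradlepoint's implementation: the function name, the daily granularity with a 7-day season, the smoothing constants, and the initialization from the first two seasons are all assumptions made for illustration.

```python
def holt_winters_additive(series, season_length, alpha, beta, gamma, horizon):
    """Forecast `horizon` steps ahead with additive Holt-Winters
    (triple exponential) smoothing. Hypothetical helper."""
    m = season_length
    if len(series) < 2 * m:
        raise ValueError("need at least two full seasons of history")

    # Heuristic initialization (an assumption): level from the first season,
    # trend from the change between the first two seasonal means.
    first_mean = sum(series[:m]) / m
    second_mean = sum(series[m:2 * m]) / m
    level = first_mean
    trend = (second_mean - first_mean) / m
    seasonal = [value - first_mean for value in series[:m]]

    for t in range(m, len(series)):
        season = seasonal[t % m]
        previous_level = level
        # Level: deseasonalized observation blended with the prior projection.
        level = alpha * (series[t] - season) + (1 - alpha) * (level + trend)
        # Trend: smoothed change in level.
        trend = beta * (level - previous_level) + (1 - beta) * trend
        # Seasonal: smoothed offset of the observation from the level.
        seasonal[t % m] = gamma * (series[t] - level) + (1 - gamma) * season

    n = len(series)
    return [level + h * trend + seasonal[(n + h - 1) % m]
            for h in range(1, horizon + 1)]


# Illustrative daily usage in GB: busy weekdays, quiet weekends, four weeks.
week = [10.0, 10.0, 10.0, 10.0, 10.0, 2.0, 2.0]
next_week = holt_winters_additive(week * 4, season_length=7,
                                  alpha=0.3, beta=0.1, gamma=0.2, horizon=7)
```

With a perfectly repeating weekly pattern and no trend, as in this toy input, the forecast reproduces the weekly shape. On real data the smoothing constants would normally be tuned against held-out history.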
Based on this model, our customers can now see a more accurate and realistic forecast of their data plan usage on the same dashboard. A sample forecast plot over a typical customer billing cycle for a major carrier network in the U.S. is shown in Fig. 2 below:
Fig. 2: This graph shows historical vs. forecast data plan usage for a typical customer on a major US cellular carrier network.
The forecast chart for this customer clearly shows a periodic usage pattern, as would be expected in a small and medium enterprise branch office with a Monday through Friday, 8 a.m. to 5 p.m. schedule and sparse data utilization on weekends. Such forecast information can be used to optimize consumption of a pooled data plan and to plan how mobile data capacity is distributed across branch locations. In addition, any spikes or departures from the regular pattern seen here would indicate anomalies that warrant further investigation.
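One simple way to flag such departures is to compare each observation with its forecast and mark the points whose deviation is large relative to the typical forecast error. The sketch below is an assumption for illustration only: the function name `flag_anomalies`, the multiplier `k`, and the sample values are hypothetical, and the source does not state which anomaly rule is used.

```python
def flag_anomalies(actual, expected, sigma, k=3.0):
    """Return the indices where actual usage deviates from the forecast
    by more than k times the typical forecast error `sigma`."""
    if len(actual) != len(expected):
        raise ValueError("actual and expected must have the same length")
    return [i for i, (a, e) in enumerate(zip(actual, expected))
            if abs(a - e) > k * sigma]


# Illustrative values only: daily usage in GB against a flat forecast.
suspicious_days = flag_anomalies(actual=[10.0, 11.0, 30.0, 9.0],
                                 expected=[10.0, 10.0, 10.0, 10.0],
                                 sigma=2.0)
```

Here `sigma` would be estimated from forecast errors over a period considered normal, so that the anomalies being searched for do not inflate the threshold.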
We also provide confidence intervals around our forecast, with a “worst-case” and a “best-case” band indicated on the same chart. These bands are derived statistically and give customers a measure of how reliable the forecast is, which they can use to decide whether to take action. For example, if the worst-case forecast is likely to exceed a pre-set threshold with greater than 50 percent probability, the customer can use this information to proactively throttle traffic on that cellular WAN interface before it hits the threshold. Our current implementation uses a 70 percent confidence interval. Fig. 3 shows a typical customer dashboard with this confidence interval:
Fig. 3: This chart shows the data plan usage forecast over a billing cycle for a typical customer on a major U.S. cellular network, with the upper and lower bounds of a 70 percent confidence interval.
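As a rough illustration of how such a band and an exceedance probability could be computed, the sketch below assumes forecast errors are approximately normally distributed with a spread estimated from past residuals. That normality assumption, the function names, and the sample numbers are all hypothetical. The source does not say which statistical method is used.

```python
from statistics import NormalDist, stdev


def confidence_band(forecast, residuals, confidence=0.70):
    """Return (lower, upper) bounds around a point forecast, assuming
    normally distributed forecast errors (an assumption)."""
    sigma = stdev(residuals)
    # Two-sided interval: a 70 percent band leaves 15 percent in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    lower = [f - z * sigma for f in forecast]
    upper = [f + z * sigma for f in forecast]
    return lower, upper


def exceed_probability(point_forecast, sigma, threshold):
    """Probability that actual usage exceeds `threshold`, under the same
    normal-error assumption."""
    return 1.0 - NormalDist(mu=point_forecast, sigma=sigma).cdf(threshold)


# Illustrative values only: end-of-cycle forecast of 95 GB against a 100 GB cap.
best_case, worst_case = confidence_band([95.0], residuals=[-4.0, 0.0, 4.0])
should_throttle = exceed_probability(95.0, sigma=4.0, threshold=100.0) > 0.5
```

The 50 percent decision rule mirrors the example in the text. A more cautious customer could act on a lower probability.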
This enables customers to gain deeper insights into their data plan utilization and possibly take suitable actions.
One typical use case involved a customer whose endpoint devices, connected to our routers, were rapidly connecting and disconnecting over a cellular link, causing a spike in data utilization. This eventually led to overages of up to $10,000 in a month on their subscribed data plan, which was not set up to anticipate such events. With this more sophisticated prediction capability, such anomalies can be detected in advance, saving customers the high cost of data plan overages.