Let’s say you’re a consultant working on a couple of internet edge design projects.
In the first scenario, you are designing an internet connection for a factory.
- There are a few hundred workers who access AWS using the internet-as-WAN for critical apps related to factory operations.
- The factory is automated, and metrics related to production line health and performance are analyzed in AWS.
- There is an IoT network used for physical security that relies on an internet-based SaaS product to run reports and distribute alerts.
- A group of executives have offices at one end of the factory. Because of the pandemic, they don’t use them right now, but they do remotely access workstations with highly sensitive data that reside in those offices.
In the second scenario, you are designing an internet connection for an executive’s home.
- The executive has been working from home since the pandemic started, and finds the internet connection is unreliable for video calls. The video lags and gets pixelated. There are audio dropouts and audible jitter.
- The executive’s family members are also demanding internet users. The kids are in Zoom school. The spouse has a digital editing business and shares large files with clients.
In the factory scenario, you would identify solid internet connectivity as a mission critical component required to keep the factory running. The executive team even knows how much money they lose per hour if the production line stops. That makes budgeting and ROI calculations easy.
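To show why a known cost of downtime makes the budgeting easy, here is a minimal sketch of the break-even arithmetic. The dollar figures and the function name `breakeven_hours` are made up for illustration; the scenario doesn't give real numbers.

```python
def breakeven_hours(design_cost: float, loss_per_hour: float) -> float:
    """Hours of avoided downtime at which the design has paid for itself."""
    if loss_per_hour <= 0:
        raise ValueError("loss_per_hour must be positive")
    return design_cost / loss_per_hour


# Hypothetical figures, not taken from the factory scenario.
design_cost = 250_000.0    # total cost of the redundant internet edge
loss_per_hour = 50_000.0   # what the executives say a stopped line costs

print(breakeven_hours(design_cost, loss_per_hour))  # 5.0
```

With numbers like these, the design pays for itself once it prevents five hours of outage, which is the sort of statement an executive team can sign off on quickly.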
You’d design a robust internet connectivity architecture for the factory.
You’d specify at least two feeds coming into the facility–one aerial, plus one underground in armored conduit. Perhaps you’d use a third feed that’s wireless, maybe even using licensed spectrum. To reduce fate-sharing, you’d make certain that the diverse cables coming into the building took separate routes to the central office, and you’d choose providers that don’t have common backbone infrastructure.
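One rough way to reason about fate-sharing is to list the things each feed depends on and look for overlaps. The sketch below assumes you can collect that inventory from the providers; the link names and risk labels are hypothetical.

```python
from itertools import combinations


def shared_risks(links: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Return every pair of links that has at least one dependency in common."""
    overlaps = {}
    for (a, risks_a), (b, risks_b) in combinations(links.items(), 2):
        common = risks_a & risks_b
        if common:
            overlaps[(a, b)] = common
    return overlaps


# Hypothetical inventory: each feed mapped to the things it depends on.
links = {
    "aerial": {"pole-route-north", "central-office-1", "backbone-x"},
    "underground": {"conduit-south", "central-office-2", "backbone-y"},
    "wireless": {"rooftop-mast", "tower-east", "backbone-x"},
}

print(shared_risks(links))  # {('aerial', 'wireless'): {'backbone-x'}}
```

In this made-up inventory, the aerial and wireless feeds look diverse at the building but ride the same backbone, so one upstream failure could take out both.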
You’d make sure each link had more bandwidth than required, enough that the factory could keep running with 2 of the 3 pipes down. Since the public cloud processing was all done in AWS, maybe you’d plumb in a Direct Connect link to AWS, if you could get one. Although…that would complicate the routing scheme a bit more. Still, a DX circuit would shave a little network latency off the path to AWS and might be worth the headache.
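The sizing rule is simple arithmetic: whatever links survive have to carry the whole load between them. Here is a small sketch, assuming traffic spreads evenly across the surviving links; the 400 Mbps peak demand is a made-up figure.

```python
def min_link_capacity(peak_demand_mbps: float, total_links: int,
                      tolerated_failures: int) -> float:
    """Smallest per-link capacity that still carries peak demand after failures.

    Assumes traffic is spread evenly across whichever links survive.
    """
    surviving = total_links - tolerated_failures
    if surviving < 1:
        raise ValueError("at least one link must survive")
    return peak_demand_mbps / surviving


# Hypothetical peak demand for the factory.
peak = 400.0

print(min_link_capacity(peak, total_links=3, tolerated_failures=2))  # 400.0
print(min_link_capacity(peak, total_links=3, tolerated_failures=1))  # 200.0
```

Surviving 2 of 3 failures means every link must carry the full peak on its own, so you are paying for three times the bandwidth you need on a good day.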
For the rest, you’d load balance across links by taking in full BGP routing tables, necessitating a bit of router heft. Not crazy big routers, though. But big enough. Or maybe actual hardware load balancers? There’s a big budget, after all. Wait…should SD-WAN come into play? Or one of those new cloud network exchanges you’ve been hearing about?
What about network equipment vendor diversity? Should you get one router from Vendor C, one from Vendor J, and one from Vendor A to avoid fate-sharing in case a NOS has a nasty bug, because they all seem to these days? Maybe.
I could go on. We haven’t talked about NAT, transport security, or the remote access VPN those mid-level executives need, which must be locked down because of the sensitive data they are accessing on those boxes in their pre-pandemic offices. And what about DDoS mitigation? Is the factory a target for such an attack? Can the upstream providers help keep the link up if a DDoS event happens?
I’m not trying to answer all of the questions in this hypothetical scenario. Rather, I’m pointing out something that I hope is plain. This internet edge design is complex. Is it overly complex? For purposes of this discussion, let’s say it isn’t. Let’s say the implied level of engineering I worked through is warranted. You install it all. You test a bunch of failure scenarios. You’re satisfied that the system, complex though it is, is working as designed. Your factory customer is happy. All is well. Good job, everyone.
Let’s shift our attention to the executive’s home.
You just finished up a successful factory internet edge design that worked out, right? So…use that same design in this residential scenario? After all, it’s an executive’s home, and executives are very important people.
Obviously (I hope, else my ridiculous example was pointless), applying the factory design to a residential scenario would be overkill. The factory solution, even if someone signed off on the cost, would be too complex to implement. In this case, simple would do. I’d do two things to address the issues the exec was facing when working from home.
One, I’d recommend a second internet line that the exec would have dedicated to them–not shared with the rest of the family. Two, I’d take a good, hard look at the wireless network in the house, probably get a little frightened, and nope right out of it by hard wiring the executive’s laptop to a switch if possible. And if not possible, I’d beef up the wireless network until it didn’t scare me anymore. That’s it. That would get the job done with a high probability of success, a simple enough design, and a reasonable cost.
This isn’t a post about internet connectivity design.
This is a post about how we IT engineers tend to overcomplicate our designs. We love shiny things. Sometimes, we work with technologies not because our company needs them, but because we want to put new skills on our resumes. Sometimes, when faced with a problem, we’ll default to how we solved it last time, rather than analyzing the new situation to see what’s most appropriate.
At times, we overestimate the impact of a system going down, and engineer excessive complexity into the solution to keep that system up. We can end up caring more than the business does, because our switches and our servers and our cloudy constructs become pets we cherish and teach tricks to so we can show off to others.
Our designs become laws. This is the way.
We love our nerdery, but sometimes our nerdery is a bad match for the situation. I’ve seen scenarios in multiple shops where the same levels of redundancy and capacity were applied to, say, every remote office. Heck, I’ve written up those BOMs myself. To be clear, I’m not arguing against homogeneity per se. There’s operational comfort and supportability that comes with “same-same” where you can.
I am suggesting that some network designs are needlessly complex because some engineer fell in love with a protocol, technique, or nerd knob. Somewhere, the nerd knob became a standard, and now, 5 years later, no one remembers why the nerd knob is set like that. The network is now more brittle. Every change tiptoes around the poorly understood knob. Eliminate the nerd knob? Of course, except that no one knows what will happen, and so device configurations are treated with fear. That is, until someone takes the time to figure it out and engineer it out of the network.
If you can’t clearly answer the question, “What problem does this solve for me?” you shouldn’t be using the feature. You shouldn’t be adding the complexity. Even if you can clearly answer that question, you need to weigh the benefit against the fragility–the brittleness–you are adding. Not all benefits are worthwhile.