Network device and data center connections often described as “active-active” might be better termed “pseudo active-active.”
Have you noticed that “active-active” has gotten overused in networking? Not to mention used inappropriately?
The industry could use a little more accuracy in terminology, and perhaps candor, around the use of “active-active.”
My suggestion: The term “pseudo active-active” better describes a lot of situations. (If you have a better term for what I describe below, please suggest it in a comment!)
When a firewall pair, load balancer pair, etc., operates with one of the pair active, the other passive or in a standby state, that’s active-passive. Fine, I think we all can agree on that terminology.
Several vendors let you run firewall A of a pair for some traffic and firewall B for the remaining traffic, partitioned by routing, VRFs, etc. Feel free to substitute “load balancer” or another type of appliance for “firewall.” Yes, they’re both active in a sense — but not for all traffic!
Could we please call that situation “pseudo active-active”? Not to slam the concept — it may be exactly what you’re looking for — but to make the point that it is not two fully active devices; it is two devices each carrying a partial load. One reason for making this distinction is that pseudo active-active (“PAA”?) creates complexity: you have to divide up the traffic (routing, VRFs, etc.) to achieve that partition.
By way of contrast, consider e.g. ASA firewall clusters and similar solutions from other vendors. All the members are actively forwarding, with load distributed across them. That’s really active-active (times N). One hopes that they’re not all front-ended by a device that should be known as the “front-end bottleneck.”
With any such solution, the next few questions are around efficiency of the load-sharing, asymmetric paths, etc. There’s usually some overhead (and sometimes a bottleneck) in distributing the workload, ensuring symmetric flows, etc. There’s also the matter of shared fate: does the whole cluster fail if the cabling or something else fails? What about if the cluster “lead” or “primary” device fails?
But at least we’re clear about what the cluster is doing: both or all members actively processing all traffic, shared out across them. No manual partitioning.
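To make the distinction concrete, here is a deliberately simplified Python sketch of the two models (my own toy illustration, with made-up device and VRF names, not any vendor’s actual logic): pseudo active-active statically assigns slices of the workload to each device, while a true active-active cluster spreads every flow across all members with no manual partitioning.

```python
# Toy model only -- not any vendor's implementation.
import hashlib

FIREWALLS = ["fw-a", "fw-b"]

# Pseudo active-active: traffic is manually partitioned, e.g. by VRF or VLAN.
# Each device is "active" only for its assigned slice of the workload.
STATIC_PARTITION = {
    "vrf-finance": "fw-a",
    "vrf-guest": "fw-b",
    "vrf-prod": "fw-a",
}

def pseudo_active_active(vrf: str) -> str:
    """Return the one firewall configured to carry this VRF's traffic."""
    return STATIC_PARTITION[vrf]

# True active-active (cluster): every member forwards, and a hash of the
# flow spreads the load across all of them with no manual partitioning.
def true_active_active(flow_tuple: str) -> str:
    digest = int(hashlib.md5(flow_tuple.encode()).hexdigest(), 16)
    return FIREWALLS[digest % len(FIREWALLS)]

if __name__ == "__main__":
    print(pseudo_active_active("vrf-guest"))              # always fw-b
    print(true_active_active("10.1.1.5:443->10.2.2.9"))   # either member, by hash
```

The toy hash is only there to make one point: nobody configures which flows land on which member; the cluster works that out itself.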
So, what do we do with VMware over Layer 2 Data Center Interconnect (DCI) in some form, which VMware marketing is calling “active-active” data centers? Sounds great, doesn’t it? There could be convenience to being able to move workloads back and forth, with appropriate backing store handling. The wisdom of doing so is an interesting debate topic, but off-topic for the point I’m trying to make in this blog.
I’ve seen some single-datacenter STP/BUM traffic meltdowns, and a couple of dual datacenter DCI disasters — very ugly and stressful for all involved! That’s an experience that’s best avoided.
Could we please also call what VMware does for datacenters pseudo active-active? Yes, both datacenters are active — for part of the application workload, just like pseudo active-active on firewalls.
In Layer 2 DCI, an IP address no longer indicates location; it is not tied to one data center. That can create major networking complexity, with significant impact on application performance for multi-tier applications. I just had a discussion with someone about this: there is a slippery path of piling on complexity layers, each addressing some problem created by the prior “solution.” The details are a side-track; see some of my prior blogs.
In any case, L2 DCI joins the datacenters as far as possible STP meltdowns are concerned. From a VMware perspective, the L2 DCI is a fragile backplane in the middle of however you’re clustering controllers. The simple way to put it is that in terms of risk, you’ve merged two datacenters into one less robust datacenter, somewhat resembling the older approach of splitting redundant devices and VLANs across two adjacent buildings. “Conjoined datacenters,” perhaps?
That aside, my terminological concern here is that we need to differentiate datacenter pseudo active-active from truly dual-active datacenters or cloud instances. That is, datacenters or cloud with dual or N-fold active applications. Think “scale-out microservices.” This is particularly important as more and more applications get shifted to the cloud.
A well-designed application that is running actively out of several sites is (one hopes) a lot more robust than one that detects failure of its primary copy and boots up or wakes up a secondary copy, i.e. VMware SRM-like behavior. The same applies to transferring possibly corrupted state to a warm standby copy. An active-active web/cloud application has multiple front ends, with load balancing or other mechanisms for shunting traffic to a functioning copy.
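As a rough sketch of that difference (again my own illustration, with hypothetical hostnames, not SRM’s or any load balancer’s actual logic), an active-active application simply steers each request to whichever site is currently healthy, rather than detecting a failure and then starting a standby copy:

```python
# Illustrative sketch only -- hostnames and the health check are made up.
import itertools

SITES = ["dc-east.example.com", "dc-west.example.com", "cloud-1.example.com"]

def healthy(site: str) -> bool:
    """Placeholder health check; a real GSLB or front end would probe the site."""
    return site != "dc-east.example.com"   # pretend one site is down

# Active-active: every site runs the application; each request just goes to
# any site that is currently passing health checks.
_round_robin = itertools.cycle(SITES)

def pick_site() -> str:
    for _ in range(len(SITES)):
        site = next(_round_robin)
        if healthy(site):
            return site
    raise RuntimeError("no healthy site available")

print(pick_site())  # traffic keeps flowing as long as any one site is up
```

Nothing has to “boot up” after a failure is detected; the surviving sites were already carrying live traffic.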
Granted, re-writing an application to operate that way may be costly and complicated, and take years, whereas VMware SRM or comparable forms of HA may provide adequate resiliency for your needs.
Consultants’ standard answer: “it depends.”
I do regard multi-site active applications as the end game for Internet and cloud-based applications attempting high availability.
And that’s why I want us to be clear about terminology, so we all know what our technology choices do and don’t do. Let’s reserve “active-active” for situations where both devices (or datacenters) are actually active on a non-partitioned workload, and “pseudo active-active” for situations where the workload is partitioned, with neither device nor data center running everything.
If you’ve got a better word for pseudo active-active, please let me know. I thought about “phony” instead of “pseudo,” but that’s a bit strong/offensive. “Pseudo” better fits the main point here, namely situations that are “sort of” active-active.
This article originally appeared on the NetCraftsmen blog.