Stretching layer 2 between geographically distant data centers + General DR

SSH IT ZOMBIE: Apr 19, 2003; No more blinkies! Yay!; College Slice

The end goal is to have an IP move between data centers as painlessly as possible, when needed.

Out of curiosity, what are some of you folks doing for disaster recovery/high availability between data centers?

We're a hospital system and are expanding fairly rapidly. We went from 1, to 2, to more and more data center sites. None of them has a stretched SAN fabric or stretched network layer 2 with our main site. I'm on the systems engineering side. My ex-boss used to tango with the networking manager over stretching VLANs across sites, but ultimately it never happened.

Should it? It seems like a flat layer 2 network isn't a good idea across multiple sites, but I'm not in networking. My boss left, and the new network manager and my new manager are actually pushing his team to stretch layer 2. I started working with one of the engineers on an alternate solution on the side.

Most application servers can function via DHCP assigned IPs fine, many of ours do, and use DNS for client connectivity. Many support active-active nodes, IMO that's ideal. There are special cases out there where endpoints MUST point at an IP, don't support DNS etc.

We want to separate one of our critical systems between two data centers, it's one of those special cases that can't use DNS.

One of our networking engineers had a neat solution - use OSPF at two sites, advertise where the route for the production ip lives. If we need to do a failover, dedicate a small subnet to the application and fail that over via layer 3 instead of stretching vlans.

We've been testing it, it's actually an application that runs on AIX, we're using gated. It actually seems to work really, really well.
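The failover described here comes down to longest-prefix matching: the small application subnet is advertised as a more specific route than the general server range, so whichever site advertises it gets the traffic. Below is a minimal Python sketch of that idea, not our actual gated config. All prefixes, site names, and costs are made up for illustration, and the `fail_over` helper is hypothetical.

```python
import ipaddress

def best_route(routes, dest):
    """Pick the route for dest: longest matching prefix first, then lowest cost."""
    addr = ipaddress.ip_address(dest)
    candidates = [r for r in routes if addr in ipaddress.ip_network(r["prefix"])]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda r: (-ipaddress.ip_network(r["prefix"]).prefixlen, r["cost"]),
    )

def fail_over(routes, prefix, new_site, cost=10):
    """Withdraw prefix wherever it is advertised and re-advertise it from new_site."""
    kept = [r for r in routes if r["prefix"] != prefix]
    kept.append({"prefix": prefix, "site": new_site, "cost": cost})
    return kept

# Hypothetical addressing: 10.20.30.0/29 is the small subnet dedicated to the app.
routes = [
    {"prefix": "10.20.0.0/16", "site": "dc1", "cost": 10},   # general server range
    {"prefix": "10.20.30.0/29", "site": "dc1", "cost": 10},  # app subnet, active node
]

before = best_route(routes, "10.20.30.2")["site"]
after = best_route(fail_over(routes, "10.20.30.0/29", "dc2"), "10.20.30.2")["site"]
print(before, "->", after)
```

The endpoints keep pointing at the same IP the whole time. Only the place where the /29 is advertised changes.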

We might also look at our Citrix Netscalers, they have similar functionality. Or maybe other appliances. Or, I might just ask networking - why not fail over the subnets via the routers directly? We should be able to write scripts to do that.

What are other people doing?

# ? Sep 4, 2015 00:59

FatCow: Apr 22, 2002; I MAP THE FUCK OUT OF PEOPLE

SSH IT ZOMBIE posted:

One of our networking engineers had a neat solution - use OSPF at two sites, advertise where the route for the production ip lives. If we need to do a failover, dedicate a small subnet to the application and fail that over via layer 3 instead of stretching vlans.

We've been testing it, it's actually an application that runs on AIX, we're using gated. It actually seems to work really, really well.

We might also look at our Citrix Netscalers, they have similar functionality. Or maybe other appliances. Or, I might just ask networking - why not fail over the subnets via the routers directly? We should be able to write scripts to do that.

What are other people doing?

If you must stretch layer 2 in 2015 YOOL use something like VXLAN/OTV instead of stretching vlans.

# ? Sep 4, 2015 04:00

adorai: Nov 2, 2002; 10/27/04 Never forget; Grimey Drawer

vxlan will give you this functionality, but don't. If you have to keep the same ip but move datacenters for DR or something, just use small subnets and virtual routers.

# ? Sep 4, 2015 04:12

Prescription Combs: Apr 20, 2005; 6

What crappy application has to be hard coded at IP addresses?

Sucks you can't do F5 Big-IP GTMs.

# ? Sep 7, 2015 22:35

devicenull: May 30, 2007; Grimey Drawer

SSH IT ZOMBIE posted:

One of our networking engineers had a neat solution - use OSPF at two sites, advertise where the route for the production ip lives. If we need to do a failover, dedicate a small subnet to the application and fail that over via layer 3 instead of stretching vlans.

We've been testing it, it's actually an application that runs on AIX, we're using gated. It actually seems to work really, really well.

This. Do not try to do this with layer 2 bridges. If you can do this via layer 3, it's going to be significantly more reliable. We usually do this via BGP (because we use BGP for routing), but it's pretty much the same thing via OSPF.

# ? Sep 7, 2015 22:35

Antillie: Mar 14, 2015

Prescription Combs posted:

What crappy application has to be hard coded at IP addresses?

Sucks you can't do F5 Big-IP GTMs.

This. Or just put the IP on an F5 Big-IP LTM and let it figure out where to forward the traffic based on keepalive checks or iRule logic.
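Roughly what the load balancer is doing under the hood, as a small Python sketch. This is not F5 config or iRule syntax, just the general idea of probing backends in priority order and forwarding to the first one that answers. The function names and addresses are made up for illustration.

```python
import socket

def tcp_alive(host, port, timeout=1.0):
    """Keepalive probe: True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def pick_backend(backends, probe=tcp_alive):
    """Return the first (host, port) in priority order whose probe passes, else None."""
    for host, port in backends:
        if probe(host, port):
            return (host, port)
    return None

# Hypothetical pool: primary site first, DR site second.
POOL = [("10.20.30.2", 5000), ("10.40.30.2", 5000)]

def fake_probe(host, port):
    """Stand-in probe that pretends only the DR node is up."""
    return host == "10.40.30.2"

print(pick_backend(POOL, probe=fake_probe))
```

With the real `tcp_alive` probe this would make actual connections, so the example uses a stand-in probe to stay self-contained.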

# ? Sep 7, 2015 23:20

adorai: Nov 2, 2002; 10/27/04 Never forget; Grimey Drawer

Prescription Combs posted:

What crappy application has to be hard coded at IP addresses?

Most commonly it's more about the lovely application admin than the application itself. See: mainframe operators

# ? Sep 8, 2015 04:02

SSH IT ZOMBIE: Apr 19, 2003; No more blinkies! Yay!; College Slice

adorai posted:

Most commonly it's more about the lovely application admin than the application itself. See: mainframe operators

In this case there are lab devices that connect to a healthcare system. The administrators swear up and down an IP is required for the devices. I guess I could ask them to show me, though I believe it, given how limited the computing power is on some devices, they might not implement DNS.

# ? Sep 9, 2015 00:29

SSH IT ZOMBIE: Apr 19, 2003; No more blinkies! Yay!; College Slice

Antillie posted:

This. Or just put the IP on an F5 Big-IP LTM and let it figure out where to forward the traffic based on keepalive checks or iRule logic.

We thought about that. We have Citrix Netscalers, which are a competing product. They have virtual servers, but you run into the same issue: the virtual server IP has to be able to work at either data center, and you need the HW LBs at two different sites for redundancy.

Netscaler ALSO supports BGP/OSPF and acting as a router, but we need additional licensing.

# ? Sep 9, 2015 00:31

Prescription Combs: Apr 20, 2005; 6

Could you do it with anycast routing? Not sure what protocol the devices need to communicate on, though.

# ? Sep 10, 2015 00:20

adorai: Nov 2, 2002; 10/27/04 Never forget; Grimey Drawer

SSH IT ZOMBIE posted:

In this case there are lab devices that connect to a healthcare system. The administrators swear up and down an IP is required for the devices. I guess I could ask them to show me, though I believe it, given how limited the computing power is on some devices, they might not implement DNS.

Why do you need to stretch layer two then?

# ? Sep 10, 2015 01:05

abigserve: Sep 13, 2009; this is a better avatar than what I had before

Depending on how far apart your data centers are, you could do something like VSS on your core routers. We did that at my last place of work, which had two DCs 8 km apart, and it worked fine. VSS handles local traffic routing so you don't get the usual layer 2 issue of the active gateway being at the other data center.

VXLAN/OTV/TRILL all look really good on paper but I've not yet come across anyone who has implemented any in a production network.

# ? Sep 10, 2015 02:54

luminalflux: May 27, 2005

Prescription Combs posted:

Could you do it with anycast routing? Not sure what protocol the devices need to communicate on, though.

OSPF or BGP. We do it with OSPF internally.
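A toy Python illustration of the anycast idea: every site that hosts the service announces the same prefix, and each router simply follows its cheapest path to a site that is currently announcing. The router names, site names, and costs below are invented for the example.

```python
def nearest_site(costs_to_sites, advertising):
    """Pick the lowest-cost site among those currently announcing the anycast prefix.

    costs_to_sites: {site: path cost from this router}
    advertising:    set of sites announcing the prefix right now
    """
    reachable = {s: c for s, c in costs_to_sites.items() if s in advertising}
    if not reachable:
        return None
    return min(reachable, key=reachable.get)

# Hypothetical path costs from two edge routers to each data center.
lab_a = {"dc1": 10, "dc2": 50}
lab_b = {"dc1": 40, "dc2": 20}

both_up = {"dc1", "dc2"}
dc1_down = {"dc2"}

print(nearest_site(lab_a, both_up), nearest_site(lab_b, both_up))
print(nearest_site(lab_a, dc1_down), nearest_site(lab_b, dc1_down))
```

For an active/passive application, presumably only the active site would announce the prefix, which makes this behave like a plain route failover.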

# ? Sep 10, 2015 04:37

1000101: May 14, 2003; BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY FRUITCAKE!

SSH IT ZOMBIE posted:

In this case there are lab devices that connect to a healthcare system. The administrators swear up and down an IP is required for the devices. I guess I could ask them to show me, though I believe it, given how limited the computing power is on some devices, they might not implement DNS.

It's possible the devices get their config from DHCP options in which case you won't need to do anything terrible like a full on layer 2 stretch.

quote:

VXLAN/OTV/TRILL all look really good on paper but I've not yet come across anyone who has implemented any in a production network.

TRILL isn't really meant as a data center interconnect but more a means of bundling links together to build a loop-free Ethernet fabric. It's more comparable with SPB or Cisco's FabricPath than it is to OTV or even VXLAN.

VXLAN we use pretty often but almost never for stretching over metro/geo distances. Just over l3 boundaries within a given data center.

OTV I've used only as a migration tool since even with first hop localization it's still a crapshoot with how traffic gets to the server you've vmotioned over to a remote data center.

If you're absolutely stuck with the requirement of having the same IP networks in multiple locations, I would probably put them in their own VRF on my DR router, then just advertise those networks out via BGP/OSPF/whatever when my primary site turns into a smoking hole. This has the added benefit of letting you do pretty comprehensive testing of your DR plan without worrying about traffic "leaking" into the wild. I've done this for a number of customers now, ranging from 100ish servers to 10,000+.
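To make the VRF point concrete, here is a very rough Python model of it. It is a sketch of the concept only, not router configuration: the duplicated prefixes sit in an isolated table that only DR test hosts can reach, and nothing is visible to the rest of the network until failover is declared. The class, method names, and prefix are hypothetical.

```python
class DrRouter:
    """Toy DR router: prefixes live in an isolated VRF until failover is declared."""

    def __init__(self, dr_prefixes):
        self.vrf = set(dr_prefixes)  # reachable only from inside the DR VRF
        self.advertised = set()      # what the rest of the network can see

    def reachable_from(self, source, prefix):
        """source is 'dr-vrf' for hosts inside the VRF, anything else for production."""
        if source == "dr-vrf":
            return prefix in self.vrf
        return prefix in self.advertised

    def declare_disaster(self):
        """Start advertising the VRF's networks to the production routing domain."""
        self.advertised |= self.vrf

router = DrRouter({"10.20.30.0/29"})  # hypothetical duplicated production subnet

# DR testing: visible inside the VRF, invisible to production.
test_view = router.reachable_from("dr-vrf", "10.20.30.0/29")
prod_view_before = router.reachable_from("production", "10.20.30.0/29")

router.declare_disaster()
prod_view_after = router.reachable_from("production", "10.20.30.0/29")
print(test_view, prod_view_before, prod_view_after)
```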

# ? Sep 13, 2015 22:26

SSH IT ZOMBIE: Apr 19, 2003; No more blinkies! Yay!; College Slice

adorai posted:

Why do you need to stretch layer two then?

It's for if and when we fail over between active and passive nodes for the app. One node they use for read only and reporting. Oracle mirrors between the nodes. The other node the clients and lab devices connect to, many via IP instead of DNS.

During failover, the vendor expects to be able to move the service IP between systems. For this you need stretched layer 2, OR you fail over the routes to the other data center, or use another technology.

Certain people where I work want to stretch layer 2, and we're looking at and pushing for alternatives.

Right now both nodes are in the same data center so it's a non issue. I'm trying to push for us to separate them to two different data centers.

# ? Sep 14, 2015 18:15

r u ready to WALK: Sep 29, 2001

I'm pretty disappointed that there still isn't a clean, easy standard way of stretching a subnet across locations without requiring a big dumb layer 2 network in 2015

We're having the same argument at my workplace because the hospitals we're serving used to have two local server rooms with stretched layer 2 networks, disk mirroring and stretched clusters.

The higher-ups want to consolidate all the local crap into our two large regional datacenters, but in our regional network infrastructure there is no stretched layer 2, and even with OTV the network team hasn't figured out how to transparently move the gateway. So if our primary site fails, the stuff that fails over with a static IP to the secondary won't reach its gateway.

Supposedly it's a lot easier to set up a dedicated VRF with stretched layer 2 on shortest path bridging in an Avaya network, but we're stuck with a Cisco core for now.

I like the idea of virtual routers that can fail over along with the rest, would love to hear more about that.

# ? Sep 19, 2015 15:02

adorai: Nov 2, 2002; 10/27/04 Never forget; Grimey Drawer

OK, it's pretty simple. My core network at both datacenters is built almost entirely with VyOS doing the routing. It caps out at about 3Gbps per instance, which is more than we need for layer 3 anywhere. If you need more than that at your absolute core, you could replace the hub in my description below with a layer 3 switch or something.

At each datacenter I have a hub instance. Let's refer to them as router00 and router10. In both datacenters these routers have a common VLAN. I use 101, but it doesn't matter. It's not stretched, just local to that datacenter. They also have all of my metro ethernet links attached to them. These are routers that will never have to migrate, because all of the equipment attached to them is physical stuff that is only available in their own datacenter. They run a DHCP server on VLAN 101 and each has a different layer 3 network (10.0.0.1/26 and .64/26 in my case). In each datacenter I have a number of other virtual routers that have some number of additional vifs on them, but all have VLAN 101 and get their addresses from DHCP. The routers all exchange routes via OSPF.

Now if I know application A might need to float to datacenter 2, I just give it its own virtual router instance, and that virtual router moves with the application VMs. It will automatically attach to my VLAN 101 in the other datacenter, get a proper IP address for that subnet, and begin advertising the necessary route. I never have to touch a guest in order to make this magic happen. In the instance where we have a device in each datacenter that needs the same IP address, I can do that with a NAT statement in a router that fails over. My goal was to make it so that if my team was dead, no one would need to know about our network -- they only need to worry about the VMs, everything else should just happen.
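If it helps to see the moving parts, here is a small Python simulation of the flow above. It is a sketch only, not VyOS config. The two /26 transit networks come from the description; everything else is an assumption for illustration: the `Hub` and `VirtualRouter` classes, the hub taking the first host address, and the app subnet 10.1.5.0/29. Real OSPF would withdraw the old route on its own when the adjacency drops, whereas the sketch does it explicitly.

```python
import ipaddress

# Transit VLAN 101 networks, one per datacenter (not stretched).
TRANSIT = {
    "dc1": ipaddress.ip_network("10.0.0.0/26"),
    "dc2": ipaddress.ip_network("10.0.0.64/26"),
}

class Hub:
    """Stand-in for router00/router10: leases transit addresses and learns routes."""

    def __init__(self, site):
        self.site = site
        # Assumption: the hub itself uses the first host address in its /26.
        self.pool = list(TRANSIT[site].hosts())[1:]
        self.routes = {}  # app prefix -> next hop on VLAN 101

    def lease(self):
        return self.pool.pop(0)

    def learn(self, prefix, next_hop):
        self.routes[prefix] = next_hop

    def withdraw(self, prefix):
        self.routes.pop(prefix, None)

class VirtualRouter:
    """Per-application router that moves with the application's VMs."""

    def __init__(self, app_prefix):
        self.app_prefix = ipaddress.ip_network(app_prefix)
        self.hub = None
        self.transit_ip = None

    def boot_at(self, hub):
        if self.hub is not None:
            self.hub.withdraw(self.app_prefix)       # old site stops seeing the route
        self.hub = hub
        self.transit_ip = hub.lease()                # DHCP on the local VLAN 101
        hub.learn(self.app_prefix, self.transit_ip)  # advertise the app subnet

hubs = {site: Hub(site) for site in TRANSIT}
vr = VirtualRouter("10.1.5.0/29")  # hypothetical application subnet
vr.boot_at(hubs["dc1"])
vr.boot_at(hubs["dc2"])            # "failover": same router, other datacenter
print(vr.transit_ip, hubs["dc2"].routes)
```

The guests behind the virtual router never change address or gateway. Only the router's transit address changes.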

edit: a key component to this is a highly segmented server network. Most applications have 1 VM, so I generally provision a /29 for them, that way we could double the server count AND perform a new server upgrade without needing a new subnet.
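The /29 math, for anyone who wants to check it, using Python's standard `ipaddress` module. The 10.1.0.0/24 block is a made-up example, and reserving one address for the virtual router's gateway interface is an assumption.

```python
import ipaddress

block = ipaddress.ip_network("10.1.0.0/24")     # hypothetical server block
app_subnets = list(block.subnets(new_prefix=29))
print(len(app_subnets))                          # 32 application subnets per /24

first = app_subnets[0]
usable = list(first.hosts())
print(first, len(usable))                        # 6 usable addresses in a /29

# Assumption: one address goes to the virtual router as the gateway.
gateway, servers = usable[0], usable[1:]
print(gateway, len(servers))                     # 5 addresses left for servers
```

Five server addresses cover one VM doubling to two plus two more for a side-by-side upgrade.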

adorai fucked around with this message at 15:16 on Sep 19, 2015

# ? Sep 19, 2015 15:13
