|
Here's an automation question for you guys. I've got a topology like this: Super simple, eBGP between two sites. I've got a single router at the hub site and two iBGP neighbors at the remote site. I'm using BGP communities attached to the routes the hub router is sending, and a route-map at the remote site that increases the local preference if a route matches a community. So right now, which neighbor I send the community to dictates the path the remote side takes to get back to me. Crisp, clean, good design. But I'm running into a problem. Let's say provider 1 is having transit issues, so I'm getting packet loss and BFD flapping, making my throughput garbage. I want an automated way to swap my providers so that the lossy one is backup and the other is primary. The providers are VPLS, so I've got a "direct" connection to R2a and R2b. The trivial way of doing this is simply swapping the import/export lists on each neighbor. This works fine, but there is a problem: in Juniper world, whenever I make that change, both BGP neighbors reset. Not really a big deal as they come back up very quickly, but I'd still rather avoid that. So what is the elegant way of swapping these paths while making changes only on R1?
ate shit on live tv fucked around with this message at 21:47 on Dec 12, 2018 |
# ? Dec 12, 2018 21:44 |
|
My first thought would be route dampening to keep the lossy link from flapping when it's having trouble. The lossy link should stay down until it's recovered from whatever is causing the loss.
|
# ? Dec 12, 2018 21:56 |
|
Filthy Lucre posted: My first thought would be route dampening to keep the lossy link from flapping when it's having trouble. The lossy link should stay down until it's recovered from whatever is causing the loss.
Sure, but what if we just want to toggle the provider for other reasons? Maybe we flip a coin each day and make that one primary on that day.
|
# ? Dec 12, 2018 22:10 |
|
CFM and SLAX would get you there if you want a solution local to R1.
|
# ? Dec 12, 2018 22:14 |
|
Also are you saying the session resets when you change policy? Because that is broken.
|
# ? Dec 12, 2018 22:16 |
|
Not off the top of my head, no. But that’s only what the switch has heard from the phone. The phone is still listening to the voice VLAN advertisement from the switch. If no traffic is coming out tagged, then something else is wrong. There is some capability to configure selective RX and TX on some platforms, but I don’t know about those. Are these Cisco phones? There have been some bugs in the past regarding CDP or LLDP messaging. And there have been some issues with the phones where they will not use the voice VLAN even though they hear it, and they get stuck on the data VLAN until they boot up, upgrade, and configure themselves. That was mainly on the 7900 series in the past, in my experience. However, if you still have those, you should throw them away.
|
# ? Dec 12, 2018 22:45 |
|
These are Avaya handsets and we've had LLDP issues with them before, but it's a different team that would be able to make sure their software is up-to-date. There's nothing set on the ports to restrict LLDP messages to either Tx/Rx. I think I'll get a known-good Polycom or similar shipped over and if that works I can stop looking at the switches.
|
# ? Dec 12, 2018 22:49 |
|
ate poo poo on live tv posted: Here's an automation question for you guys. I've got a topology like this:
This is what PfR does. Can you run PfR, or is this a vendor-agnostic environment? I think Juniper has their own version of PfR.
Sepist fucked around with this message at 22:57 on Dec 12, 2018 |
# ? Dec 12, 2018 22:55 |
|
tortilla_chip posted: Also are you saying the session resets when you change policy? Because that is broken.
Yep. If you change the export statements in the configuration, it resets the neighbors. If you don’t change the policy name, just the policy contents, you are fine. It’s a Juniperism afaik. I’m planning on using a Python script that can be run to actually make the configuration change, but I want to get other people’s opinions on how they would craft their policies and BGP configuration.
|
# ? Dec 13, 2018 00:37 |
|
Sepist posted: This is what pfR does. Can you run pfr or is this a vendor agnostic environment? I think juniper has their own version of pfr
Yeah, I’ll use something like PfR to actually detect the transit issue and make the change, but I’m not sure of the best way to craft the policy so that making that change is easy, scalable, and minimally disruptive.
|
# ? Dec 13, 2018 00:41 |
|
ate poo poo on live tv posted: Here's an automation question for you guys. I've got a topology like this:
Are you sure it's an actual restart and not a soft reset or something? You should be able to change routing policy, regardless of how it's done, without directly impacting traffic unless there is in fact a routing change that would impact it (otherwise that'd be a nightmare). Regardless, if that is indeed how it works, you only need to swap the list on the failing peer: just create a third policy map and use that instead of swapping the two around, so you only reset the peer that's hosed anyway. Make sure you write a failsafe into your code that protects against applying it to both peers at the same time if they are both having issues.
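A minimal sketch of that failsafe in Python. The peer labels and the idea of a per-peer "lossy" flag are assumptions for illustration; how loss is actually detected (BFD, probes, PfR-style tooling) is outside this snippet.

```python
from typing import Dict, Optional


def pick_peer_to_demote(peer_is_lossy: Dict[str, bool]) -> Optional[str]:
    """Return the one peer that should get the backup policy, or None.

    Failsafe: act only when exactly one peer is unhealthy. If both
    (or neither) look bad, change nothing, so the script can never
    push the low-preference policy onto every path at once.
    """
    lossy = [peer for peer, bad in peer_is_lossy.items() if bad]
    if len(lossy) != 1:
        return None
    return lossy[0]


# Hypothetical labels; real code would key on neighbor addresses.
print(pick_peer_to_demote({"provider1": True, "provider2": False}))
print(pick_peer_to_demote({"provider1": True, "provider2": True}))
```

Returning None when both peers look bad leaves the current primary alone, which seems like the safer default when there is no healthy path to prefer.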
|
# ? Dec 13, 2018 11:27 |
|
Decided to slightly rewrite my config, as well as change the spoke sites a bit.
set protocols bgp group DC2 neighbor 10.148.0.34 export DEFAULT_PREF_OUT
set protocols bgp group DC2 neighbor 10.148.0.34 import DEFAULT_PREF_IN
set protocols bgp group DC2 neighbor 10.248.0.34 export PROVIDER2_LOW_PREF_OUT
set protocols bgp group DC2 neighbor 10.248.0.34 import LOW_PREF_IN
Now by default Provider1 at the spoke site will be local-pref 150. If either router receives a community from either provider, it'll bump the local-pref to 500. So when I want to toggle from the "default" provider path to the provider 2 path, I just make this change:
set protocols bgp group DC2 neighbor 10.248.0.34 export PROVIDER2_HIGH_PREF_OUT
set protocols bgp group DC2 neighbor 10.248.0.34 import HIGH_PREF_IN
DEFAULT_PREF_IN prepends 2 times and sets local-pref to 150
HIGH_PREF_IN prepends 1 time and sets local-pref to 250
LOW_PREF_IN prepends 3 times and sets local-pref to 50
Should be a pretty straightforward script to write.
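A rough sketch of the command-generation half of such a script, in Python. The group, neighbor, and policy names come from the config above; the function name is made up, and the explicit `delete` lines are my assumption: as far as I know, a bare `set ... export` on a neighbor that already has an export policy appends to the policy chain rather than replacing it, so the old name is removed first.

```python
from typing import List

GROUP = "DC2"
PROVIDER2_NEIGHBOR = "10.248.0.34"

# (export policy, import policy) pairs, named as in the post
LOW = ("PROVIDER2_LOW_PREF_OUT", "LOW_PREF_IN")
HIGH = ("PROVIDER2_HIGH_PREF_OUT", "HIGH_PREF_IN")


def toggle_commands(make_primary: bool,
                    group: str = GROUP,
                    neighbor: str = PROVIDER2_NEIGHBOR) -> List[str]:
    """Build the delete/set lines that flip provider 2 between
    backup (LOW) and primary (HIGH)."""
    old, new = (LOW, HIGH) if make_primary else (HIGH, LOW)
    base = f"protocols bgp group {group} neighbor {neighbor}"
    commands = []
    for i, direction in enumerate(("export", "import")):
        # Remove the old policy first so the new one replaces it
        # instead of being chained after it.
        commands.append(f"delete {base} {direction} {old[i]}")
        commands.append(f"set {base} {direction} {new[i]}")
    return commands


for line in toggle_commands(make_primary=True):
    print(line)
```

Calling `toggle_commands(make_primary=False)` yields the reverse change, so the same function covers failing back to the default provider.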
|
# ? Dec 13, 2018 22:32 |
|
ate poo poo on live tv posted: Decided to slightly rewrite my config, as well as change the spoke sites a bit.
My current preferred Junos method for BGP policies is policies that each do a single thing and are named as such, then strung together. You can likely achieve the same thing either way, but depending on how many peers you have, re-using things can be good. Also, you can do math with local-pref and MED in Junos policies, which can be fun.
code:
|
# ? Dec 13, 2018 23:41 |
|
Yea whenever I changed import/export statement names the neighbors would reset. However it didn't happen on all of them, just a few of them. I didn't do too much experimenting just because this is production, and while it's not that big of a disruption I'd rather keep it to a minimum, especially during our high-traffic season (right now). When I redid my policies only two of the neighbors reset; the other two were completely fine when I changed the import/export policies. All of the spoke sites have the same hardware, and have the same configuration, so vv
pre:
Dec 13 15:30:14 core-sw1.nj01 rpd[22617]: bgp_peer_delete:7506: NOTIFICATION sent to 10.148.0.42 (External AS 65016): code 6 (Cease) subcode 3 (Peer Unconfigured), Reason: Peer Deletion
Dec 13 15:30:14 core-sw1.nj01 rpd[22617]: bgp_peer_delete:7506: NOTIFICATION sent to 10.248.0.42 (External AS 65016): code 6 (Cease) subcode 3 (Peer Unconfigured), Reason: Peer Deletion
Dec 13 15:56:59 core-sw1.nj01 rpd[22617]: bgp_peer_delete:7506: NOTIFICATION sent to 10.148.0.34 (External AS 65017): code 6 (Cease) subcode 3 (Peer Unconfigured), Reason: Peer Deletion
Dec 13 15:56:59 core-sw1.nj01 rpd[22617]: bgp_peer_delete:7506: NOTIFICATION sent to 10.248.0.34 (External AS 65017): code 6 (Cease) subcode 3 (Peer Unconfigured), Reason: Peer Deletion
Dec 13 17:09:55 core-sw1.nj01 rpd[22617]: bgp_read_v4_message:10442: NOTIFICATION received from 10.148.0.34 (External AS 65017): code 6 (Cease) subcode 6 (Other Configuration Change)
Dec 13 19:45:04 core-sw1.nj01 rpd[22617]: bgp_read_v4_message:10442: NOTIFICATION received from 10.148.0.34 (External AS 65017): code 6 (Cease) subcode 6 (Other Configuration Change)
Dec 13 20:00:49 core-sw1.nj01 rpd[22617]: bgp_read_v4_message:10442: NOTIFICATION received from 10.148.0.34 (External AS 65017): code 6 (Cease) subcode 6 (Other Configuration Change)
Dec 13 20:04:10 core-sw1.nj01 rpd[22617]: bgp_read_v4_message:10442: NOTIFICATION received from 10.148.0.34 (External AS 65017): code 6 (Cease) subcode 6 (Other Configuration Change)
Dec 13 20:04:34 core-sw1.nj01 rpd[22617]: bgp_read_v4_message:10442: NOTIFICATION received from 10.148.0.34 (External AS 65017): code 6 (Cease) subcode 6 (Other Configuration Change)
|
# ? Dec 14, 2018 16:32 |
|
Do you have some sort of apply-groups or weird config nesting hierarchy hiding in there?
|
# ? Dec 14, 2018 17:01 |
|
I made my script work and rolled it into prod. code:
ate shit on live tv fucked around with this message at 22:25 on Dec 19, 2018 |
# ? Dec 19, 2018 22:22 |
|
Does it still reset your peer?
|
# ? Dec 19, 2018 23:23 |
|
Lols of the day: All of the switch ports here are set up as TRUNK ports. All of them. Our ARP and CAM aging intervals are set to 5 and 10 seconds. Thank you dipshits on the Ubiquiti support forums for spreading that “fix”
|
# ? Dec 19, 2018 23:32 |
|
I mean... it's not the most elegant solution but for some switch vendors (FORCE10) it seems to be the only way to do both tagged + untagged traffic on a port with a non-native VLAN1??? What was the problem that needed this "fix"?
|
# ? Dec 20, 2018 00:21 |
|
ate poo poo on live tv posted: I made my script work and rolled it into prod.
Netconf and Python are fun and all but couldn't this basically be two CLI commands?
code:
|
# ? Dec 20, 2018 00:52 |
|
CrazyLittle posted: I mean... it's not the most elegant solution but for some switch vendors (FORCE10) it seems to be the only way to do both tagged + untagged traffic on a port with a non-native VLAN1???
The trunk ports weren’t the “fix”, it was the aging. The problem was that wireless clients weren’t able to move from one AP to another, and flushing the MAC address table fixed it, because the main switch doesn’t know which connected switch the client is on... which makes zero sense assuming the AP isn’t broken.
|
# ? Dec 20, 2018 01:17 |
|
Bob Morales posted: The trunk ports weren’t the “fix” it was the aging.
By any chance, you're not doing some hosed up implementation of zero handoff / roaming are you?
|
# ? Dec 20, 2018 02:08 |
|
CrazyLittle posted: By any chance, you're not doing some hosed up implementation of zero handoff / roaming are you?
No, they just have some Ubiquiti APs on the same SSID. Nothing crazy, but as I dig deeper into it I’m sure I’ll find more goofy poo poo. This “fix” was to fix ONE wireless PC on a forklift; all the others worked fine. How does that make any drat sense to them? It was years ago and the APs have all been upgraded etc. since then.
|
# ? Dec 20, 2018 02:21 |
|
Turn off port security bullshit then ?
|
# ? Dec 20, 2018 02:52 |
|
tortilla_chip posted: Does it still reset your peer?
Oddly, no. The only difference is that the script makes configuration changes via NETCONF and RPC requests, so maybe that is why?
falz posted: Netconf and Python are fun and all but couldn't this basically be two CLI commands?
You don't really want automation scripts to log in via CLI; it's much preferable to use the native API of the device (or even use a completely different OS like Cumulus or whatever). CLI is for humans, not scripts. That said, you may be able to do the replace pattern via NETCONF; I didn't really look. Plus I'm sure there are better ways to do literally everything, but it's an iterative process.
ate shit on live tv fucked around with this message at 19:47 on Dec 20, 2018 |
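For anyone following along, here is a hedged sketch of what the API-driven flow can look like in Python. This is not the script from the thread. It assumes `cu` behaves like the `Config` utility from Juniper's PyEZ library (`load`, `diff`, `commit_check`, `commit`, `rollback`); opening the NETCONF session and building `cu` is left out, and the function name is made up.

```python
from typing import Iterable


def apply_set_commands(cu, commands: Iterable[str],
                       comment: str = "toggle provider preference") -> bool:
    """Load set/delete lines into the candidate config and commit.

    Returns True if a commit happened, False if the candidate showed
    no difference from the running config. `cu` is assumed to be a
    PyEZ-style Config object bound to an open NETCONF session.
    """
    cu.load("\n".join(commands), format="set")
    if cu.diff() is None:
        # Nothing would change, so skip the commit entirely.
        return False
    try:
        cu.commit_check()           # validate before committing
        cu.commit(comment=comment)
    except Exception:
        cu.rollback()               # discard the candidate changes
        raise
    return True
```

Skipping the commit when the diff is empty keeps repeated runs from generating commits that change nothing.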
# ? Dec 20, 2018 19:41 |
|
Feel like I'm being hella dumb with this one, but I'd like someone to double-check me on some basic HSRP poo poo if possible before we commit to something bad. I've got 4 Nexus 9300s in 2 vPC domains in our new data centre: 2 fibre (our 'core' switches) and 2 copper (extra 'core' poo poo that doesn't use fibre, management, whatever). The vPC peer links are connected via 100gig on both pairs. The copper switches are connected to the fibre ones on 2 vPCs over 20gig each. We're getting a managed (ugh) IPVPN MPLS from our service provider. They're providing us 2 ASRs with a single 10gig port each. We're also going to have 8 VRFs on our Nexus switches and the ISP routers. They're going to give us subinterfaces presented on a .1q trunk with HSRP on each subif. Originally my bosses wanted to buy another pair of 10gig switches to run the HSRP through, so ISP router -> 10gig switches -> Nexus. I don't want to pay for anything, so I said we could just make all the layer 2 VLANs on the copper switch and connect it to that. Now I'm second-guessing myself and wondering whether that is even going to be necessary. Couldn't I just patch one ASR into one fibre switch and the other ASR into the other fibre switch, and have the HSRP information exchanged over the vPC peer link? All the SVIs and VLANs are going to be on there already in their relevant VRF contexts. Theoretically it should work, but I'm doubting myself and getting thrown off here because the Nexus fibre switches are going to be doing routing, and that's making me think it might mess with the HSRP info. Typically I'd just throw a blank 2960 in the middle, but that's not going to work here, and I don't have any spare routers lying around that I can lab this on.
uhhhhahhhhohahhh fucked around with this message at 20:31 on Dec 24, 2018 |
# ? Dec 24, 2018 20:05 |
|
HSRP sends UDP hellos to the multicast address 224.0.0.2 (that's version 1; HSRPv2 uses 224.0.0.102), so if there is a shared broadcast domain you'll be fine.
|
# ? Dec 24, 2018 21:48 |
|
Looking for someone who has set up a Meraki MX100 (or other MX model) and has figured out a decent way to get alerting (emails, traps, syslog) sent to HP OO or the like and slurped into tickets, so we don't need to have some third world L1 guy staring at a mess of web pages. The IDS alerting issue is easy, it's syslog. But the main web console seems (?) to only send emails? Happy to be told I am wrong. My WAN Cisco guy and I stared at the console a while and fwd to syslog is obvious, but we're still debating the other avenues.
|
# ? Dec 26, 2018 10:16 |
|
SNMP traps are available but for some reason have to be enabled by support, that might get you close to where you want to be: https://documentation.meraki.com/zGeneral_Administration/Monitoring_and_Reporting/SNMP_Overview_and_Configuration#Defining_traps_to_be_sent
|
# ? Dec 26, 2018 12:07 |
|
Oh. Forgot. Customer has Meraki AP and FW, but Extreme Networks office LAN switches. End users connecting to O365 in office get random disconnects. VPN users do not. Amusing, but any feedback on Extreme? We'd kinda like to forklift the Extreme to Meraki but $
|
# ? Dec 26, 2018 13:29 |
|
I’d be amazed if that was a switch problem
|
# ? Dec 26, 2018 15:40 |
|
This isn't so much a configuration question as it is a navigating TAC question. I've got a 2960G-48-TC-L that powers on, begins to load firmware and freezes, producing no console output. The system light stays solid. I've attempted sending new firmware to the device via Xmodem through rommon, however the transfer and rommon itself freeze at around 10% transferred. A fsck of the flash freezes the system on block 3. I think we may be experiencing this field notice: https://www.cisco.com/c/en/us/support/docs/field-notices/637/fn63744.html however the switch isn't under contract and is end-of-support. Be that as it may, the field notice and the symptoms I'm experiencing indicate a hardware failure, and these units are covered by a limited lifetime hardware warranty for RMA replacement for 5 years after end-of-support. I attempted to open a ticket with TAC via email to receive warranty support on the device, but they refuse to speak with me beyond telling me that it's end-of-support and to purchase a new device. Is there another avenue I can try to get this thing replaced under warranty terms? I'd rather have a functioning replacement from Cisco than have to take a heat gun to the flash chips to get it to work again. Any thoughts? Edit: I would open an RMA request with the original vendor, however they went out of business a few years back, leaving Cisco as my only option.
Back of the Bus fucked around with this message at 19:22 on Dec 31, 2018 |
# ? Dec 31, 2018 19:17 |
|
The device clearly has a lifetime hardware warranty (https://connectthedots.cisco.com/connectdots/serviceWarrantyFinderRequest) which is detailed here https://www.cisco.com/c/en/us/products/warranties/warranty-doc-c99-740619.html#_Toc513754120, but it also says "Cisco hardware warranty support will be discontinued on the Last Date of Support (LDoS) published in the product End of Life Announcement", and that was in 2017 https://www.cisco.com/c/en/us/products/collateral/switches/catalyst-2960-series-switches/end_of_life_c51-674040.html
|
# ? Dec 31, 2018 19:41 |
|
Yeah, I just found https://www.cisco.com/web/AP/partners/disti/files/limited_lifetime_warranty_qa.pdf which says:
What are the terms and limitations of the LLW?
A. Cisco will warranty the Catalyst switch for as long as the original end user continues to own or use the product. In the event that manufacture of the product is discontinued, Cisco warranty support is limited to five (5) years from the announcement of discontinuance. Please note that the fan and power supply warranty is limited to five (5) years from date of purchase.
I thought it was 5 years from end of support. They definitely announced end-of-life in 2012, so 2017 is right for the end of hardware support. That sucks, I guess heat gun it is.
|
# ? Dec 31, 2018 20:09 |
|
It’s not like an Xbox; the fault isn’t bad solder joints. The flash is bad, so unless you replace it, you’re not going to get anywhere with that. Drives me nuts that these things are still more than $10 on eBay knowing that they’re time bombs.
|
# ? Dec 31, 2018 23:07 |
|
At least if it was just the flash you could TFTP boot the OS and a config to it
|
# ? Dec 31, 2018 23:17 |
|
I managed to get an IOS image xmodem'd onto the flash a few hours ago, booted and it froze just after showing
code:
Edit: It could also be a power issue, but I just replaced the PSU a few weeks back so I'd think that would rule it out. I'll check the voltages once I find my multimeter, though. Back of the Bus fucked around with this message at 23:36 on Dec 31, 2018 |
# ? Dec 31, 2018 23:32 |
|
The Cisco advisory says the issue is memory, and I doubt you'd get as far as you do if it were a CPU issue. There's nothing wrong with giving a new memory chip a try, if you have the soldering gear to do it.
|
# ? Dec 31, 2018 23:35 |
|
Haven't worked with Cisco in a while, could someone have configured the console port to be something other than 9600-n-8-1?
|
# ? Dec 31, 2018 23:45 |
|
I configured it in rommon to 115200-8-n-1 but it's not even getting far enough to load a config that would change the baud rate. It usually freezes while loading the IOS; the fact it got all the way through loading the image earlier is miraculous, even if it still froze afterward.
Back of the Bus fucked around with this message at 00:44 on Jan 1, 2019 |
# ? Jan 1, 2019 00:38 |