Cloud Computing: Mostly fog machines in other people's datacenters

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > Cloud Computing: Mostly fog machines in other people's datacenters

«‹›10 »

chutwig: May 28, 2001; BURLAP SATCHEL OF CRACKERJACKS

jaegerx posted:

Coreos has openstack running in containers where it should be considering how often poo poo fails. You should look into that. Or just run k8s on top of coreos.

I don't actually have a problem with OpenStack as far as stability goes, on either Kilo or Liberty. What I have a problem with is the huge split between the people who write OpenStack and the people who run OpenStack clusters. The developers like to crank out releases every 6 months but most operators are running releases that are 2-3 versions behind, so many clusters you find in production are considered EOL by the developers. I started building Kilo clusters at Bloomberg about 18 months ago. I just finished upgrading them all to Liberty last month, and soon I will start upgrading them all to Mitaka. Liberty is considered EOL by the developers. I find this stupid and wish that they'd move to a yearly release cadence and use the extra time for testing.

OpenStack also has a lot of weird and stupid scaling bugs that don't show up in testing because developers write stuff that works on devstack, passes the gates, but then falls over in a cluster of any real size. Got a cluster with many hundreds of Ceph OSDs? Watch as your nova-compute processes mysteriously crash! Watch as Nova and Cinder suffer from constant race conditions! Watch as RabbitMQ gets bogged down in millions of bullshit messages from Neutron agents!

# ? Mar 18, 2017 03:25

Adbot: ADBOT LOVES YOU

# ? Apr 26, 2024 03:43

Docjowles: Apr 9, 2009

As you and VC have repeatedly posted, whenever there is an OpenStack perf problem or general "this should never, ever, ever happen WTF?" issue, look at RabbitMQ and Galera. One of those two is almost certainly ruining your life, despite you having deployed the recommended config.

# ? Mar 18, 2017 03:40

Docjowles: Apr 9, 2009

Also I totally agree on Ops being a pitiful afterthought. I went to the Atlanta summit (thanks, Cisco!) a few years ago. There was one loving session out of the whole 5 days dedicated to operators. It was in a room packed nuts-to-butts. The poor host asked "ok who has a comment on the operability of OpenStack?" and then a scene from the Walking Dead ensued and suddenly it was an hour later and everyone was dead. The other 99.5% of the conference was marketing boondogles and/or companies trying to hire anyone who could spell OpenStack.

# ? Mar 18, 2017 03:47

jaegerx: Sep 10, 2012; Maybe this post will get me on your ignore list!

I've still got lots of friends at Rackspace running private openstack instances. I feel their pain. Most of the customers do like it though.

# ? Mar 18, 2017 04:14

Ashex: Jun 25, 2007; These pipes are cleeeean!!!

If people are on AWS take a serious look at using their managed Elasticsearch for building a ELK stack as there's a logstash agent plugin for pushing there.

And there's things I could say about the service to make it more attractive but I can't yet.

# ? Mar 20, 2017 14:03

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

Ashex posted:

If people are on AWS take a serious look at using their managed Elasticsearch for building a ELK stack as there's a logstash agent plugin for pushing there.

And there's things I could say about the service to make it more attractive but I can't yet.

Assuming you're related to this product and not just evangelizing a thing you like: if I'm looking to pay someone to manage 1/3 of my logging stack, why would I do that instead of going all-in for a hosted offering like Loggly?

# ? Mar 20, 2017 14:31

Ashex: Jun 25, 2007; These pipes are cleeeean!!!

Vulture Culture posted:

Assuming you're related to this product and not just evangelizing a thing you like: if I'm looking to pay someone to manage 1/3 of my logging stack, why would I do that instead of going all-in for a hosted offering like Loggly?

I do a lot of work on AWS so I just tend to know a bit about the services and have the advantage of being able to talk to Amazon people directly.

The biggest perks are it handles cluster management and scaling so you can just use their service then build a logging solution around it that makes you happy. Honestly if you're seriously considered Loggly just go for it as you'll still be building your own ELK stack that has it's own management requirements, the main benefit is the core bits (Kibana/Elasticsearch) are handled.

There's drawbacks to it like the inability to add custom plugins and being restricted to the metrics exposed through Cloudwatch.

# ? Mar 20, 2017 15:35

Pile Of Garbage: May 28, 2007

Has anyone here used or is currently using SkyFormation for pulling in events from Azure AD or O365 Security & Compliance to pass to on-premise SIEM? Just looking for some opinions, we're currently running McAfee ESM SIEM on-premise and need to pull in audit events from Azure AD and O365.

# ? Mar 21, 2017 06:45

chutwig: May 28, 2001; BURLAP SATCHEL OF CRACKERJACKS

jaegerx posted:

I've still got lots of friends at Rackspace running private openstack instances. I feel their pain. Most of the customers do like it though.

Know if any of them are in the NYC area and are looking for a new job? We are aggressively hiring OpenStack SREs and developer types. :getin:

# ? Mar 21, 2017 14:37

netcat: Apr 29, 2008

Where should I look if my instances don't get a reply to dhcp? We've had this problem before but it started working again when we moved the controller node to another server but now after a few weeks everything has broken down again. I've looked in the dnsmasq ip -> mac mapping files and they seem fine (I had an issue before where that file wasn't updated properly). If I do a tcpdump on the all the interfaces I can see that the request appears on both the controller and compute nodes but not on the tap interface for the network.

I haven't really touched the network conf so I'm unsure why it would just suddenly break.

# ? Mar 21, 2017 20:18

necrobobsledder: Mar 21, 2005; Lay down your soul to the gods rock 'n roll; Nap Ghost

Have you confirmed that the DHCP server is seeing the request on its interfaces? Is the reply making it through firewalls and gateways? What are your port mirroring / broadcast / promiscuous interface settings in your stack?

# ? Mar 22, 2017 00:49

netcat: Apr 29, 2008

Actually I lied earlier, it looks like the packets won't even arrive on the bridge. tcpdump on the tap interface for the VM on the compute host shows the dchp requests but I see nothing on any other interface on the path. "ovs-ofctl show br-int" shows br-int as down but I think that's how it's supposed to be after googling for a bit (?). Doing ifup on br-int didn't change anything anyway, as far as I could tell.

If I understand everything correctly packets travel something like:

code:

VM -> tapxyz -> br-int -> br-ex -> physical (compute) -> physical (controller/network) -> br-ex -> br-int -> tap-xyz -> dnsmasq

Now it stops in the second step. Earlier when I had a similar problem it stopped perhaps at br-ex on the controller.

We use vxlan, but I'm not sure about all the interface settings or where to find them since our install is managed by Fuel so I've done very little manual tinkering related to that.

The weird thing for me with this is that it's been kind of a gradual degradation. At first it worked OK, sometimes VM's failed to get ip from dhcp but it would work after restarting the instance. Then it started to fail more frequent, like 50% chance that the VM got connectivity and now it doesn't work at all.

# ? Mar 22, 2017 11:17

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

Jesus Christ, the Azure CLI 2.0 is one of the most user-hostile pieces of software I've ever experienced. It's like if every command in the AWS CLI only took ARNs as arguments. This thing is clearly designed for Jenkins or whatever and not to be ever used by a human to do work.

# ? May 5, 2017 15:19

Gucci Loafers: May 20, 2006; Ask yourself, do you really want to talk to pair of really nice gaudy shoes?

I don't understand why they wouldn't just use the same syntax of Powershell?

# ? May 5, 2017 17:37

underlig: Sep 13, 2007

The people who used to work here set up one of our backup-servers to use Azure for long-time protection.

We're using Microsoft DPM 2012r2 and backing up to local disk and something that in our Azure-portal is called "recovery services vault".

My problem is that i cannot see what is actually in this vault, in the portal i can only see something called "Cloud - LRS" using ~6tb of data. I've made multiple attempts to find the right link inside the portal to get access to it but i always end up running in circles.
"Backup items" are listed as 0 in the portal, but my servers still gets backed up to Azure because there are recovery points and i can download them.

What am i missing? Am i looking in the wrong place? Isn't portal the place i should be? My CSP said "oh just download azure storage explorer" but that doesn't even show me the recovery services vault, then they said is should find everything on portal but like ive said i can't navigate to the correct place there.

Could the old technicians have configured this for some old part of Azure that has since been forgotten, or lost in updates?

# ? May 10, 2017 19:59

Thanks Ants: May 21, 2004; #essereFerrari

Is it in "Backup vaults (classic)"?

https://portal.azure.com/#blade/HubsExtension/Resources/resourceType/Microsoft.Backup%2FbackupVault

# ? May 10, 2017 20:02

Adbot: ADBOT LOVES YOU

# ? Apr 26, 2024 03:43

underlig: Sep 13, 2007

Thanks Ants posted:

Is it in "Backup vaults (classic)"?

https://portal.azure.com/#blade/HubsExtension/Resources/resourceType/Microsoft.Backup%2FbackupVault

Unfortunately not there either.

I did find an option from our CSP to get "statistics" though all it shows is that the LRS has increased slowly, the vault however only shows small variations in size for the last five months.

I've asked the csp-support once more to help me.

# ? May 11, 2017 18:14

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > Cloud Computing: Mostly fog machines in other people's datacenters

«‹›10 »