MagnumOpus
Dec 7, 2006

Bhodi posted:

Roll call, who's bought into any of this, and for how much? Spill your shame here.

A couple of years back my team built a multi-DC private cloud with VMware ESXi for the infrastructure and a combination of Chef and in-house microservices supplying the platform layer. For sure the coolest thing I ever did, but also the source of most of my gray hairs.


MagnumOpus
Dec 7, 2006

Misogynist posted:

"Hand-fed cattle."

Stealing this.

MagnumOpus
Dec 7, 2006

high six posted:

I think a lot of it needlessly complicates things where it doesn't need to be used and causes a lot of unneeded issues.

This could be said about any poorly considered operational expenditure or architectural decision. If your computing needs are mostly big ERP applications with a ton of vendor-supported data sources, cloud is probably not in your future. However, there are use cases where cloud concepts are the right solution architecture: a SaaS company with unpredictable usage patterns, scientific orgs that can make use of on-demand Hadoop clusters, etc.

Some of the enterprises sprinting to the cloud are certainly doing so prematurely and for the wrong reasons. Many more severely underestimate the development effort and, more importantly, the shift in development philosophy required to make cloud actually work for them. But for a great many common scenarios, some form of cloud architecture is absolutely worth at least a proof of concept.

MagnumOpus
Dec 7, 2006

Is BOSH actually a piece of poo poo or am I missing something? It just seems so archaic compared to Chef or Puppet.

MagnumOpus
Dec 7, 2006

evol262 posted:

It's basically unified deployment which combines the principles behind amis/openstack volumes/docker and config management (except it involves a lot of terrible shell scripts)

That's the main problem with it. Like Puppet/Chef, it lets you describe your networks and services well enough, but the deployments are kludgy as gently caress. And when a deployment doesn't work right, triage is a nightmare because the mess of scripts tends to leave artifacts all over the place.

Also you can't use package managers in your nodes because it compiles everything on the master.

MagnumOpus
Dec 7, 2006

Misogynist posted:

(A generic stream processing service for OpenStack that gets messages and fires webhooks might be even nicer to have someday.)

We're currently looking into doing something along these lines with Stackstorm. Will be a while before I can give a field report though, we're still slowly unfucking the damage these devs did in the six months they were "doing DevOps" before it occurred to them to hire some people actually trained in operations.

MagnumOpus
Dec 7, 2006

Misogynist posted:

This is really awesome, thanks for this!

I'd be curious to hear what you think about it!

MagnumOpus
Dec 7, 2006

Anyone using Foreman for orchestration?

We're evaluating tools, and I'm most familiar with Chef plus a mess of scripts for orchestration. I'm looking for a better way that handles orchestration from the tenant (OpenStack) up, with the caveat that it needs to gracefully integrate extensions for any auto-scaling logic we might write.
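
To make that caveat concrete, the kind of auto-scaling extension I have in mind is basically a dumb loop like the sketch below (python-novaclient, with a made-up metrics endpoint and placeholder image/flavor IDs, nothing authoritative):

```python
# Rough sketch of the kind of auto-scaling extension I mean: poll a metric,
# boot more app nodes when it crosses a threshold. python-novaclient plus a
# made-up Graphite endpoint; the IDs, credentials, and threshold are placeholders.
import time

import requests
from novaclient import client as nova_client

METRIC_URL = "http://graphite.internal/render?target=app.rps&format=json"  # hypothetical
IMAGE_ID = "base-image-uuid"      # the provider-mandated base image
FLAVOR_ID = "flavor-uuid"
SCALE_UP_THRESHOLD = 500          # requests/sec, made up for illustration

# Old-style auth; all placeholder credentials.
nova = nova_client.Client("2", "user", "password", "tenant", "http://keystone:5000/v2.0")

def current_load():
    """Pull the latest non-null datapoint from the metrics pipeline."""
    series = requests.get(METRIC_URL).json()
    points = [v for v, _ in series[0]["datapoints"] if v is not None]
    return points[-1] if points else 0

def scale_up(count=1):
    """Boot additional app nodes off the mandated base image."""
    for _ in range(count):
        nova.servers.create(
            name="app-node-%d" % int(time.time()),
            image=IMAGE_ID,
            flavor=FLAVOR_ID,
        )

while True:
    if current_load() > SCALE_UP_THRESHOLD:
        scale_up()
    time.sleep(60)
```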

MagnumOpus
Dec 7, 2006

I haven't read up enough on Satellite either, but I'll definitely look deeper into both now. It looks like we're locked into a masterless Puppet architecture for the foreseeable future, so if Foreman fits in nicely that would be best.

I just added a secure secrets storage story to our backlog this afternoon. What's the new hotness there? We have a semi-directive to keep the platform loosely IaaS-agnostic, so integrating with Keystone is out. I'm looking at both Conjur, which would fit really well into our environment assuming they can produce an OpenStack image, and Blackbox, which would snap right onto our team's existing "secrets all up in repos" workflow but doesn't feel very durable.
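
For context, the "secrets all up in repos" workflow today amounts to GPG-encrypted files sitting in the repo and getting decrypted at deploy time, roughly like this sketch (python-gnupg; the paths and keyring location are made up):

```python
# Sketch of the "secrets all up in repos" pattern: secrets live in the repo as
# GPG-encrypted blobs and get decrypted at deploy time. Assumes python-gnupg;
# the file path and deploy keyring location are hypothetical.
import gnupg
import yaml

gpg = gnupg.GPG(gnupghome="/etc/deploy/gnupg")  # keyring only the deploy user can read

with open("config/secrets.yml.gpg", "rb") as f:
    decrypted = gpg.decrypt_file(f)

if not decrypted.ok:
    raise RuntimeError("could not decrypt secrets: %s" % decrypted.status)

secrets = yaml.safe_load(str(decrypted))
db_password = secrets["database"]["password"]  # made-up key, just for illustration
```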

MagnumOpus
Dec 7, 2006

evol262 posted:

I've never used any of the secret sharing, so no comment there.

Last time I installed foreman, it wanted to be an external classifier and optionally a puppetmaster. Never tried it masterless, but I would think it then becomes an overblown image deployment tool, which is maybe what you want.

It sort of is. Our OpenStack provider relationship requires us to use specific base images that we can't branch or anything, so we have to configure once the VM comes up. Not ideal, I know, but :corp lyfe:.

Edit: This is not to say there aren't reasons. Again, it's possibly an overreaction to concerns about being IaaS-agnostic, but those are the directives I have to work with.

MagnumOpus
Dec 7, 2006

Anyone using Cloud Foundry, and if so, what are you doing for system metrics? I see that the Collector has been deprecated, but the Firehose system replacing it is still very new, and the only interfaces ("nozzles") I can find are prototypes.
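
As far as I can tell a nozzle is conceptually just a websocket client that drains Doppler and ships the envelopes somewhere, so something like the skeleton below (websocket-client; the endpoint, token, and sink are all placeholders, and the protobuf decoding is hand-waved since it needs the dropsonde definitions):

```python
# Hand-wavy skeleton of a firehose "nozzle": subscribe to the Doppler websocket
# and forward envelopes to some metrics backend. Uses the websocket-client
# package; the endpoint, subscription ID, token, and sink are all placeholders.
import ssl

from websocket import create_connection

DOPPLER_URL = "wss://doppler.example.com:443/firehose/my-nozzle"  # hypothetical
OAUTH_TOKEN = "bearer <token from UAA>"                           # placeholder

def forward_to_metrics_backend(raw_envelope):
    """Placeholder sink; a real nozzle would decode and ship the envelope."""
    print("%d bytes" % len(raw_envelope))

ws = create_connection(
    DOPPLER_URL,
    header=["Authorization: %s" % OAUTH_TOKEN],
    sslopt={"cert_reqs": ssl.CERT_NONE},  # lab only; validate certs for real
)

try:
    while True:
        frame = ws.recv()
        # Each frame is a protobuf-encoded dropsonde Envelope; decoding it needs
        # the compiled dropsonde-protocol definitions, which I'm skipping here.
        forward_to_metrics_backend(frame)
finally:
    ws.close()
```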

MagnumOpus
Dec 7, 2006

Bhodi posted:

Like the guy said, systems guys are terrible at writing APIs and unfortunately, systems guys are the only ones writing cloud software.

That's an interesting perspective. Most of my complaints about CF come from what appear to be developer-mindset decisions that go against literal decades of systems engineering convention e.g. BOSH and its VCAP madness.

MagnumOpus
Dec 7, 2006

I was gonna try to white knight OpenStack, but last week, when we came back up from our provider's maintenance window, we had a bunch of VMs with no volumes mounted and a couple with the wrong volumes mounted, so I can't do that with a straight face. We were considering moving to Rackspace, but our west coast partner BU just decided to get out of there themselves over similar instability issues. I see a refactor to a new provisioner in our future.

MagnumOpus
Dec 7, 2006

Hell is other people's OpenStack flavors.

MagnumOpus
Dec 7, 2006

I can't believe there are supposed professionals in this thread making blanket statements one way or the other about cloud. We all know there are some use cases where it is and where it isn't the right choice, like every other implementation option in the industry. Let's keep this topic more useful than the sales blogosphere please.

adorai posted:

Has anyone here spearheaded the move of a real established enterprise to AWS or Azure?

adorai posted:

i am mostly interested in how you convinced the other techs that it was a good move. I am considering a split move to aws and azure, and while I can effectively make it happen, i want to know how to not make my infrastructure guys (who will still have jobs) hate me.

Whether cloud is right for you depends on a lot of factors. Can you share some about your deployment?

- Are you using database systems that are designed to scale vertically or horizontally? OLAP or OLTP workloads or both?
- Do you have spike utilization during busy hours or is your profile more stable throughout the day?
- Got private/regulated data?
- How prepared is your org for DevOps work? This is a big one that's often overlooked: all that elasticity (generally) only pays off if you're willing to implement and maintain systems that actually scale without constant live tinkering.


MagnumOpus
Dec 7, 2006

What are people doing for portable backups in OpenStack? For Cassandra we're using tablesnap to archive to an S3 target, and while that's not bulletproof, it's okay for where we are in development. My main conundrum right now is how to back up curated systems like Jenkins and SonarQube. For Jenkins I'm just mounting $JENKINS_HOME on an attached volume and snapshotting that on the reg. A little bit of reading suggests I can then back those snapshots up to Swift after converting each snapshot to a volume, but that doesn't really feel like it gets me anything new, since Swift is at the same physical location. Is there a facility in Swift for migrating objects between DCs?
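
Edit: the closest thing I've turned up so far is Swift's container-to-container sync, which, assuming the provider actually has sync realms configured between its clusters, is just a couple of headers on the source container. Rough sketch with python-swiftclient, all names made up:

```python
# Sketch of Swift container-to-container sync: tag the source container so the
# cluster's sync daemon replicates its objects to a container in another DC.
# Only works if the provider has sync realms configured between clusters; every
# name, URL, and key below is a placeholder.
from swiftclient import client as swift_client

conn = swift_client.Connection(
    authurl="https://keystone.provider.example:5000/v2.0",
    user="backup-svc",
    key="password",
    tenant_name="our-tenant",
    auth_version="2",
)

conn.post_container(
    "jenkins-backups",
    headers={
        # Exact destination format depends on how the operator set up sync
        # realms; older deployments take a full URL to the remote container.
        "X-Container-Sync-To": "//backup-realm/remote-dc/AUTH_tenant/jenkins-backups",
        "X-Container-Sync-Key": "shared-sync-secret",
    },
)
```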

MagnumOpus
Dec 7, 2006

Re: Safe Harbour, most mature companies already have processes in place with their corporate customers to transfer data under EU Model Clause agreements. For anyone needing a quick alternative, that's the first place I'd suggest looking. Of course, the EU Model Clauses may be susceptible to the same legal criticism as Safe Harbour, but for the moment they serve as a legal mechanism for data transfer and protection agreements for anyone doing offshore data warehousing.

MagnumOpus
Dec 7, 2006

I'm 100% anecdotally sure that many of the problems we're running into are due to intermittent network failures, but since we're using an OpenStack IaaS provider, I don't have access to logs at that level. I've got TCP failures and retransmits in my metrics pipeline, but I'm looking for something more compelling. What would be perfect is a small agent app that constantly monitors links and tracks failures explicitly, because what I suspect is that we're getting frequent jitters rather than hard link failures. Basically, I just want to be able to prove whether this IaaS is too brittle for production. Ideas?

MagnumOpus
Dec 7, 2006

Thanks for all the input! I'm taking a 3-pronged approach to the problem:

1) Rebuilt some Graphite dashboards to get a better look at network stats across all hosts
2) Smokeping. I haven't used it before, but I've got another guy who's familiar with it, so we should be able to roll it out quickly.
3) I'm going to write a monitoring agent based on hashicorp/memberlist (rough sketch of the idea below). This will hopefully let me distinguish between types of link failure by reacting at the moment of detection and verifying from multiple hosts. If it works out the way I want, it should be able to answer my primary question about overall system stability.
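
Item 3 is the fuzzy one, so here's the gist of the multi-host verification idea as a toy sketch. The real agent would ride on hashicorp/memberlist (which is Go); this is just Python with made-up peer IPs, a made-up agent port, and a hypothetical peer-verification API:

```python
# Toy version of the link-monitor idea: probe peers on an interval, record
# latency/failures, and when a probe fails ask the other peers to check the same
# target, so we can tell "my link is jittering" from "that host is down".
# Peer list, port, and the peer HTTP API are all made up.
import json
import socket
import time
import urllib.request

PEERS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]   # hypothetical
PROBE_PORT = 9100                                  # hypothetical agent port
TIMEOUT = 1.0

def probe(host):
    """TCP connect probe; returns latency in ms, or None on failure."""
    start = time.time()
    try:
        with socket.create_connection((host, PROBE_PORT), timeout=TIMEOUT):
            return (time.time() - start) * 1000.0
    except OSError:
        return None

def ask_peer_to_verify(peer, target):
    """Ask another agent to probe the target too (hypothetical HTTP endpoint)."""
    url = "http://%s:%d/probe?target=%s" % (peer, PROBE_PORT, target)
    try:
        with urllib.request.urlopen(url, timeout=TIMEOUT) as resp:
            return json.load(resp).get("latency_ms")
    except OSError:
        return None

while True:
    for target in PEERS:
        latency = probe(target)
        if latency is None:
            # Cross-check from the other peers before calling it a link failure.
            others = [ask_peer_to_verify(p, target) for p in PEERS if p != target]
            verdict = "host down" if all(o is None for o in others) else "local link flake"
            print("probe to %s failed; peer verdict: %s" % (target, verdict))
        else:
            print("latency to %s: %.1f ms" % (target, latency))
    time.sleep(10)
```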

MagnumOpus
Dec 7, 2006

Vulture Culture posted:

My personal record is 253 on a quad-core running TSM

Every time we get a significant network event that causes cascading failures, our 8-core Logstash server gets slammed and its load average hits around 280 as it tries to process the dramatic upswing in error messages.

MagnumOpus
Dec 7, 2006

Currently sitting here with my thumb up my butt, waiting for a callback from my internal OpenStack provider. I needed some more resources, so I went into their customized Horizon portal and configured a quota increase request. Clicked Submit, got an error page, and then 30 seconds later got an email letting me know my request to deactivate all of our configured user accounts, including the ones for automation, had completed successfully.

:bang:

MagnumOpus
Dec 7, 2006

Wait, wait, wait. Are you telling me that I can't just shove a 3-node Cassandra cluster into my environment, try to hack it into performing like an immediately consistent database, and expect the cloud to magically recover everything when I lose one of those nodes?
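
To spell it out: with RF=3, the only way to fake "immediately consistent" is CL=ALL, and CL=ALL is exactly what falls over the moment a node dies, while QUORUM keeps working. Quick sketch with the DataStax Python driver, addresses made up:

```python
# The point, in code: with RF=3, "immediately consistent" effectively means
# CL=ALL, and CL=ALL fails writes the moment any one node dies. QUORUM (2 of 3)
# is the trade-off that actually survives losing a node.
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["10.0.0.21", "10.0.0.22", "10.0.0.23"])  # the 3-node cluster
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS app
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
""")
session.set_keyspace("app")
session.execute("CREATE TABLE IF NOT EXISTS kv (k text PRIMARY KEY, v text)")

# "Immediately consistent": every replica must ack, so one dead node = failed writes.
strict = SimpleStatement("INSERT INTO kv (k, v) VALUES (%s, %s)",
                         consistency_level=ConsistencyLevel.ALL)

# Tolerates losing one of the three nodes and still reads its own writes,
# as long as reads also use QUORUM.
sane = SimpleStatement("INSERT INTO kv (k, v) VALUES (%s, %s)",
                       consistency_level=ConsistencyLevel.QUORUM)

session.execute(sane, ("answer", "42"))
cluster.shutdown()
```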

MagnumOpus
Dec 7, 2006

Bhodi posted:

OSes are magic, virtualization is magic, the cloud is magic, everything is magic

Sadly, magic doesn't mean operable, or good

I feel like this is the biggest change in the web ops world in terms of its effect on my day-to-day. It used to be that no one else in the org cared what we were doing at the infrastructure level, because they knew it was a nightmare realm they dared not enter. Since the rise of everything "cloud", we've got armies of web devs who know just enough to be dangerous. My job is now running around trying to keep assholes from loving up long-term plans by applying half-understood cloudisms to their designs. Just about every day I find a new thing that makes me go cross-eyed with rage, and on the days I don't, it's because I instead spent three hours arguing a web dev down from his 20% understanding of platform and infrastructure concepts.

MagnumOpus
Dec 7, 2006

Vulture Culture posted:

This is a great reason to have devs responsible for operating their own software

O-ho-ho but you see "we are all just engineers in this group" because we're "avoiding silo-ization". We're "flat" and a "unified development team" "leveraging devops paradigms" and responsible for "self-organizing".

EDIT: All of these are real things that can be accomplished. I just don't work for a group that understands how to do them, so instead we pay lip service to all of it, except when it's used to chastise people who point out problems.



MagnumOpus
Dec 7, 2006

necrobobsledder posted:

developer-centric start-ups and because they're paid stupidly high wages they feel they're an authority on anything besides just plain code

Also, gently caress the developers that think reading highscalability.com posts means they're a goddamn CCIE and VCDX architect.

These are unfortunately the roots of my problems. My company is huge and developer-centric, and the business unit I'm in has only ever built software that its customers run on-premise, so they have absolutely zero experience running a webapp enterprise. That fact has not stopped their "architects" from making designs that commit us to poorly researched solutions before the ops team has any opportunity to intervene. For example, they were recently stumped by the eventually consistent nature of Cassandra, forcing a massive refactor of their app, and don't even get the PM started on how badly they missed the mark estimating SSD storage costs.
