|
MagnumOpus posted:That's the main problem with it. Like Puppet/Chef it lets you describe your networks and services well enough but the deployments are kludgy as gently caress. And when the deployment doesn't work right, triage is a nightmare because the mess of scripts tend to leave artifacts all over the place.
|
# ? Mar 17, 2015 21:16 |
|
I signed up for Building Cloud Apps with Microsoft Azure – Part 1 on EdX. It's a 4-week class, part 1 of 3, that starts on the 31st. Figured this thread would be a good place to recruit / discuss. quote:This course will walk you through a patterns-based approach to building real-world cloud solutions. The patterns apply to the development process as well as to architecture and coding practices.
|
# ? Mar 24, 2015 14:59 |
|
Does anyone have a good referral for an AWS consultant who primarily handles SMBs and can work with poo poo networking hardware to get a VPC up and running?
|
# ? Mar 26, 2015 02:59 |
|
anybody using Rackspace's "OnMetal" yet? it seems like the only way I could ever go cloud; ec2's network latency and noisy-neighbor/cpu-steal garbage is just unbearable
|
# ? Mar 28, 2015 21:13 |
|
StabbinHobo posted:anybody using rackspaces's "OnMetal" yet?
|
# ? Mar 28, 2015 22:30 |
|
do you graph the 90th or 99th percentile of all your stuff at 1-second resolution? edit: I would love to find out that c4 has solved this, but everyone I ask is looking at 1-minute averages in CloudWatch
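For anyone unfamiliar with the kind of rollup being asked about here, a minimal sketch of per-second p90/p99 aggregation over raw (timestamp, latency) samples might look like this — the function names are illustrative, not from any monitoring library:

```python
# Hypothetical sketch: bucket request latencies into 1-second windows and
# report p90/p99 per window, instead of 1-minute averages.
import math
from collections import defaultdict

def percentile(sorted_vals, p):
    """Nearest-rank percentile on an already-sorted list."""
    k = max(0, math.ceil(p / 100.0 * len(sorted_vals)) - 1)
    return sorted_vals[k]

def percentiles_by_second(samples):
    """samples: iterable of (unix_ts_float, latency_ms) pairs."""
    buckets = defaultdict(list)
    for ts, latency_ms in samples:
        buckets[int(ts)].append(latency_ms)
    out = {}
    for sec, vals in buckets.items():
        vals.sort()
        out[sec] = {"p90": percentile(vals, 90), "p99": percentile(vals, 99)}
    return out
```

The point of graphing high percentiles at this resolution is that sub-second latency spikes (e.g. from CPU steal) vanish entirely inside a 1-minute average.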
|
# ? Mar 28, 2015 23:23 |
|
StabbinHobo posted:do you graph the 90th or 99th percentile of all your stuff at a 1 second resolution?
Is your workload actually that performance sensitive? Can you not scale it horizontally? Maybe cloud is not for you.
|
# ? Mar 29, 2015 00:33 |
|
StabbinHobo posted:do you graph the 90th or 99th percentile of all your stuff at a 1 second resolution?
Those events are logged as they come, and there's probably never a second without a bunch of requests for every host involved. What metric specifically are you talking about in 1-minute averages on CloudWatch? There's no "network latency" metric, as that's pretty generic, and measuring it means you need a common destination (unless you're saying there's just added latency for all network activity). Edit: Also, Splunk is magic; if you want me to throw together something to monitor across some instances I'd be happy to try anything out! Less Fat Luke fucked around with this message at 00:54 on Mar 29, 2015 |
# ? Mar 29, 2015 00:51 |
|
evol262 posted:Is your workload actually that performance sensitive?
quote:Can you not scale it horizontally?
quote:Maybe cloud is not for you.
Less Fat Luke posted:Those events are logged as they come and there's probably never a second without a bunch of requests for every host involved.
|
# ? Mar 29, 2015 01:45 |
|
StabbinHobo posted:yes, at millions of users and thousands of requests per second pretty much every ms matters (well, really every 10)
Most of our traffic is fairly cacheable, so we heavily use Varnish in EC2 as well as CloudFront (and some Akamai). Despite you being a dick, my offer still stands to test stuff for you. My "never less than 10/sec" was indicating that even our least-used hosts are still capturing endpoint requests and latency.
|
# ? Mar 29, 2015 01:52 |
|
StabbinHobo posted:yes, at millions of users and thousands of requests per second pretty much every ms matters (well, really every 10)
And you missed the point a little. Scaling horizontally also lets you spread the load and schedule on a flatter, larger infrastructure where CPU usage isn't the end of your world because it never got scheduled somewhere with high utilization. You don't need to be condescending. It's unlikely that your app is a special snowflake that just can't bear what other apps do with a little rearchitecture. Obviously steal is bad, but local interrupts are just as nasty sometimes. Are you running realtime? What are you expecting to gain from metal that you can't get from adding more instances and caching?
|
# ? Mar 29, 2015 02:20 |
|
i'll just take all that as a 'no'
|
# ? Mar 29, 2015 02:49 |
|
|
evol262 posted:What are you expecting to gain from metal that you can't get from adding more instances and caching?
He clearly said "low latency"; it doesn't matter how you scale out, it won't drop the latency below whatever the minimum is of the infrastructure being used. Apps can't always be restructured to suit, either; I've had customers with requirements like his that meant they needed bare-metal hardware too. It happens.
|
# ? Mar 29, 2015 06:21 |
|
theperminator posted:He clearly said "low latency", doesn't matter how you scale out it won't drop the latency below whatever the minimum is of the infrastructure being used.
It's a useless requirement unless you specify whether it's intra- or extra-environmental latency, which is why I asked. Hoping to gain latency from metal is reasonable if you can't handle it on some eventing layer. Moving off-site is a no-go if it's end users or something outside the local environment. That's why I asked. Also, "minimum" latency for an environment doesn't play in much. You absolutely can restructure to deal with "latency" even if you can't muck with the app code, by doing what Netflix does and rescheduling instances which are seeing steal. Obviously their problems are different, but if the latency issue is steal, that's solvable.
|
# ? Mar 29, 2015 07:04 |
|
FWIW we split responsibilities between orchestration tools like chef and app deployment. We render unto chef the basics of the environment -- the underlying OS services and core apache / nginx / mysql configurations and the like. We then leave apps and our CI servers responsible for talking to source control and pushing apps out to servers. This keeps things a bit more transportable, keeps developers out of dealing with chef and the like, and seems to work pretty well, at least for our workflow and scenario -- loads of little apps owned by different groups.
|
# ? Mar 29, 2015 14:49 |
|
If I had to guess he's doing some sort of voip thing where low jitter's critically important. At Vonage, the primary proprietary voip server we ran was extremely sensitive because it was built in C and tuned over a number of years to run and rely on bare metal. I was able to stick it in ec2 but we never got the performance we were hoping for. It was good enough though because it was still cheaper than sticking a datacenter in low-subscriber companies, but it wasn't optimal. If rackspace's bare metal were around at that point we'd have beelined for it. It is a bit silly that you have to turn up/turn down instances that are underperforming and kind of game the system - but that's cloud for you. Bhodi fucked around with this message at 15:34 on Mar 29, 2015 |
# ? Mar 29, 2015 15:31 |
|
Bhodi posted:It is a bit silly that you have to turn up/turn down instances that are underperforming and kind of game the system - but that's cloud for you.
|
# ? Mar 29, 2015 15:46 |
|
I never have to do that. We have a bunch of alarms for steal time being too high or performance dropping, and they only fire when doing CPU-intensive tasks on memory-optimized instances (which kind of makes sense). It's still worth monitoring both, though. Originally we were setting things up to auto-terminate under-performing instances, but now it happens so rarely that we just have a report sent to the ops team if anything like that shows up.
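The steal-time alarm above boils down to watching one counter in /proc/stat. A rough sketch of the check, assuming two snapshots of the aggregate "cpu" line (the helper names are illustrative; steal is the 8th counter after the "cpu" label):

```python
# Hypothetical sketch: compute CPU steal as a percentage of elapsed ticks
# between two /proc/stat "cpu" line snapshots. Counters after the label are:
# user nice system idle iowait irq softirq steal guest guest_nice,
# so steal sits at index 7.

def parse_cpu_line(line):
    """Return the list of counters from a /proc/stat 'cpu' line."""
    return [int(x) for x in line.split()[1:]]

def steal_pct(before, after):
    """Steal time as a percentage of total ticks between two snapshots."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    return 100.0 * deltas[7] / total if total else 0.0
```

In practice you'd read /proc/stat twice a few seconds apart and alarm when the result crosses some threshold.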
|
# ? Mar 29, 2015 16:11 |
|
adorai posted:Just wait until everyone starts doing it.
re: OnMetal, there's no hourly billing option like SoftLayer, only monthly. What's the loving point?
|
# ? Mar 30, 2015 15:13 |
|
OpenStack nerds: is there any way to have an instance automatically terminate on shutdown, like in EC2?
|
# ? Mar 31, 2015 05:00 |
|
Full disclosure: I work on the OnMetal product at Rackspace. Mainly doing Openstack-related, open source development.
StabbinHobo posted:anybody using rackspaces's "OnMetal" yet?
I am! In a way, at least. Do you have any specific, quantifiable questions? I'll try to avoid making subjective claims like: "It's great!", because I'm obviously biased.
Misogynist posted:It's not really a problem in most other environments (GCE, Azure, etc.), because nobody else oversubscribes CPU resources to the level that Amazon does.
Actually, usage is metered down to the minute. You can see the hourly rates here, under the OnMetal heading. The messaging on the main OnMetal landing page leaves a bit to be desired. It only talks about monthly rates, but you're definitely billed for instances by the minute.
Misogynist posted:OpenStack nerds: is there any way to have an instance automatically terminate on shutdown, like in EC2?
'nova delete', if you're using the python-novaclient, should do what you're asking. See also: http://developer.openstack.org/api-ref-compute-v2.html . Specifically 'delete server' under the 'Servers' heading. Unless I'm misunderstanding what you're trying to do.
|
# ? Mar 31, 2015 16:21 |
|
In EC2 you can set an instance so that if it is shut down (like "shutdown -h" at the command line), it automatically terminates instead of just shutting down as it normally would. He's asking if you can do that in OpenStack. Without explicitly calling "nova delete". I don't think OS has that feature, though I could easily be wrong. What we've done when we need automation that's not provided by OpenStack itself is write a little Python app that listens to the rabbit queues and takes appropriate action. You could write a handler that listens for the instance stop message and automatically fire a delete command when one comes through, for example. Docjowles fucked around with this message at 16:43 on Mar 31, 2015 |
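A minimal sketch of the handler Docjowles describes — the event-type strings follow nova's usual notification naming, but treat them and the field names as assumptions; the rabbit wiring and the actual delete call are left as comments:

```python
# Hypothetical sketch: decide whether an OpenStack notification message is an
# instance stop/power-off event that should trigger a delete.
import json

def wants_delete(raw_message):
    """Return the instance UUID to delete if this notification is an
    instance shutdown/power-off completion event, else None."""
    msg = json.loads(raw_message)
    if msg.get("event_type") in ("compute.instance.power_off.end",
                                 "compute.instance.shutdown.end"):
        return msg.get("payload", {}).get("instance_id")
    return None

# In the real consumer you would bind a queue to nova's notification topic on
# the rabbit exchange (e.g. with pika or kombu) and, for each body where
# wants_delete(body) returns a UUID, fire the equivalent of
# `nova delete <uuid>` via python-novaclient.
```

The nice part of this approach is that the policy lives entirely outside nova; the downside is you're now running and monitoring one more daemon.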
# ? Mar 31, 2015 16:41 |
|
Docjowles posted:In EC2 you can set an instance so that if it is shut down (like "shutdown -h" at the command line), it automatically terminates instead of just shutting down as it normally would. He's asking if you can do that in OpenStack. Without explicitly calling "nova delete".
Docjowles posted:I don't think OS has that feature, though I could easily be wrong. What we've done when we need automation that's not provided by OpenStack itself is write a little Python app that listens to the rabbit queues and takes appropriate action. You could write a handler that listens for the instance stop message and automatically fire a delete command when one comes through, for example.
(A generic stream processing service for OpenStack that gets messages and fires webhooks might be even nicer to have someday.) Vulture Culture fucked around with this message at 18:54 on Mar 31, 2015 |
# ? Mar 31, 2015 18:49 |
|
Misogynist posted:(A generic stream processing service for OpenStack that gets messages and fires webhooks might be even nicer to have someday.)
We're currently looking into doing something along these lines with Stackstorm. Will be a while before I can give a field report though, we're still slowly unfucking the damage these devs did in the six months they were "doing DevOps" before it occurred to them to hire some people actually trained in operations.
|
# ? Apr 1, 2015 00:37 |
|
MagnumOpus posted:We're currently looking into doing something along these lines with Stackstorm. Will be a while before I can give a field report though, we're still slowly unfucking the damage these devs did in the six months they were "doing DevOps" before it occurred to them to hire some people actually trained in operations.
|
# ? Apr 1, 2015 02:12 |
|
Misogynist posted:This is really awesome, thanks for this!
I'd be curious to hear what you think about it!
|
# ? Apr 1, 2015 02:53 |
|
Anyone using Foreman for orchestration? We're evaluating tools and I'm most familiar with Chef + mess of scripts for orchestration. Looking for a better way that will handle orchestration from the tenant (OpenStack) up, with a caveat that it needs to gracefully integrate extensions for auto-scaling logic we might write.
|
# ? Apr 2, 2015 16:12 |
|
MagnumOpus posted:Anyone using Foreman for orchestration?
I have nothing bad to say about Foreman at all. It was pretty puppet-centric in the past, but that's all better now.
|
# ? Apr 2, 2015 18:54 |
|
I've been hoping to deploy Foreman for like the last year. They have a decent SaltStack plugin (what we use for config management), too. Unfortunately it requires a newish version of Salt, and there's a couple ridiculous bugs that are blocking us from upgrading. They're all marked as fixed in the next Salt release, so I'm hopeful that we can finally start testing Foreman sometime soon.
|
# ? Apr 2, 2015 21:07 |
|
Satellite 6 uses Foreman and people seem to like it. It's anecdotally a good choice for what it does.
|
# ? Apr 2, 2015 21:09 |
|
I have not read much about Satellite either, but I definitely will look deeper into both now. It looks like we're locked into a masterless Puppet architecture for the foreseeable future, so if Foreman fits together nicely that would be best. I just added a secure secrets-storage story to our backlog this afternoon. What's the new hotness there? We have a semi-directive to keep the platform loosely IaaS-agnostic, so integrating it with Keystone is out. I'm looking at both Conjur, which would fit real well into our environment assuming they can produce an OpenStack image, and blackbox, which would snap right onto our team's existing "secrets all up in repos" workflow but doesn't feel very durable.
|
# ? Apr 3, 2015 00:32 |
|
MagnumOpus posted:I have not read enough but Satellite as well and I definitely will look deeper into both now. It looks like we're locked into a masterless Puppet architecture for the foreseeable future, so if Foreman fits together nicely that would be best.
I've never used any of the secret sharing, so no comment there. Last time I installed foreman, it wanted to be an external classifier and optionally a puppetmaster. Never tried it masterless, but I would think it then becomes an overblown image deployment tool, which is maybe what you want.
|
# ? Apr 3, 2015 02:25 |
|
evol262 posted:I've never used any of the secret sharing, so no comment there.
It sort of is. Our OpenStack provider relationship requires us to use specific base images that we can't branch or anything, so we have to configure everything once the VM comes up. Not ideal, I know, but :corp lyfe:. Edit: This is not to say there aren't reasons. Again, it's possibly an over-reaction to concerns about being IaaS-agnostic, but those are the directives I have to work with.
|
# ? Apr 3, 2015 02:30 |
|
cliffy posted:Full disclosure: I work on the OnMetal product at Rackspace. Mainly doing Openstack-related, open source development.
awesome! hi
are there extra software layers involved in the host-to-host networking? like, in a colo setup it's code -> kernel IP stack -> ethernet driver -> 1 - 4 switches -> ethernet driver -> IP stack -> code. what (if any) extra hops/layers does onmetal have? since there's no dom0, how do you handle network/customer segmentation? any chance it's just plain-old-vlans?
how is ironic? I'm still in cobbler/kickstart land, and #including all of openstack's... accoutrement seems like... I guess no one's ever been able to coherently pitch it to me without rambling about the benefits of being able to run hundreds of VMs, which I consider a full-blown anti-pattern.
how do you guys clean the disks between customers? similarly, how do you check for things like ssd wear?
|
# ? Apr 6, 2015 14:03 |
|
StabbinHobo posted:awesome! hi
OnMetal will still go through Neutron and whatever segmentation Rackspace uses for that (VXLAN or GRE, probably, though plain VLANs are an option). It's really unlikely that they're using plain nova networks. Rackspace probably has OpenFlow/Neutron switches from Cisco or Juniper, so it won't need to run all the way to a Neutron controller, but all of those pieces still matter. The advantage of Ironic is that you can deploy images and get all the cloud-init bits, so the same image running in virt somewhere is running on metal, and it ties into Heat and everything for autoscaling and formations and tenant networks, and... That's the pitch. It is not a replacement for Cobbler or the Foreman discovery image. It extends OpenStack. The same OpenStack patterns still apply to Ironic.
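For reference, wiring Nova to Ironic is mostly a compute-driver setting. A rough nova.conf fragment from that (Kilo-era) generation of OpenStack might look like the following — option names and endpoints are assumptions/placeholders, not taken from any particular deployment:

```ini
# Hypothetical sketch of a nova.conf pointing the compute service at Ironic
# instead of a hypervisor. Hostnames and credentials are illustrative.
[DEFAULT]
compute_driver = nova.virt.ironic.IronicDriver
# Bare-metal flavors consume whole machines, so don't reserve host memory.
reserved_host_memory_mb = 0

[ironic]
admin_username = ironic
admin_password = IRONIC_PASSWORD
admin_tenant_name = service
admin_url = http://controller:35357/v2.0
api_endpoint = http://controller:6385/v1
```

With this in place, Nova schedules onto bare-metal nodes through the same API and image pipeline it uses for VMs, which is what makes Heat/autoscaling integration possible.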
|
# ? Apr 6, 2015 14:53 |
|
StabbinHobo posted:awesome! hi
Greetings!
StabbinHobo posted:are there extra software layers involved in the host-to-host networking? like, in a colo setup its code -> kernel ip stack -> ethernet driver -> 1 - 4 switches -> ethernet driver -> ip stack -> code. what (if any) extra hops/layers does onmetal have? since there's no dom0, how do you handle network/customer segmentation? any chance its just plain-old-vlans?
I'm not a networking expert. That said, I'm told there are no extra layers beyond what you laid out. Customer network segmentation is currently handled by $cisco_magic, but these are not true isolated networks. Isolated networks are considered a critical feature, so we should have them sooner rather than later. When we do roll them out they will likely be VXLAN-based.
StabbinHobo posted:how is ironic? I'm still in cobbler/kickstart land, and #including all of openstack's... accoutrement seems like... I guess no ones ever been able to coherently pitch it to me without rambling about the benefits of being able to run hundreds of vm's, which I consider a full blown anti-pattern.
It works for us, but afaict OnMetal is literally the only production bare-metal cloud product using Ironic. Other companies may be using it for servicing in-house deployments. It helps that we have two Ironic core developers on our team. To deploy Ironic you do need a few accompanying OpenStack services, which, at a minimum, comprise: Nova (compute scheduling service), Glance (image service), and Keystone (identity service). The VM ramblings probably come from the fact that Nova is designed to schedule/provision VMs using drivers which communicate with various hypervisors. Nova treats Ironic like a hypervisor. You could consider Ironic a hypervisor which completely vacates the machine once the machine gets provisioned. Though the specifics of Ironic's behavior depend on which bare-metal driver you have configured within Ironic to manage machines.
StabbinHobo posted:how do you guys clean the disks between customers? similarly, how do you check for things like ssd wear?
OnMetal uses the Ironic Python Agent driver in Ironic to manage machines, so you can see exactly what we do to erase disks here: https://github.com/openstack/ironic-python-agent/blob/master/ironic_python_agent/hardware.py#L429 The short of it is we use ATA enhanced secure erase where available, and fall back to ATA secure erase when the enhanced version is not available. We currently collect SMART data, but doing further analysis and generating actions on said data is a work in progress.
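The fallback described above might be sketched like this — the hdparm flags are the standard ATA security-erase ones, but the function and capability flag are illustrative; the real logic lives in ironic-python-agent's hardware.py linked above:

```python
# Hypothetical sketch: prefer ATA enhanced secure erase, fall back to plain
# secure erase when the drive doesn't support the enhanced variant.

def erase_commands(device, supports_enhanced):
    """Build the hdparm invocations for wiping one block device."""
    mode = ("--security-erase-enhanced" if supports_enhanced
            else "--security-erase")
    return [
        # A temporary drive password must be set before secure erase can run.
        ["hdparm", "--user-master", "u", "--security-set-pass", "p", device],
        ["hdparm", "--user-master", "u", mode, "p", device],
    ]
```

Whether a drive supports the enhanced variant is reported by `hdparm -I` in its Security section; a real agent would parse that rather than take a boolean.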
|
# ? Apr 6, 2015 17:24 |
|
so appreciative, thanks. if I may keep going: how is some kind of console or out-of-band access handled? for instance, let's assume I typo something in my kickstart; how do I get on that console and hit alt+f2 to see what went wrong? edit: oh poo poo, if customers are sharing a vlan... can I kickstart? (pxe boot?) I assume you have to handle dhcp then, not me? do you allow for configurable parameters like next-server then?
|
# ? Apr 7, 2015 00:50 |
|
StabbinHobo posted:so appreciative, thanks
I don't think you get any kind of console access at all with OnMetal; their documentation strongly implies that these are only available on virtual servers. Vulture Culture fucked around with this message at 02:56 on Apr 7, 2015 |
# ? Apr 7, 2015 02:52 |
|
Anyone using Cloud Foundry and if so what are you doing for system metrics? I see that Collector has been deprecated but the Firehose system that is replacing it is still very new and the only interfaces ("nozzles") I can find around are prototypes.
|
# ? May 4, 2015 18:02 |