|
You can allow them to create users and attach those users to an existing group but not let them edit/create any permission themselves
|
# ? Oct 14, 2015 16:01 |
|
That sounds ideal, I will look into it.
|
# ? Oct 14, 2015 17:19 |
|
So I have this so far:

code:
I'd like to neaten up the two users at the end though - they are members of the administrators group and the IAM Policy Simulator showed that ChangePassword was still allowed. Is there a way to evaluate group membership in the policy? Thanks Ants fucked around with this message at 22:32 on Oct 14, 2015 |
# ? Oct 14, 2015 22:14 |
|
i think you're making this a little harder than it needs to be:

code:
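A rough sketch of a policy along the lines suggested at the start of the thread - allow creating users and attaching them to existing groups, while denying anything that edits permissions. The action lists here are an assumption built from IAM API action names, not anyone's actual policy from this thread:

```python
# Hypothetical delegated-admin IAM policy: user lifecycle allowed,
# permission editing explicitly denied. Built as a plain dict so the
# JSON shape is easy to inspect.
import json

delegated_admin_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowUserLifecycle",
            "Effect": "Allow",
            "Action": [
                "iam:CreateUser",
                "iam:AddUserToGroup",
                "iam:ListUsers",
                "iam:ListGroups",
            ],
            "Resource": "*",
        },
        {
            # Explicit Deny wins over any Allow the user picks up elsewhere
            "Sid": "DenyPermissionEditing",
            "Effect": "Deny",
            "Action": [
                "iam:CreatePolicy",
                "iam:PutUserPolicy",
                "iam:PutGroupPolicy",
                "iam:AttachUserPolicy",
                "iam:AttachGroupPolicy",
            ],
            "Resource": "*",
        },
    ],
}

print(json.dumps(delegated_admin_policy, indent=2))
```

The key design point is the explicit Deny statement: in IAM, an explicit Deny overrides any Allow, so even a user who is also in a permissive group can't edit policies through this path.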
|
# ? Oct 17, 2015 09:24 |
|
I'm using AWS. I've created a VPC with public and private subnets. All subnets can access the internet; the private subnets obviously get there via a NAT instance. My problem: I need to create an S3 bucket that is locked down so that an instance, group of instances, or subnet in the private subnets can access it. Things I've tried: opening the bucket to 0.0.0.0/0 works, but locking the bucket down to a specific range (10.0.0.0/8, my VPC is 10.53.x.x) means I can't access it from the private subnet. I've attached a role to the machine that has privileges to do anything to any resource in AWS and even this doesn't work. Does anyone have any suggestions? I've read that S3 endpoints are a solution, but I wanted to see if I could do it the way I figured it would work first. Has anyone else been through this particular problem?
|
# ? Oct 17, 2015 16:00 |
|
You'll need to use an S3 endpoint for this, or lock it down to the public IP addresses (traffic from the private subnet reaches S3 via the NAT's public IP, which is why your 10.x range never matches). Alternatively you could set up a set of app keys with get/put permission and lock it down that way
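If you go the endpoint route, the bucket policy ends up looking something like this sketch - the bucket name and vpce ID below are placeholders, not real resources:

```python
# Hypothetical S3 bucket policy: deny everything unless the request came
# in through one specific VPC endpoint. Built as a plain dict to show the
# JSON shape.
import json

BUCKET = "example-private-bucket"   # placeholder name
VPC_ENDPOINT_ID = "vpce-1a2b3c4d"   # placeholder endpoint ID

bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyUnlessThroughEndpoint",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [
            f"arn:aws:s3:::{BUCKET}",
            f"arn:aws:s3:::{BUCKET}/*",
        ],
        # aws:sourceVpce matches the VPC endpoint the request traversed;
        # requests that went out through the NAT instance won't have it set
        # and are denied.
        "Condition": {"StringNotEquals": {"aws:sourceVpce": VPC_ENDPOINT_ID}},
    }],
}

print(json.dumps(bucket_policy, indent=2))
```

This also answers why the source-IP approach fails: by the time a request from the private subnet hits S3 it carries the NAT's public address, so a 10.x source-IP condition never matches.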
|
# ? Oct 17, 2015 19:22 |
|
incoherent posted:Did they port DFSR to azure yet? I would love to see Microsoft back that with 5 9's. I don't see the problem. Replication errors and empty targets should be pretty easy for them to keep up with, since it's the natural state of the technology.
|
# ? Oct 17, 2015 19:48 |
|
Megaman posted:I'm using AWS I've created a VPC with public and private subnets. All subnets can access the internet, the private subnets obviously get there via NAT instance. why wouldn't you use an s3 endpoint, that's literally their use case
|
# ? Oct 17, 2015 19:49 |
|
if you wanted to run a cassandra ring or a xtradb cluster which three cloud providers would you run it across? I guess basically who are the top 3 where the product is similar enough to get stuff working and not poo poo for some other reason.
|
# ? Nov 8, 2015 16:15 |
|
StabbinHobo posted:if you wanted to run a cassandra ring or a xtradb cluster which three cloud providers would you run it across? I guess basically who are the top 3 where the product is similar enough to get stuff working and not poo poo for some other reason. Why do you need three different cloud providers instead of three regions on the same provider?
|
# ? Nov 8, 2015 18:51 |
|
I guess Amazon, Google, and MS Azure? But I support VC in that trying to run a MySQL cluster across the public internet sounds like a special kind of hell and I'd encourage you not to do that! edit: xtradb cluster is Percona's fork of galera IIRC
|
# ? Nov 8, 2015 19:21 |
|
Docjowles posted:edit: xtradb cluster is Percona's fork of galera IIRC XtraDB Cluster is a MySQL distribution which includes XtraDB and Galera, among other things. Though it includes Galera, you don't need to use Galera.
|
# ? Nov 8, 2015 19:29 |
|
StabbinHobo posted:if you wanted to run a cassandra ring or a xtradb cluster which three cloud providers would you run it across? I guess basically who are the top 3 where the product is similar enough to get stuff working and not poo poo for some other reason. Helion, Rackspace, Softlayer: the OpenStack trifecta comedy option. As mentioned, this is a bad idea. If you want redundancy, run in multiple AZs/regions. Sane tooling across multiple providers is OK with some orchestration tools, but mostly it'll be a headache, especially the little differences between AMIs and the images elsewhere (imported qcows or OVAs or whatever), and that headache just gets worse if you build your own and upload it everywhere.
|
# ? Nov 8, 2015 20:45 |
|
Vulture Culture posted:XtraDB Cluster is a MySQL distribution which includes XtraDB and Galera, among other things. Though it includes Galera, you don't need to use Galera. True, although I don't know why you'd bother to use the cluster version if you weren't going to cluster. Then again it's a weird question so I guess I shouldn't assume anything!
|
# ? Nov 8, 2015 21:10 |
|
Docjowles posted:True, although I don't know why you'd bother to use the cluster version if you weren't going to cluster.
|
# ? Nov 8, 2015 21:40 |
|
Newish to OpenStack (we're using Mirantis Fuel to get it off the ground). I have about a million questions, but let's start with: mainly, what are some fun/valuable things to do with yer cloud once you've got it? Image baking is going to lead to some real poo poo in my company... how do you do it well (hook it to app CI, etc...)? Most of our development is lovely Java webapps. Not accessing each and every VM directly is a shift for us... Log aggregating I get, but do people just not monitor the underlying VMs of their services? Or do I just accept that I'm going to attach a floating IP to each VM? We're missing some key services... DNSaaS mostly, but the LBaaS also isn't 'production-ready' (the haproxy objects only run on 1 controller and don't fail over)... do I just cry til these get added in what I assume will be 3 years?
|
# ? Nov 9, 2015 13:48 |
|
Ryaath posted:Image baking is going to lead to some real poo poo in my company... how do you do it well (hook it to app ci, etc...)? Most of our development is lovely java webapps. Ryaath posted:Not accessing each and every vm directly is a shift for us.... Log aggregating I get, but do people just not monitor the underlying vms of their services? Or do I just accept that I'm going to attach a floating ip to each vm? Ryaath posted:We're missing some key services... dnsaas mostly, but the lbaas also isn't 'production-ready' (the haproxy objects only run on 1 controller and don't fail over)... do I just cry til these get added in what I assume will be 3 years? If you're okay talking to the OpenStack compute API, it's really, really easy to automate HAProxy or an F5 BigIP or whatever your preferred load balancing technology is from something as simple as a Python script (I'm told there's a PowerShell API client now, thanks to Rackspace). Don't rely on DNS for dynamic services. Vulture Culture fucked around with this message at 15:58 on Nov 9, 2015 |
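The "automate HAProxy from a Python script" idea is roughly this sketch: render a backend stanza from whatever the compute API reports. The instance list is hardcoded here as a stand-in; the actual compute API call is not shown:

```python
# Toy HAProxy config generator: turn a list of (hostname, ip) pairs into
# a backend stanza. In real use the list would come from the OpenStack
# compute API instead of being hardcoded.
def render_backend(name, instances, port=8080):
    """Render an HAProxy backend stanza from (hostname, ip) pairs."""
    lines = [f"backend {name}", "    balance roundrobin"]
    for host, ip in instances:
        # 'check' enables HAProxy's own health checking per server
        lines.append(f"    server {host} {ip}:{port} check")
    return "\n".join(lines) + "\n"

instances = [("web-01", "10.0.0.11"), ("web-02", "10.0.0.12")]  # placeholder data
config = render_backend("java_webapps", instances)
print(config)
```

A cron job or a hook on instance create/delete can regenerate the file and reload HAProxy, which sidesteps the broken LBaaS failover entirely.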
# ? Nov 9, 2015 15:56 |
|
Thanks for the reply VC. We're already using Packer with the Puppet configuration mgmt we had. Creating the images is no problem; managing the life cycle or the build hierarchy is where we're struggling. I'll look into Sensu and that LBaaS article you linked. I found Graylog offers a VM appliance image, so I just shoved that into Glance and I'll let our app teams (hopefully) figure out the application logging from there...
|
# ? Nov 13, 2015 07:26 |
|
I'm 100% anecdotally sure many of the problems we're running into are due to intermittent network failures, but since we're using an Openstack IaaS provider I don't have access to logs at that level. I've got TCP failures and retransmits in my metrics pipeline, but I'm looking for something more compelling. What would be perfect is a small agent app that can constantly monitor links and track failures explicitly, because what I expect is that we're getting frequent jitters rather than hard link failures. Basically I just want to be able to prove if this IaaS is too brittle for production. Ideas?
|
# ? Dec 1, 2015 21:39 |
|
MagnumOpus posted:I'm 100% anecdotally sure many of the problems we're running into are due to intermittent network failures, but since we're using an Openstack IaaS provider I don't have access to logs at that level. I've got TCP failures and retransmits in my metrics pipeline, but I'm looking for something more compelling. What would be perfect is a small agent app that can constantly monitor links and track failures explicitly, because what I expect is that we're getting frequent jitters rather than hard link failures. Basically I just want to be able to prove if this IaaS is too brittle for production. Ideas? Something like PRTG? If you have a Windows host you can install it there and the free version will monitor 100 sensors over SNMP, ping or whatever. It will show almost everything concerning latency, bandwidth, bandwidth usage etc. It is also pretty great for home use.
|
# ? Dec 3, 2015 08:21 |
|
MagnumOpus posted:I'm 100% anecdotally sure many of the problems we're running into are due to intermittent network failures, but since we're using an Openstack IaaS provider I don't have access to logs at that level. I've got TCP failures and retransmits in my metrics pipeline, but I'm looking for something more compelling. What would be perfect is a small agent app that can constantly monitor links and track failures explicitly, because what I expect is that we're getting frequent jitters rather than hard link failures. Basically I just want to be able to prove if this IaaS is too brittle for production. Ideas? This sounds like they may be doing GRE or VXLAN encapsulation with an MTU that's too small, but it's hard to tell just from this post.
|
# ? Dec 3, 2015 16:19 |
|
MagnumOpus posted:<intermittent network failures> Ideas?

1. Asymmetric routing. It's quite common, but if you run mtr and watch packets get dropped somewhere roughly around a 50% duty cycle across a connection and you have two primary network paths available, you're looking at this as a fundamental problem. This is what oftentimes occurs between two different physical networks, like across WANs and BGP where you advertise AS paths and sometimes the other peer does not quite respect your routing prefixes. AWS does respect this unlike many others, so request them, dammit. I used mtr to diagnose this problem live as it happened. Sadly enough, I'm not even a network admin and using that taught our network architects a new tool to use (yeah.... that's not a good sign when your random-rear end contracted devops guy is figuring poo poo out for your supposedly best network guys)

2. As mentioned above, mismatched TCP MTU. Note that AWS VMs use an MTU of 9001 by default, and despite being off by one they can chunk 1500 multiples fine, but having to convert a lot can result in packet fragmentation problems that translate ultimately into retransmits and packet reassembly times going up.

3. Just check your TTLs to make sure that they're not expiring once in a while from a really, really, really complicated network. Had a user that was on a 40+ hop network complaining about how he couldn't get to AWS VMs reliably because it was so slow. Half his packets were dropping from an ancient network (literally almost as old as me) shoehorned onto a random-rear end backbone and so forth, and TTL was just plain running out.

4. If you're using ping to AWS (doubtful, you're with an OpenStack provider), AWS has told me they're supposed to drop somewhere around 10% of ping traffic for performance reasons - check that your provider is not doing traffic shaping or anything to cause this.

5. Our instances (running VMware) are on severely overprovisioned clusters and drop pings randomly from underlying hardware just plain not keeping up, to the point where our software HA solution is more of a liability: it detects 3 ping failures and tries to fail over, and that's about when it fails back, so we get all sorts of inconsistent state problems. Check /var/log/dmesg for kernel messages.

For some general ideas, Brendan Gregg's book and his website have all sorts of solid methodologies for "figure out wtf is going wrong" and "why is poo poo so slow?" problems. To more directly answer your monitoring question, you seem to need event correlation alongside your network monitoring. We're just running Graphite with Sensu grabbing NIC metrics and shoving them onto the AMQP bus, and I map different time series together onto the time domain and look for patterns. A lot of this tends to just plain suck because our infrastructure has a serious case of clock skew - ntpd doesn't even work and half our clocks are off by 4+ minutes - but looking into how your TCP stack behaves with the rest of your system state is handy when running applications. In most cases, threatening to drop your provider because you're having intermittent network problems will almost always get them on the phone and trying to diagnose your issue right away. You can improve your chances of faster resolution by providing network analysis for the vendor's network folks trying to eliminate the above issues (mtr - newer versions support MPLS labels btw - sar, maybe nmap for its peculiar traceroute methods, and TCP statistics from tcpdump, etc.)
|
# ? Dec 5, 2015 05:37 |
|
MagnumOpus posted:I'm 100% anecdotally sure many of the problems we're running into are due to intermittent network failures, but since we're using an Openstack IaaS provider I don't have access to logs at that level. I've got TCP failures and retransmits in my metrics pipeline, but I'm looking for something more compelling. What would be perfect is a small agent app that can constantly monitor links and track failures explicitly, because what I expect is that we're getting frequent jitters rather than hard link failures. Basically I just want to be able to prove if this IaaS is too brittle for production. Ideas? Smokeping?
|
# ? Dec 7, 2015 00:45 |
|
Smokeping is a good start if you think there's a problem at the physical layer, but pings will almost never reveal the kinds of problems you expect them to in production. Hitting a single endpoint probably won't reveal anything about asymmetrically misconfigured link aggregates, because you'll always be taking the same network path. Small ping packet sizes won't reveal anything related to mismatched MTUs along the network causing unexpected fragmentation. Systems that don't take notice of out-of-order packets won't see UDP packets randomly going round-robin and arriving in the wrong sequence to a bad application that doesn't cope with that. Certainly, a ping every second or two will not trigger any meddling QoS policies, and won't reveal anything in particular about links that are saturated under production load.

As a practical example, here's the kind of dumb bullshit you'll run into in some cloud networks, and no quantity of pings will ever detect it for you: https://code.google.com/p/google-compute-engine/issues/detail?id=87

A better option is to come up with some kind of test suite that's representative of your production workload, start a packet capture (tcpdump ring buffer is an awesome option), run it until you see the issue, then inspect the network traffic. Are you seeing packets randomly arriving out of order at your endpoint? Are you receiving fragmented packets that you expect not to be fragmented? Is there significant latency between certain packets leaving the one system and arriving at the other? Is some traffic just plain missing? The best place to start is to just analyze a basic packet capture and see what Wireshark's UI flags in red. Your local network device error counters (and dmesg) are also your friend.

necrobobsledder posted:For some general ideas, Brendan Gregg's book and his website have all sorts of solid methodologies for "figure out wtf is going wrong" and "why is poo poo so slow?" problems. I feel like an old neckbeard, but I've been relying more and more on sar/sysstat recently and less on stuff like collectd and Graphite. It's certainly a lot easier to scale. Vulture Culture fucked around with this message at 05:09 on Dec 7, 2015 |
# ? Dec 7, 2015 05:00 |
|
Yuck. Not only is that an obscure and ugly problem, but the poor handling of it is pretty disheartening. Someone had to publicly shame them before the issue was escalated externally. Reporting to resolution: 6+ months and counting.
|
# ? Dec 7, 2015 05:06 |
|
Thanks for all the input! I'm taking a 3-pronged approach to the problem: 1) Rebuilt some Graphite dashboards to get a better look at network stats across all hosts. 2) Smokeping. I have not used this before but I've got another guy familiar with it, so we should be able to roll it out quickly. 3) I'm going to write a monitoring agent based on hashicorp memberlist. This will hopefully let me differentiate different types of link failures by acting at the time of detection to verify from multiple hosts. If it works out the way I want, this should be capable of answering my primary question of overall system stability.
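A toy version of that kind of probe - just timing TCP connects and recording failures so jitter shows up as gaps and spikes in the series - looks something like this. It's a sketch, not the memberlist-based design described above:

```python
# Minimal link probe: repeatedly time TCP connects to a peer. Each sample
# is a connect latency in seconds, or None for a failed/timed-out attempt,
# so intermittent drops are visible directly in the sample list.
import socket
import time

def probe(host, port, attempts=5, timeout=1.0, interval=0.0):
    """Return a list of connect latencies in seconds; None marks a failure."""
    samples = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                samples.append(time.monotonic() - start)
        except OSError:
            samples.append(None)  # connection refused/timed out: a drop event
        if interval:
            time.sleep(interval)
    return samples
```

Run from several hosts against each other on a short interval and ship the samples to Graphite, and you get a crude mesh view of which links are jittering and when.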
|
# ? Dec 7, 2015 20:01 |
|
Vulture Culture posted:I've almost never found time series to be useful for anything in recent memory -- though I once burned 3,000 IOPS and 400 GB of disk space on Graphite chasing down a single NFS performance regression on an IBM software update (have you run across the mountstats collector for Diamond? That was me, for this problem) -- but I agree completely about Brendan Gregg and his USE method, and the need for correlation. At the most basic level, this means system clocks corrected to within a second or two, and reasonable log aggregation to help determine exactly what's going on in a distributed system. The Graphite bits are super-useful if you find yourself looking at actual NIC errors, but if you're divorced from the physical hardware in a private cloud, you'll see dwindling returns. There's no way I'd have found a lot of problems without tcpdump / Wireshark such as bad NAT configurations or low firewalls and traffic shapers gone mad, and anything else in your usual enterprise network of madness. AWS offering Flow Logs would be great if I could get any drat access for these to be able to use the feature instead of having to do crazypants things like e-mailing LEGAL if I can directly dump and share info off a BGP switch with Amazon support. But our cloud maturity level is pretty bad so I suspect we won't find one thing wrong with an AWS service for 400 things that is our fault. Our poor Amazon account rep Collecting what sort of errors start showing up at what times is helpful when you're trying to at least avoid some obvious problems like cron jobs or vMotion and you want to quickly share different error stats at certain points in time with third parties, including your cloud provider.
|
# ? Dec 8, 2015 06:20 |
|
necrobobsledder posted:(load of 45 on an 8 vCPU box in prod is scary, man) My personal record is 253 on a quad-core running TSM e: wait, it was dual quad-core Vulture Culture fucked around with this message at 07:01 on Dec 8, 2015 |
# ? Dec 8, 2015 06:55 |
|
Vulture Culture posted:My personal record is 253 on a quad-core running TSM Every time we get a significant network event that causes cascading failure our 8-core Logstash server gets slammed and hits around 280 as it tries to process the dramatic upswing in error messages.
|
# ? Dec 8, 2015 18:53 |
|
I have a website I host in AWS. I have a DNS alias record pointing to an ELB, and another ELB on standby. I update the application on one ELB, and then change the DNS record from one to the other. This works perfectly in Firefox, but Chrome doesn't seem to pick up the DNS change, or at least not as fast; in fact it's very slow to pick up the change. I assume this isn't something wrong with the architecture? I assume this is a Chrome problem? If so, what is it and how can I remedy this problem? Or is it that I need to put my ELBs behind something that never changes IPs? If so, how would I go about doing this easily without changing too much architecture?
|
# ? Dec 22, 2015 00:10 |
|
Use Route 53 for your DNS and use an alias entry?
|
# ? Dec 22, 2015 00:17 |
|
Thanks Ants posted:Use Route 53 for your DNS and use an alias entry? I'm already doing that, that's the alias record I change
|
# ? Dec 22, 2015 00:29 |
|
Chrome maintains its own DNS cache, which is why you probably don't see the change picked up instantly. I'm phone posting so this might not be entirely correct, but you can clear it at something like chrome://net-internals/#dns in your browser. Though that obviously doesn't help the general public. Hopefully Chrome at least kind of respects TTL. What TTL do you have set on the record? If quick updates are important you want something like 5 minutes.
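For reference, a sketch of what a low-TTL cutover record looks like as a Route 53 change batch (the request shape boto3's change_resource_record_sets expects). The hostname and ELB DNS name are placeholders; note this uses a plain CNAME rather than an alias, since alias records don't carry their own TTL:

```python
# Hypothetical Route 53 change batch flipping a record to a standby ELB
# with an explicit 5-minute TTL. Built as a plain dict; in real use this
# would be passed to boto3's route53 change_resource_record_sets.
change_batch = {
    "Comment": "cut over to the standby ELB",
    "Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "www.example.com.",  # placeholder hostname
            "Type": "CNAME",
            "TTL": 300,                  # 5 minutes, short enough for quick cutover
            "ResourceRecords": [{
                # placeholder ELB DNS name
                "Value": "standby-elb.us-east-1.elb.amazonaws.com",
            }],
        },
    }],
}
```

The tradeoff is the usual one: a short TTL means resolvers re-query often, so cutovers propagate fast, at the cost of more DNS traffic (and, as the thread shows, browsers may still cache beyond the TTL).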
|
# ? Dec 22, 2015 00:43 |
|
Docjowles posted:Chrome maintains its own DNS cache which is why you probably don't see the change picked up instantly. I'm phone posting so this might not be entirely correct but you can clear it at something like chrome://net-internals/#dns in your browser. Though that obviously doesn't help the general public. I need this to affect the general public. I use only alias records so my TTLs should be pretty much instantaneous; the only record that isn't an alias is the SOA, and that's 10 seconds. So I'm really not sure what's going on. Firefox gets the change almost instantly, Chrome is slow or just doesn't get it; I'm not sure what Chrome is doing. Even when I clear Chrome's DNS it doesn't seem to take the change, at least not consistently.
|
# ? Dec 22, 2015 01:13 |
|
Is there an HTTP header you can send to get Chrome to gently caress off with the caching? Phone posting but this seems to be a Chrome thing and not necessarily something that can be resolved in your DNS setup.
|
# ? Dec 22, 2015 01:36 |
|
Thanks Ants posted:Is there an HTTP header you can send to get Chrome to gently caress off with the caching? Phone posting but this seems to be a Chrome thing and not necessarily something that can be resolved in your DNS setup. I have no idea, that's why I'm asking. It appears that Chrome is caching the DNS, and the content can't change until the DNS updates in chrome. A dig shows the machine is getting the right information, but Chrome is not.
|
# ? Dec 22, 2015 01:45 |
|
Extracurricular question, but when it comes to massive web-based SaaS applications like Facebook, Salesforce, or Apple iCloud, what are they using for their directory service? Active Directory doesn't make sense because it's too slow for such an enormous deployment, and being web-centric, Kerberos/NTLM aren't a good fit. I know many will point to Azure AD but all of these services existed before AAD. What do they use?
|
# ? Dec 22, 2015 01:59 |
|
Tab8715 posted:Extracurricular question, but when it comes to massive web-based SaaS applications like Facebook, Salesforce, or Apple iCloud, what are they using for their directory service?
|
# ? Dec 22, 2015 03:00 |
|
Does anyone here have any experience with creating internal EC2 build agents? I want to build code for EC2 but due to legal reasons cannot deploy anything but a binary to EC2. This makes compiling against their kernel headers hard, as what I'm building is a kernel module. e: this process might work... https://forums.aws.amazon.com/thread.jspa?messageID=498214 Winkle-Daddy fucked around with this message at 23:27 on Dec 22, 2015 |
# ? Dec 22, 2015 23:05 |
|
Megaman posted:I use only alias records so my TTLs should be pretty much instantaneous
|
# ? Dec 23, 2015 07:47 |