|
1000101 posted:You probably can't answer this question even if you know the answer but I'll ask anyway. This times 1000. They cannot be using OSP unless those guys are popping happy pills daily to make up for the soul-crushing weight of having to work through that deployment tool.
|
# ¿ Jul 30, 2015 22:46 |
|
|
evol262 posted:I'd guess that they are, though I don't know for sure. Since so few customers who "need" openstack actually need openstack (there's a very high "I need openstack" to "gimme rhev" conversion ratio), it's hard to estimate. Can you explain what needs to happen before it scales past 5 machines? I've got a deployment out there that's around 30 physical nodes and the thing runs like poo. If I launch 10 VMs the APIs start to fail and some of the VMs fail to launch. I've got almost 100% defaults except for password and ceph ports in the OSP hostgroup params.
|
# ¿ Jul 30, 2015 23:04 |
|
evol262 posted:It's hard to say without knowing what's failing. If I had to guess, I think it's Keystone authentication tokens; the APIs that fail are either Cinder- or Nova-related. The problem originally seemed to be the database backend, and Keystone is basically just a REST front end and a database. There have been some errors in the logs, but it's mostly "I can't do the thing I tried to do after 3 attempts"; google-fu pulls up nothing, and the OSP config is basically defaults like I stated earlier.

There are 5 NICs per host, all running on UCS, with a dual 10gig backend configured in A-B failover. I never looked into the Neutron worker threads; I'll be sure to check that out. Guest traffic isn't an issue yet, as it's only maybe 5-10 instances. This is a brand new install. And yeah, VXLAN for tenant networks.

We're running 3x controllers, each with 128 gigs of RAM, 56 cores, and a pair of RAID1 7k SAS drives. Glance is mapped to a trio of Ceph-backed storage nodes, each with 10 disks (30 OSDs total). I've seen this setup hit 7000 IOPS, and during testing we hardly hit 1000 IOPS (while trying to reproduce this problem). The NICs are all separated by type, so there's: Management, Storage Clustering, Cluster Management, Tenant, External, Public API, Storage.
|
# ¿ Jul 31, 2015 01:51 |
|
Vulture Culture posted:I don't know anything about OSP specifically, but I'm positive you have a database problem. I'll be glad to check this - again, this is defaults from the OSP installer. It does have 3x MySQL boxes created as a Pacemaker cluster. One box seems to be the master, because its process list hovers between 1000-1600 (and one of our first steps was to up this limit from the default 1000). The other two mysqld process lists show 4-5 connections, sleeping or waiting for binlogs IIRC. What we see is that the 'master' doesn't seem to be reporting any locks directly. The mysqld log doesn't look too ugly other than reporting it can't raise the max number of open files: "[Warning] Could not increase number of max_open_files to more than 1024 (request: 1835)". At one point we turned on the slow log and found it very weird: it would go from 2-4 second queries straight into a 30 second query and then roll over (I think that's the HAProxy timeout for API requests).

evol262 posted:This is 5-10 instances total? I thought you meant "starting 10 instances within 3 seconds makes some API fall over", which can often be blamed on Neutron. So, I can reproduce this easily when I start 10 instances from the CLI or GUI. Just select CentOS-whatever, launch 10 smalls with Cinder-backed storage, and boom: 9/10 times at least 2-3 error out and fail. I can usually get it to fail just doing straight Nova-backed instances as well. To break the problem down a little more, I'm able to do this with two controllers running in the same cluster (I've taken one down, and changed which one I take down). As for which services fail, sometimes it's during block device mapping, sometimes not.

Also, just for clarity: when I ask about "scaling past 5 machines" I mean hosts. Like, a really basic single-controller install with 2-3 compute hosts. Also, thank you both for raising some great questions.
|
# ¿ Jul 31, 2015 08:12 |
|
Vulture Culture posted:I don't know anything about OSP specifically, but I'm positive you have a database problem. So I was asked by support to capture the InnoDB status during or after the failures. It was 3011 lines of output, most of which look like the following: "MySQL thread id 981139, OS thread handle 0x7ef9eeefc700, query id 35323377 192.168.x.x keystone sleeping ---TRANSACTION 2241C14, not started". Sometimes it's nova, sometimes it's keystone, sometimes it's neutron.
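For reference, the dump in question comes from the standard MySQL/MariaDB status command (the `\G` just formats the output vertically instead of as a table):

```sql
-- Dump the InnoDB engine status, including the transaction list quoted above
SHOW ENGINE INNODB STATUS\G
```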
|
# ¿ Aug 1, 2015 00:00 |
|
evol262 posted:... Either of you guys know of an oslo.db bug in Juno with idle timeouts? My Red Hat support dude is still chasing engineers trying to understand what's happening. I cannot for the life of me understand why this is a problem on a vanilla deployment. We're 5 weeks in at this point.
|
# ¿ Aug 27, 2015 23:51 |
|
Vulture Culture posted:We had the problem you're talking about in Kilo, and it went away when we identified and worked around the MySQL issue where you can't actually do multi-master in Nova with Galera. I just got told to edit nova.conf and add this to all nodes: code:
I just checked the defaults... 1 hour. Man, this is so janky. code:
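The original code blocks didn't survive, but the setting being described is almost certainly oslo.db's connection-pool idle timeout in nova.conf; the "1 hour" default mentioned above matches `idle_timeout`'s stock value of 3600 seconds in that era. A hedged reconstruction (the 300 is illustrative, not the poster's actual value):

```ini
[database]
# Default is 3600 (1 hour). Recycling pooled connections well before the
# HAProxy/Galera timeouts keeps Nova from pulling dead connections out of
# the pool after the load balancer has silently closed them server-side.
idle_timeout = 300
```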
ILikeVoltron fucked around with this message at 07:08 on Aug 28, 2015 |
# ¿ Aug 28, 2015 07:00 |
|
Vulture Culture posted:Back up -- what does your database architecture and configuration look like? It's literally a vanilla install from Red Hat's OSP6. They deploy a wsrep-based (Galera) trifecta for the database, using HAProxy as the front end (with server pinning) and replication. It's running on 3 x 56-core boxes with 128 gigs of memory or thereabouts (all bare metal) and a pair of disks in RAID 1. We're talking about 10 VMs total on the system (on, say, 28-ish Nova hosts).
|
# ¿ Aug 28, 2015 07:11 |
|
evol262 posted:I'm not really an OSP person (upstream only there, both ends on RHEV), but I have an old (Juno) RDO (upstream of RHOS at the time, now RHEL-OSP) running on two dell 9020s running more VMs than that. Sent! Thanks for taking a look at this.
|
# ¿ Aug 28, 2015 18:32 |
|
chutwig posted:We front our MySQL cluster with HAProxy with the MySQL backend in backup mode so that only one instance is interacted with for exactly that reason. Do you happen to know how this is configured? Like, how did you tell the other members of the MySQL cluster to be in backup mode?
|
# ¿ Sep 3, 2015 00:29 |
|
chutwig posted:It's configuration on HAProxy only, MySQL has no awareness of it. "Backup mode" in the context of HAProxy means that HAProxy will only send traffic to a single backend instead of spreading it around the different backends: https://github.com/bloomberg/chef-bcpc/blob/master/cookbooks/bcpc/templates/default/haproxy-head.cfg.erb#L34-L44 OK, so the HAProxy config out of Red Hat's installer uses "stick on dst" and "timeout server 90m", like so: code:
code:
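The installer-generated config blocks were lost above, but a sketch of what an HAProxy Galera stanza with those two directives looks like would be roughly the following (listener name, IPs, and check intervals are made up, not from the original post):

```
listen galera
    bind 192.168.0.10:3306
    mode tcp
    # Pin clients to a single backend by destination, per the installer default
    stick-table type ip size 1000
    stick on dst
    timeout server 90m
    server controller0 192.168.0.11:3306 check inter 1s
    server controller1 192.168.0.12:3306 check inter 1s
    server controller2 192.168.0.13:3306 check inter 1s
```

In chutwig's setup, the second and third `server` lines would instead carry the `backup` keyword, so HAProxy only ever talks to one node unless it goes down.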
|
# ¿ Sep 3, 2015 18:11 |
|
BangersInMyKnickers posted:The function of these virtual arrays really isn't that different from conventional controllers, the scale for everything (cache side, cache persistence time) is just orders of magnitude larger. I am well aware of the implications of this (almost no read ops hitting platter, extreme optimization of writes by buffering through ssd first) and am a proponent of the tech, but the point remains that 6 spindles offers a tiny amount of backend disk performance for a virtual cluster that could easily support 200+ VMs. The danger with these systems is that they will work extremely well, until they don't. And you hit that wall really really hard because you're running at SSD speed right up until the SSD cache is exhausted and by that time the platters are already saturated. More conventional storage architectures give you more warning on when you are hitting that saturation point as storage latencies creep up in a more controlled manner. I'm not saying its bad technology or you shouldn't use it, but you need to know what kind of IO you're throwing at it because there are a number of scenarios when you can make it catastrophically explode in your face, and in a much worse manner than the conventional "bunch o spindles" setups. Isn't the glory of that platform that you could just buy an additional host and no longer be at that wall?
|
# ¿ Jan 29, 2016 23:12 |
|
evol262 posted:I unironically use manageiq to keep track of VMs across RHEV, vSphere, openstack, and aws. Heathen! Heretic! My god why would you do that to yourself.... the horror, oh loving horror! I did a week of training on cloudforms and do not have many nice things to say, other than it can do a whole bunch of stuff in a very complicated way!
|
# ¿ Mar 4, 2016 18:08 |
|
Thermopyle posted:I'd like to automate the creation of VM's in Vagrant style. These would be Linux machines with a desktop environment. Probably Ubuntu 16.04. Would need to install various software and config various things in the VM. What exactly are you trying to automate? Creating the VM? Installing the software? Building images to launch through Vagrant? Each of these has a different answer. If you're just trying to have a VM you run on your desktop that you can delete or recreate easily, Vagrant is a great solution: "vagrant up; vagrant destroy -f". If you're trying to automate software install, then you want something like Puppet/Chef/Salt/etc., and if you're trying to create the image that you launch out of Vagrant (or many other platforms) you want Packer.
|
# ¿ Sep 3, 2016 02:11 |
|
jaegerx posted:terraform or packer I legit don't understand what you're asking or answering here? Thermopyle posted:Creating the VM, installing the software. Likely Vagrant and some form of provisioning: https://www.vagrantup.com/docs/provisioning/ I'd personally use Puppet because it's what I know, but if you're weak on that (or Chef, Ansible, Salt, etc.) you could just write some shell scripts that kick off when the Vagrant VM is launched (i.e. `yum install foo` / `apt-get install foo`).
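As a sketch of what that looks like in practice (box name, memory size, and package choice are illustrative, not from the posts), a Vagrantfile with an inline shell provisioner:

```ruby
# Hypothetical Vagrantfile: Ubuntu 16.04 VM with a desktop, provisioned by shell
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/xenial64"
  config.vm.provider "virtualbox" do |vb|
    vb.memory = 4096
    vb.gui = true        # show the console window so the desktop is usable
  end
  config.vm.provision "shell", inline: <<-SHELL
    apt-get update
    apt-get install -y ubuntu-desktop   # example package; swap in what you need
  SHELL
end
```

`vagrant up` builds and provisions it; `vagrant destroy -f` throws it away.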
|
# ¿ Sep 6, 2016 16:51 |
|
Bhodi posted:I guess I consider container images to be similar enough to binary artifacts to lump them together. I think we're splitting hairs at this point. You're right on opinionated, though! No splitting hairs, you're just wrong. Containers != VMs. VM Images != Containers in any way.
|
# ¿ Sep 6, 2016 20:20 |
|
The Nards Pan posted:I'm having a strange problem with my home qemu/KVM lab running on Ubuntu desktop 17.04 (although this issue was around in 16.04 and 16.10 too). Since I'm using my laptop as the host and it's often connected to my wifi, which doesn't support bridged connections, I have it set up with a NAT virtual network which also provides DHCP for the guests. If I let my guests use the DNS address that the DHCP provides, which is the network address of the NAT network (192.168.100.1), some websites won't connect while others work fine. I get a response from nslookup for the sites that don't work, but they immediately return a server not found page if I try to access them through a web browser from the guest. If I manually set the DNS to Google's everything works just fine. I can't seem to find a pattern of what works and what doesn't either - microsoft.com and mozilla.com are no good, google.com and somethingawful.com work fine. I get the same results from nslookup using 192.168.100.1 or 8.8.8.8 as DNS on my guest and the same result from nslookup on the host using my ISP DNS. Are you trying to NAT from and to the same network? Does the NAT virtual network you're using use the same IP scope as the "main" or primary network?
|
# ¿ Jan 19, 2017 21:33 |
|
Paul MaudDib posted:So I remember reading that Docker had a really ludicrous security model (running applications runs as root by default and/or needs to be run as root a lot of the time, or something like that). rkt claims to be a 'very secure way to run containers' and can import docker containers for portability. I haven't played with it much and the vagrant VM bombed on me last time I tried, but given another year I think it might overtake docker.
|
# ¿ Apr 26, 2017 00:36 |
|
Roargasm posted:why this over puppet? It's not averse to Puppet; it could easily be used in addition to Puppet. Packer has a Puppet provisioner built in. Anyway, to answer the question you maybe were asking: Packer is good at one thing, building images. Puppet isn't so good at building images; it's good at setting desired state. Which means you're still making some glue around Puppet (like, say, Jenkins) to launch the VM, run Puppet, and then make an image. Packer can do all of this; it's the obvious tool for making images (hence, packing them!)
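A minimal sketch of that pairing, in Packer's JSON template format from this era (the builder settings, checksum, and manifest path are all hypothetical placeholders):

```json
{
  "builders": [{
    "type": "virtualbox-iso",
    "iso_url": "ubuntu-16.04-server-amd64.iso",
    "iso_checksum_type": "sha256",
    "iso_checksum": "replace-with-real-checksum",
    "ssh_username": "vagrant",
    "ssh_password": "vagrant",
    "shutdown_command": "sudo shutdown -P now"
  }],
  "provisioners": [{
    "type": "puppet-masterless",
    "manifest_file": "manifests/site.pp"
  }]
}
```

`packer build template.json` boots the VM, applies the Puppet manifest inside it, and emits the finished image, so no separate Jenkins-style glue is needed.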
|
# ¿ May 26, 2017 21:45 |
|
I'm going to throw out there that if you're looking for a 10+ Core Xeon, don't make the mistake I did and order it new, that poo poo is hella cheap on ebay if you can deal with shipping from HK or China. Intel Xeon E5 2630 V4 ES QHVK 2.1Ghz 25MB 10Core 20threads 14nm 85W CPU - $189 on ebay.
|
# ¿ Aug 6, 2017 21:25 |
|
nicky_glasses posted:Anyone doing any vmware automation not in Powershell? The VMware SDK documentation is painfully obtuse, and picking any bindings outside of PS is very difficult, as there are no books or tutorials as far as I can tell for Python or others, except for Java, which I don't care to learn. Their APIs and docs are mostly poo poo and remind me of working with Active Directory over COM.
|
# ¿ Aug 16, 2017 03:58 |
|
evol262 posted:virt-manager is essentially a frontend to libvirt (like virsh). It's nearly impossible to overstate how painful the command-line options to qemu are. Just open up a system running a VM that was started by qemu and run `ps aux | grep qemu`; that running process's flags should all be visible (and painful).
|
# ¿ Mar 2, 2018 00:06 |
|
|
SlowBloke posted:I never considered FC to be interesting until i started working with it. Unlike iSCSI it either works perfectly or everything is hosed. The native multipathing is a nice extra. Perfect linear link aggregation is really nice too; the idea that to add bandwidth you just add a port is awesome.
|
# ¿ Jul 19, 2019 17:41 |