chutwig
May 28, 2001

BURLAP SATCHEL OF CRACKERJACKS

Vulture Culture posted:

We had the problem you're talking about in Kilo and it went away when we identified and worked around the MySQL issue where you can't actually do multi-master in Nova with Galera.

We front our MySQL cluster with HAProxy with the MySQL backend in backup mode so that only one instance is interacted with for exactly that reason.

evol262
Nov 30, 2010
#!/usr/bin/perl
I just want to :commissar: whatever genius decided that the message queue should just be used to tell services a new record was written and to check mysql instead of pub/subbing the actual content and writing it to the database as a failsafe.

It's so hosed that a replacement was needed, but trove is just as bad in different ways :yaycloud:

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

evol262 posted:

I just want to :commissar: whatever genius decided that the message queue should just be used to tell services a new record was written and to check mysql instead of pub/subbing the actual content and writing it to the database as a failsafe.

It's so hosed that a replacement was needed, but trove is just as bad in different ways :yaycloud:
It was hard enough to get the database and memcache teams to work together, imagine if the messaging layer had to deal with cache coherency with OpenStack politics

ILikeVoltron
May 17, 2003

I <3 spyderbyte!

Vulture Culture posted:

We had the problem you're talking about in Kilo and it went away when we identified and worked around the MySQL issue where you can't actually do multi-master in Nova with Galera.

I just got told to edit nova.conf, and add this to all nodes:
code:
[database]
idle_timeout = 300
But after reading what you guys are saying and talking with a colleague, I'm not so sure now.

I just checked the defaults... 1 hour, man this is so janky.
code:
[database]
idle_timeout=3600
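
To check which value a node would actually end up with, here is a minimal sketch. It assumes nova.conf parses as plain INI and that the default really is the one hour mentioned above; the function name and sample text are made up for illustration and are not part of Nova.

```python
import configparser

# Assumption taken from the post above: the default is 1 hour (3600 seconds).
DEFAULT_IDLE_TIMEOUT = 3600


def effective_idle_timeout(conf_text):
    """Return [database] idle_timeout from nova.conf-style text, or the default."""
    # interpolation=None so stray '%' characters in other options do not break parsing.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    return parser.getint("database", "idle_timeout", fallback=DEFAULT_IDLE_TIMEOUT)


sample = """
[database]
idle_timeout = 300
"""
print(effective_idle_timeout(sample))           # explicit setting wins
print(effective_idle_timeout("[database]\n"))   # option missing, default applies
```

Running it against the real file on each node would just be a matter of reading /etc/nova/nova.conf into a string first.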

ILikeVoltron fucked around with this message at 07:08 on Aug 28, 2015

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

ILikeVoltron posted:

I just got told to edit nova.conf, and add this to all nodes:
code:
[database]
idle_timeout = 300
But after reading what you guys are saying and talking with a colleague, I'm not so sure now.
Back up -- what does your database architecture and configuration look like?

ILikeVoltron
May 17, 2003

I <3 spyderbyte!

Vulture Culture posted:

Back up -- what does your database architecture and configuration look like?

It's literally a vanilla install from Red Hat's OSP6. They deploy a wsrep-based trifecta for the database, using HAProxy as the front end (with server pinning) and replication. It's running on 3 x 56-core boxes with 128 gigs of memory or thereabouts (all bare metal) and a pair of disks in RAID 1. We're talking about 10 VMs total on the system (on, say, 28ish nova hosts).

evol262
Nov 30, 2010
#!/usr/bin/perl

ILikeVoltron posted:

It's literally a vanilla install from Red Hat's OSP6. They deploy a wsrep-based trifecta for the database, using HAProxy as the front end (with server pinning) and replication. It's running on 3 x 56-core boxes with 128 gigs of memory or thereabouts (all bare metal) and a pair of disks in RAID 1. We're talking about 10 VMs total on the system (on, say, 28ish nova hosts).

I'm not really an OSP person (upstream only there, both ends on RHEV), but I have an old (Juno) RDO (upstream of RHOS at the time, now RHEL-OSP) running on two dell 9020s running more VMs than that.

Do you have a case #? Can you PM it to me? I can at least go look at the sosreports and bits they've made you attach.

ILikeVoltron
May 17, 2003

I <3 spyderbyte!

evol262 posted:

I'm not really an OSP person (upstream only there, both ends on RHEV), but I have an old (Juno) RDO (upstream of RHOS at the time, now RHEL-OSP) running on two dell 9020s running more VMs than that.

Do you have a case #? Can you PM it to me? I can at least go look at the sosreports and bits they've made you attach.

Sent! Thanks for taking a look at this.

Gothmog1065
May 14, 2009
Are there any good books/guides/online thingies for virtualization? I have a high-level overview of VMs, and I finally have a bare metal VM machine with ESXi on it and a couple of basic guest OSes (Windows Vista for loving with those "I work for MS and your computer is infected" callers, and Linux). I'm looking to start working with virtual switches and routers, and some VLANs. I'm wanting to start setting things up, but I'd like to get a better handle on virtualization in general.

Potato Salad
Oct 23, 2014

nobody cares


If you are the type that learns by immersion, go make a Xen box and read about OVS and Openflow.

Even getting a basic network with VLANs set up with OVS and OpenFlow is fairly involved and will get you exposed to quite a bit quickly.

chutwig
May 28, 2001

BURLAP SATCHEL OF CRACKERJACKS

Have any of you OpenStackers successfully migrated from Nova networking to Neutron? We are still using Nova networking at present but want to begin laying the groundwork for a switch to Neutron, because our tenants are increasingly asking for features that are only in Neutron, we expect to eventually scale our clusters to the point where we will need something like Calico to maintain a sane network topology, and Nova networking just doesn't get any love any more (and why would it?).

I'm pretty new to the OpenStack world, but I've already heard all sorts of horror stories about Neutron. A lot of them are pretty old and I imagine things have improved since then, but I'd like to hear from any ops that are currently using it, particularly if you had a prior nova-network-based setup.

GobiasIndustries
Dec 14, 2007

Lipstick Apathy
I've got two probably dumb questions related to Windows Server 2012 R2 within ESXi (home lab):

1. I want to sync time between ESXi and my two R2 VMs (and other future VMs); I've set ESXi to sync with time.windows.com, and checked the 'Synchronize guest time with host' for both VMs (both have VMware Tools installed). This should take care of it, correct?

2. I realized this morning that both of my hosts had shut down unexpectedly and I don't know when it happened (the time settings were all messed up for both VMs too); how can I set up notifications so I'm informed when one or both hosts shuts down or the entire system reboots for whatever reason?

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

GobiasIndustries posted:

I've got two probably dumb questions related to Windows Server 2012 R2 within ESXi (home lab):

1. I want to sync time between ESXi and my two R2 VMs (and other future VMs); I've set ESXi to sync with time.windows.com, and checked the 'Synchronize guest time with host' for both VMs (both have VMware Tools installed). This should take care of it, correct?
This may be a holdover from the binary translation days, but it used to be best practice to actually run NTP on each guest.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

adorai posted:

This may be a holdover from the binary translation days, but it used to be best practice to actually run NTP on each guest.
Nowadays there are a few KB articles on best practice:

Linux: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006427
Windows: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1318

Gothmog1065
May 14, 2009

Potato Salad posted:

If you are the type that learns by immersion, go make a Xen box and read about OVS and Openflow.

Even getting a basic network with VLANs set up with OVS and OpenFlow is fairly involved and will get you exposed to quite a bit quickly.

I'm not really a Linux person. From what I'm reading, it's very Linux heavy. If that's the best solution I'll learn Linux on top of it as well, unless there's something else.

evol262
Nov 30, 2010
#!/usr/bin/perl
The reference implementation is on Linux, but any switch vendor's openflow/ovs stuff is just as arcane. SDN on Windows is young. Expect to have to learn another OS/paradigm (or more than one) to do anything serious with SDN.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer
You aren't likely to need any SDN experience for an enterprise job. Just buy a cheap managed switch and install esxi, you can learn a lot with those two things.

evol262
Nov 30, 2010
#!/usr/bin/perl
May as well repeat the common:

Fanless procurve
Powerconnects if you can deal with fans (or swap them)
GNS3 for playing with complex setups

If you really want virtual routers in a "not emulating Cisco/juniper kit" way, you'll need/want SDN, even if it's just basic vyatta

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

evol262 posted:

May as well repeat the common:

Fanless procurve
Powerconnects if you can deal with fans (or swap them)
GNS3 for playing with complex setups

If you really want virtual routers in a "not emulating Cisco/juniper kit" way, you'll need/want SDN, even if it's just basic vyatta
vyatta / vyos isn't really SDN. NFV yes, but SDN no.

ILikeVoltron
May 17, 2003

I <3 spyderbyte!

chutwig posted:

We front our MySQL cluster with HAProxy with the MySQL backend in backup mode so that only one instance is interacted with for exactly that reason.

Do you happen to know how this is configured? Like, how did you tell the other members of the MySQL cluster to be backup mode?

some kinda jackal
Feb 25, 2003

 
 
I don't know if this is a question for here or the enterprise hardware thread but:

I want to deploy an R620 standalone ESXi host for testing. It's got the integrated mirrored SD slots in the back. Each came with a 2GB card. Is that going to be fine for a hypervisor out of the box, or should I look to add larger cards? No vCenter, no extra logging, just straight host for VMs. I might install that standalone WebUI but that's about it.

edit: That is to say, it's got like 2TB datastore, I just want to throw the hypervisor alone on the SD cards.

some kinda jackal fucked around with this message at 03:40 on Sep 3, 2015

chutwig
May 28, 2001

BURLAP SATCHEL OF CRACKERJACKS

ILikeVoltron posted:

Do you happen to know how this is configured? Like, how did you tell the other members of the MySQL cluster to be backup mode?

It's configuration on HAProxy only, MySQL has no awareness of it. "Backup mode" in the context of HAProxy means that HAProxy will only send traffic to a single backend instead of spreading it around the different backends: https://github.com/bloomberg/chef-bcpc/blob/master/cookbooks/bcpc/templates/default/haproxy-head.cfg.erb#L34-L44

Our setup leans heavily on the notion of a master virtual IP that floats around the cluster, i.e., there's one control plane node (head node) that is more equal than all the rest, and that's whichever one is holding the VIP. That node serves MySQL and HAProxy exclusively and also runs cronjobs that should only run on one system, everything else that goes through the VIP is distributed by HAProxy to all the control plane nodes. You'll need something like that for our jankity MySQL+HAProxy workaround to work for you.
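
As a rough illustration of what that looks like on the HAProxy side (not a copy of the linked bcpc template), the sketch below renders a `listen` section where every server after the first carries HAProxy's `backup` keyword, so it only takes traffic once the non-backup server is marked down. The function name, host names, and addresses are placeholders invented for the example.

```python
def render_galera_listen(vip, backends, port=3306):
    """Render a HAProxy listen section with one active server and the rest as backups.

    backends is an ordered list of (name, address) tuples; the first entry is
    the only server without the `backup` keyword.
    """
    lines = [
        "listen galera",
        "  bind {}:{}".format(vip, port),
        "  mode tcp",
        "  option tcpka",
    ]
    for index, (name, address) in enumerate(backends):
        suffix = "" if index == 0 else " backup"
        lines.append("  server {} {}:{} check{}".format(name, address, port, suffix))
    return "\n".join(lines)


# Placeholder addresses from the documentation range, not real hosts.
print(render_galera_listen(
    "192.0.2.10",
    [("host1", "192.0.2.11"), ("host2", "192.0.2.12"), ("host3", "192.0.2.13")],
))
```

The health check here is a bare `check`; a real deployment would keep whatever Galera-aware check it already uses.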

theperminator
Sep 16, 2009

by Smythe
Fun Shoe

Martytoof posted:

I don't know if this is a question for here or the enterprise hardware thread but:

I want to deploy an R620 standalone ESXi host for testing. It's got the integrated mirrored SD slots in the back. Each came with a 2GB card. Is that going to be fine for a hypervisor out of the box, or should I look to add larger cards? No vCenter, no extra logging, just straight host for VMs. I might install that standalone WebUI but that's about it.

edit: That is to say, it's got like 2TB datastore, I just want to throw the hypervisor alone on the SD cards.

Should be fine, I was deploying M620s at my last job with ESXi and 1GB mirrored SD cards with no issues at all. ESXi doesn't make use of the SD card for anything after booting.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

chutwig posted:

It's configuration on HAProxy only, MySQL has no awareness of it. "Backup mode" in the context of HAProxy means that HAProxy will only send traffic to a single backend instead of spreading it around the different backends: https://github.com/bloomberg/chef-bcpc/blob/master/cookbooks/bcpc/templates/default/haproxy-head.cfg.erb#L34-L44

Our setup leans heavily on the notion of a master virtual IP that floats around the cluster, i.e., there's one control plane node (head node) that is more equal than all the rest, and that's whichever one is holding the VIP. That node serves MySQL and HAProxy exclusively and also runs cronjobs that should only run on one system, everything else that goes through the VIP is distributed by HAProxy to all the control plane nodes. You'll need something like that for our jankity MySQL+HAProxy workaround to work for you.
I like how we independently arrived at almost completely identical configurations and the Kilo reference architectures are still completely wrong

Potato Salad
Oct 23, 2014

nobody cares


Martytoof posted:

I don't know if this is a question for here or the enterprise hardware thread but:

I want to deploy an R620 standalone ESXi host for testing. It's got the integrated mirrored SD slots in the back. Each came with a 2GB card. Is that going to be fine for a hypervisor out of the box, or should I look to add larger cards? No vCenter, no extra logging, just straight host for VMs. I might install that standalone WebUI but that's about it.

edit: That is to say, it's got like 2TB datastore, I just want to throw the hypervisor alone on the SD cards.

ESXi will recognize that the memory is SD and avoid putting /scratch on it. It'll ask you where you would like to place that on install. VMware writes that 1GB is possible, but you may not have enough space for a coredump. Four gigs is recommended. Either works fine enough.

After a ton of calls with Dell about this, it turns out a rep cannot add the SD riser to the server without giving you the 2GB sd cards. They can, however, sell you the SD risers in a new order group, downgrade to their cheap as dirt 1GB cards, and you buy 4GB wear-leveled Kingston or SanDisk SD cards yourself on the cheap.

Potato Salad
Oct 23, 2014

nobody cares


I'm on my phone right now, but VMware has a good page on install partition hardware considerations that Google finds easily.

ILikeVoltron
May 17, 2003

I <3 spyderbyte!

chutwig posted:

It's configuration on HAProxy only, MySQL has no awareness of it. "Backup mode" in the context of HAProxy means that HAProxy will only send traffic to a single backend instead of spreading it around the different backends: https://github.com/bloomberg/chef-bcpc/blob/master/cookbooks/bcpc/templates/default/haproxy-head.cfg.erb#L34-L44

Our setup leans heavily on the notion of a master virtual IP that floats around the cluster, i.e., there's one control plane node (head node) that is more equal than all the rest, and that's whichever one is holding the VIP. That node serves MySQL and HAProxy exclusively and also runs cronjobs that should only run on one system, everything else that goes through the VIP is distributed by HAProxy to all the control plane nodes. You'll need something like that for our jankity MySQL+HAProxy workaround to work for you.

Ok, so the haproxy config out of redhat's installer is "stick on dst" and "timeout server 90m" like so:

code:
listen galera
  bind x.x.x.x:3306
  mode  tcp
  option  tcplog
  option  httpchk
  option  tcpka
  stick  on dst
  stick-table  type ip size 2
  timeout  client 90m
  timeout  server 90m
  server host1 x.x.x.x:3306  check inter 1s port 9200 on-marked-down shutdown-sessions
  server host2 x.x.x.x:3306  check inter 1s port 9200 on-marked-down shutdown-sessions
  server host3 x.x.x.x:3306  check inter 1s port 9200 on-marked-down shutdown-sessions
I'm more interested in how you're doing replication between the mysql hosts as after some disk tests it seems to me that this matters:

code:
mysql -e "SHOW STATUS LIKE 'wsrep_local_recv_queue_avg';"
+----------------------------+----------+
| Variable_name              | Value    |
+----------------------------+----------+
| wsrep_local_recv_queue_avg | 0.545683 |
+----------------------------+----------+
The server running the mongo primary node for ceilometer seems to spike up to 1.5 or so, and then we start to see some failures. This could be nothing, but one of my colleagues thinks we could be seeing IO-based stalling causing replication to stall, causing things to wait and eventually fail.
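
For watching that counter over time, a small sketch that parses the table output shown above and flags when the average receive queue goes over a threshold. The 1.0 threshold is only a guess based on failures showing up around 1.5, and both function names are made up for illustration.

```python
def parse_mysql_status(table_text):
    """Turn the ASCII table printed by `mysql -e "SHOW STATUS ..."` into a dict."""
    values = {}
    for raw_line in table_text.splitlines():
        line = raw_line.strip()
        if not line.startswith("|"):
            continue  # skip the +----+ border lines and anything else
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) == 2 and cells[0] != "Variable_name":
            values[cells[0]] = cells[1]
    return values


def recv_queue_is_backing_up(table_text, threshold=1.0):
    """Return True when wsrep_local_recv_queue_avg exceeds the (assumed) threshold."""
    status = parse_mysql_status(table_text)
    return float(status["wsrep_local_recv_queue_avg"]) > threshold


sample = """
+----------------------------+----------+
| Variable_name              | Value    |
+----------------------------+----------+
| wsrep_local_recv_queue_avg | 0.545683 |
+----------------------------+----------+
"""
print(parse_mysql_status(sample))
print(recv_queue_is_backing_up(sample))
```

Feeding it the output of the same `mysql -e` command on a timer would give a crude history to line up against the failures.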

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

ILikeVoltron posted:

Ok, so the haproxy config out of redhat's installer is "stick on dst" and "timeout server 90m" like so:

code:
listen galera
  bind x.x.x.x:3306
  mode  tcp
  option  tcplog
  option  httpchk
  option  tcpka
  stick  on dst
  stick-table  type ip size 2
  timeout  client 90m
  timeout  server 90m
  server host1 x.x.x.x:3306  check inter 1s port 9200 on-marked-down shutdown-sessions
  server host2 x.x.x.x:3306  check inter 1s port 9200 on-marked-down shutdown-sessions
  server host3 x.x.x.x:3306  check inter 1s port 9200 on-marked-down shutdown-sessions
I'm more interested in how you're doing replication between the mysql hosts as after some disk tests it seems to me that this matters:

code:
mysql -e "SHOW STATUS LIKE 'wsrep_local_recv_queue_avg';"
+----------------------------+----------+
| Variable_name              | Value    |
+----------------------------+----------+
| wsrep_local_recv_queue_avg | 0.545683 |
+----------------------------+----------+
The server running the mongo primary node for ceilometer seems to spike up to 1.5 or so, and then we start to see some failures. This could be nothing, but one of my colleagues thinks we could be seeing IO-based stalling causing replication to stall, causing things to wait and eventually fail.
Check sar -d on each server and see if the disk service time/utilization corroborate the theory of disk I/O-based hangs. If most of your I/O is coming from Ceilometer and random object updates in Nova, the workload shouldn't be very spiky.

BangersInMyKnickers
Nov 3, 2004

I have a thing for courageous dongles

Any major gotchas on vCenter 6 at this point? I'm bringing the DR site up from active/standby to active/active and the documentation makes it sound like linked-mode configuration is way easier on 6 than 5.5, but after the clusterfuck of the 5.0 release I've been a bit wary of jumping to a new version until it's had time to settle.

chutwig
May 28, 2001

BURLAP SATCHEL OF CRACKERJACKS

ILikeVoltron posted:

Ok, so the haproxy config out of redhat's installer is "stick on dst" and "timeout server 90m" like so:

code:
listen galera
  bind x.x.x.x:3306
  mode  tcp
  option  tcplog
  option  httpchk
  option  tcpka
  stick  on dst
  stick-table  type ip size 2
  timeout  client 90m
  timeout  server 90m
  server host1 x.x.x.x:3306  check inter 1s port 9200 on-marked-down shutdown-sessions
  server host2 x.x.x.x:3306  check inter 1s port 9200 on-marked-down shutdown-sessions
  server host3 x.x.x.x:3306  check inter 1s port 9200 on-marked-down shutdown-sessions
I'm more interested in how you're doing replication between the mysql hosts as after some disk tests it seems to me that this matters:

code:
mysql -e "SHOW STATUS LIKE 'wsrep_local_recv_queue_avg';"
+----------------------------+----------+
| Variable_name              | Value    |
+----------------------------+----------+
| wsrep_local_recv_queue_avg | 0.545683 |
+----------------------------+----------+
The server running the mongo primary node for ceilometer seems to spike up to 1.5 or so, and then we start to see some failures. This could be nothing, but one of my colleagues thinks we could be seeing IO-based stalling causing replication to stall, causing things to wait and eventually fail.

We don't run Ceilometer any more because it kicked the crap out of our control plane nodes. Vulture Culture's suggestion of sar is good because it'll let you correlate the stalls with load average or iowait issues going on on the node at that time.

How do you mean, how we're doing replication between the MySQL hosts? They all talk to each other directly; it's just whatever Galera normally does.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

chutwig posted:

We don't run Ceilometer any more because it kicked the crap out of our control plane nodes.
Do this if you aren't running Heat auto-scaling. We went as far as turning off quotas on nova because they're constantly loving around with table locks and making everything fall over.

some kinda jackal
Feb 25, 2003

 
 
Thanks for the SD card info everyone. Embedded hypervisor is my new favourite thing ever :)

some kinda jackal
Feb 25, 2003

 
 
Uhh, if I vMotion a VM from a host with VFRC to one without, how do I then remove the VFRC requirement?

I migrated a dev DB from my old host to the new 620. The old host had a 120GB SSD that I was using for VFRC and the 620 has nothing. When I try to power on the DB I get an error that the 0 bytes of VFRC available on the host are insufficient to fill the request for 60gb which -- doy -- I have no VFRC.

When I go to edit the VM in the webUI I can't see any option to remove the VFRC reservation, however!

I don't want to migrate back to the old host just to remove VFRC and then migrate back. If I edit the VMX would that be enough?


I would have thought vMotion would have warned me about this. Did I catch an edge case here?


Wait nevermind I'm an idiot; I forgot VFRC is defined under each individual disk. 11pm is probably not the best time to muck around with vSphere. sorry everyone.

some kinda jackal fucked around with this message at 04:12 on Sep 4, 2015

Stealthgerbil
Dec 16, 2004


I have a folder in a datastore running on ESXi 5 that I just can't access. I try to browse it through the datastore browser and it just sits there searching forever. I tried to ssh in and enter it and it does the same. It's like just this one VM's folder is corrupt. Any ideas?

Potato Salad
Oct 23, 2014

nobody cares


Anyone else go to LA VMWorld?

Potato Salad
Oct 23, 2014

nobody cares


Stealthgerbil posted:

I have a folder in a datastore running on ESXi 5 that I just can't access. I try to browse it through the datastore browser and it just sits there searching forever. I tried to ssh in and enter it and it does the same. It's like just this one VM's folder is corrupt. Any ideas?

Local datastore or on a storage device?

Kachunkachunk
Jun 6, 2011
Also is it 5.5?
You can run VOMA to investigate further. And boot into ESXi 6.0 (for free) to do a repair using VOMA (don't have to install it).
http://kb.vmware.com/kb/2036767

Fake Edit:
The Live CD style method I mentioned is potentially not public... but the above KB does link to some documentation on what switches to run. If you're in a bind, I'll find the steps for you tomorrow.

some kinda jackal
Feb 25, 2003

 
 
Ugh, is there any way to programmatically rename all the files associated with a VM when you rename the VM itself?

You're supposed to be able to do a storage vmotion to rename the associated files but for that to work you have to move it to a DIFFERENT datastore. If you "vmotion" it to the same datastore it doesn't do any renaming.

I have like 30 VMs I want to change the name scheme of and i know it doesn't actually MATTER what the files are named behind the scenes but I'm anal like that so it looks like I'll be spending the next day removing, renaming, and importing by hand.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

Martytoof posted:

Ugh, is there any way to programmatically rename all the files associated with a VM when you rename the VM itself?

You're supposed to be able to do a storage vmotion to rename the associated files but for that to work you have to move it to a DIFFERENT datastore. If you "vmotion" it to the same datastore it doesn't do any renaming.

I have like 30 VMs I want to change the name scheme of and i know it doesn't actually MATTER what the files are named behind the scenes but I'm anal like that so it looks like I'll be spending the next day removing, renaming, and importing by hand.
Powershell

Maneki Neko
Oct 27, 2000

Potato Salad posted:

Anyone else go to LA VMWorld?

Is this LA vmug or something? I went to US vmworld this year and the VMware parts of it were garbage, some good sessions and vendor meet ups though.
