r u ready to WALK
Sep 29, 2001

Storage arrays singing actually sound more like this


Intraveinous
Oct 2, 2001

Legion of Rainy-Day Buddhists

evil_bunnY posted:

Cause of death: random seek to the back of the head.

Now that's a head crash!

*crickets*

error1 posted:

Storage arrays singing actually sound more like this

I got to hear that sound last week for similar reasons (replacing a UPS), though not quite on that scale, since my EVA is only 1 rack.

Intraveinous fucked around with this message at 21:34 on Mar 5, 2012

some kinda jackal
Feb 25, 2003

 
 
I don't know if this is enterprise enough a question, but this seems like the crowd that would be most likely to know the answer:

In a Linux Software-RAID RAID10 array, is there any way to tell which two drives are part of one striped set and which two drives are part of the second striped set that is mirroring the first? I understand that's the philosophy behind RAID10, two striped sets that mirror each other. Does this philosophy not apply to Linux Software-RAID?

I only ask because I had one disk fail in a RAID10 setup and I'd like to know what's happening behind the scenes.

mdadm --detail /dev/md0

pre:
/dev/md0:
        Version : 1.2
  Creation Time : Mon Mar  5 13:44:06 2012
     Raid Level : raid10
     Array Size : 3907021632 (3726.03 GiB 4000.79 GB)
  Used Dev Size : 1953510816 (1863.01 GiB 2000.40 GB)
   Raid Devices : 4
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Mon Mar  5 23:43:22 2012
          State : clean, degraded 
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : near=2
     Chunk Size : 32K

           Name : [myhost]
           UUID : fb317521:9023d975:d4559f15:6ac2341a
         Events : 47

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
       2       0        0        2      removed
       3       8       65        3      active sync   /dev/sde1

and /proc/mdstat is:

pre:
Personalities : [raid10] 
md0 : active raid10 sde1[3] sdc1[1] sdb1[0]
      3907021632 blocks super 1.2 32K chunks 2 near-copies [4/3] [UU_U]
      
unused devices: <none>
So basically I'd like to know if /dev/sdd1 was striped with sde1, which in turn mirrored sdb1/sdc1, or is there some other RAID paradigm here I'm not understanding?

I'm willing to concede that I understand very little about Linux Software-RAID, and I was hoping to learn a little more on the fly, but my disk basically failed a few hours after I took it out of the wrapping and popped it into a tray.

And I guess my second question is what sort of penalty I'm paying here while the system is degraded? I don't see a significant slowdown but there's nothing that's heavily IO-bound running on this machine. I'd love to know what the system is doing while it's in degraded mode. Is it just writing to one striped set and the second set is going un-mirrored?

Hopefully that doesn't sound TOO ridiculous a question. Anything you guys can shed light on would be most appreciated.

Pile Of Garbage
May 28, 2007



I'm not too familiar with software RAID on Linux either, but I've just had a look at the man page for mdadm and you can use the --examine switch to get details on the individual components in the array. What output do you get when you run the following?
pre:
mdadm --examine /dev/sdb1
mdadm --examine /dev/sdc1
mdadm --examine /dev/sde1

GMontag
Dec 20, 2011

Martytoof posted:

I don't know if this is enterprise enough a question, but this seems like the crowd that would be most likely to know the answer:

In a Linux Software-RAID RAID10 array, is there any way to tell which two drives are part of one striped set and which two drives are part of the second striped set that is mirroring the first? I understand that's the philosophy behind RAID10, two striped sets that mirror each other. Does this philosophy not apply to Linux Software-RAID?

You have it backwards. RAID 10 is x number of mirror sets striped across, not x number of stripe sets mirrored. This is a significant difference, because it means that (for a 4 disk array) there are 4 combinations of two drives failing without loss of the array, rather than just 2.

I'm not familiar with linux software RAID, so I couldn't tell you if there's a software way to tell which drive belongs to which mirror set, but there's always the empirical method. Set up a test array and try pulling out the different possible combinations of two drives. If the array fails, then both drives belonged to the same mirror set.
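One cheap way to run that experiment without risking real disks is a throwaway array on loopback files (just a sketch; the file paths and the /dev/md9 name are made up, and this assumes nothing else is using those loop devices):
pre:
# Build a scratch 4-disk RAID10 out of small files
for i in 0 1 2 3; do truncate -s 100M /tmp/md10test-$i.img; done
DEVS=$(for i in 0 1 2 3; do losetup -f --show /tmp/md10test-$i.img; done)
mdadm --create /dev/md9 --level=10 --raid-devices=4 $DEVS

# Fail one device from each suspected mirror pair; if the guess is right
# the array stays up (degraded), if both were in the same pair it dies
set -- $DEVS
mdadm /dev/md9 --fail "$1"
mdadm /dev/md9 --fail "$3"
cat /proc/mdstat

# Tear it all down
mdadm --stop /dev/md9
for d in $DEVS; do losetup -d "$d"; done
rm -f /tmp/md10test-*.img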

GMontag fucked around with this message at 12:14 on Mar 6, 2012

HalloKitty
Sep 30, 2005

Adjust the bass and let the Alpine blast
It may have been talked about in the many pages in this thread that I haven't witnessed, but anyone got any quick advice:

A machine with two major datasets (used purely for storage):

Now, would it be faster to have two 4-drive RAIDZ(5) arrays, or just to put all 8 drives in one RAIDZ2(6)?
Obviously RAIDZ2(6) would give better fault tolerance, but it's all going to be backed up, so that's not super crucial.

Just in case anyone has had the same dilemma, or maybe this is indeed a stupid question with an obvious answer, which is fine too.

HalloKitty fucked around with this message at 13:10 on Mar 6, 2012

some kinda jackal
Feb 25, 2003

 
 

cheese-cube posted:

I'm not too familiar with software RAID on Linux either, but I've just had a look at the man page for mdadm and you can use the --examine switch to get details on the individual components in the array. What output do you get when you run the following?
pre:
mdadm --examine /dev/sdb1
mdadm --examine /dev/sdc1
mdadm --examine /dev/sde1

Good call, I've --examine'd the array container itself but I didn't think to run it against individual members:

pre:
/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : fb317521:9023d975:d4559f15:6ac2341a
           Name : [myhost]:0  (local to host [myhost])
  Creation Time : Mon Mar  5 13:44:06 2012
     Raid Level : raid10
   Raid Devices : 4

 Avail Dev Size : 3907021954 (1863.01 GiB 2000.40 GB)
     Array Size : 7814043264 (3726.03 GiB 4000.79 GB)
  Used Dev Size : 3907021632 (1863.01 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 2e178adc:063fa149:18eb94a5:b65f3740

    Update Time : Tue Mar  6 08:00:47 2012
       Checksum : 723e41a6 - correct
         Events : 63

         Layout : near=2
     Chunk Size : 32K

   Device Role : Active device 0
   Array State : AA.A ('A' == active, '.' == missing)
So still not what I thought I'd get, but there might just not be a way to get down to that level for one reason or another v:)v


GMontag posted:

You have it backwards. RAID 10 is x number of mirror sets striped across, not x number of stripe sets mirrored. This is a significant difference, because it means that (for a 4 disk array) there are 4 combinations of two drives failing without loss of the array, rather than just 2.

Oh wow, I completely misread the fundamentals of RAID10 :doh: -- it makes much more sense this way, and it's much more robust than I thought.

So in my case (stripe is intact, but mirror x is missing one member) am I reading it right to say that there really ought to not be any visible performance penalty to the system?

Now please don't fail, x2, until I can get your replacement in :ohdear: . I put the other disks through their paces with the RAID resync last night, but since all the disks are from the same batch I'm thinking about going out to buy a separate 2TB disk in addition to replacing this one just to act as a hotspare.


e: As far as which disk does what in the array, I guess I'm not terribly concerned about that as long as the system knows what it's doing. It would be NICE to know, though; logically I'd guess that the first two disks are one mirror, the second two are the other mirror, and I'm striping across {sdb,sdc} and {sdd,sde}, but I guess I'll just do some more googling.

Thanks for the advice everyone, and thanks for setting me straight on RAID10. Words can't express how dumb I feel that I missed that small detail.


edit: http://en.wikipedia.org/wiki/Non-standard_RAID_levels has some interesting details on the layout method of how Linux runs RAID10 which (in my mind) explains why there is no set list of exactly what is mirroring what and striping across what.
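If I'm reading the "near=2" layout right (treat this as a sketch, not gospel), each chunk gets written to two adjacent raid devices, so the pairs would be RaidDevice 0+1 (sdb1/sdc1) and RaidDevice 2+3 (sdd1/sde1), which would mean sdd1 was mirrored by sde1 rather than striped with it:
pre:
#   RaidDevice:   0      1      2      3
#   chunk A  ->   A      A      B      B
#   chunk C  ->   C      C      D      D
#
# Quick way to see the layout and which slot is degraded:
mdadm --detail /dev/md0 | grep -E 'Layout|Chunk Size|RaidDevice|removed'
cat /proc/mdstat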

some kinda jackal fucked around with this message at 14:30 on Mar 6, 2012

r u ready to WALK
Sep 29, 2001

Martytoof posted:

So in my case (stripe is intact, but mirror x is missing one member) am I reading it right to say that there really ought to not be any visible performance penalty to the system?

It depends on how good Linux software RAID is at load balancing across mirrors. RAID 1 is usually very fast for reads since you double the number of spindles and read heads, meaning that if you're reading with two or more threads, each one can use its own disk, or alternating read requests can be sent to different mirrors.

The downside is that writes are slower, since you have to update two copies of the data, and the drives usually spin independently, so there is additional write latency while you wait for both spindles to rotate around to the right spot.

I suspect that degraded raid10 arrays have noticeably worse read performance but actually slightly better write performance as a result.
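If you want to put a number on it rather than guess, something like fio can compare one reader thread against several (a sketch only; the file path, sizes and runtimes are made up, and you'd point it at a file on the filesystem that lives on the md array, not the raw device):
pre:
# Sequential reads, single thread
fio --name=seqread --filename=/mnt/md0/fio.test --size=4G \
    --rw=read --bs=1M --direct=1 --ioengine=libaio \
    --numjobs=1 --runtime=60 --time_based --group_reporting

# Random reads, four threads (this is where the missing mirror should show up)
fio --name=randread --filename=/mnt/md0/fio.test --size=4G \
    --rw=randread --bs=4k --direct=1 --ioengine=libaio \
    --numjobs=4 --runtime=60 --time_based --group_reporting
Run both now (degraded) and again after the replacement disk has resynced, then compare the aggregate IOPS/bandwidth lines.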

some kinda jackal
Feb 25, 2003

 
 
I hadn't considered multithreaded reading. There would definitely be a penalty if that's the case, but I can't empirically say whether or not that's what happens.

Again, thanks for all the advice everyone. At least now that this thing is done with I get to go play with my ZFS iSCSI whitebox again :3:

FISHMANPET
Mar 3, 2007

Sweet 'N Sour
Can't
Melt
Steel Beams

HalloKitty posted:

It may have been talked about in the many pages in this thread that I haven't witnessed, but anyone got any quick advice:

A machine with two major datasets (used purely for storage):

Now, would it be faster to have two 4-drive RAIDZ(5) arrays, or just to put all 8 drives in one RAIDZ2(6)?
Obviously RAIDZ2(6) would give better fault tolerance, but it's all going to be backed up, so that's not super crucial.

Just in case anyone has had the same dilemma, or maybe this is indeed a stupid question with an obvious answer, which is fine too.

Two vdevs (each RAIDZ set is a vdev) are faster than one vdev, even within the same pool. The way RAIDZ works, each vdev is only as fast as its slowest member, because every read has to touch every drive in the vdev. When you have two vdevs the data is striped across both (essentially a RAID 0), so it can read twice as fast.

The only thing you lose is data integrity. With one RAIDZ2 you can lose any two disks and still be fine; with two RAIDZs you can lose two disks, but if they're from the same vdev you lose everything (remember, it's a stripe).

So, depending on how good your backups are and how bad downtime would be, I would say make one pool with two vdevs.
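For reference, the two layouts look something like this at pool creation time (a sketch; the pool and disk names are made up, so substitute your own):
pre:
# One pool, two 4-disk RAIDZ vdevs; ZFS stripes across the two vdevs
zpool create tank \
    raidz da0 da1 da2 da3 \
    raidz da4 da5 da6 da7

# versus one pool with a single 8-disk RAIDZ2 vdev
zpool create tank raidz2 da0 da1 da2 da3 da4 da5 da6 da7

# Either way, 'zpool status tank' shows the resulting vdev layout
zpool status tank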

luminalflux
May 27, 2005



This might be more of a networking question... I'm seeing a huge amount of dropped packets on the switch access ports my P4000's are connected to. The autoneg and speed are set correctly from what I can see. Flow control is enabled on those ports, but not on the trunk link between the two switches they're connected to.

Will bad things happen if I turn flow-control on the trunk? Will this alleviate my dropped packets from the SAN?

Internet Explorer
Jun 1, 2005





I'm a terrible networking guy, but flowcontrol shouldn't be causing that. Turning it off shouldn't cause any problems as far as I know. Have you tried forcing the ports to gigabit, one at a time so you don't interrupt service? Or replacing the cat6 cables? Do you see any errors on those ports on the switch? Or maybe offline one port, turn it back on, rinse and repeat?

luminalflux
May 27, 2005



I'm seeing errors on the switch, they're showing up in the Tx Err column. Not sure if I can see errors on the P4000 easily (unless it's well hidden in CMC). Setting flow-control on the switch trunk didn't help. The ports are configured Auto on the switch and 1000/Auto on the SAN. Might try forcing them to fixed 1000 to see what happens.

Nomex
Jul 17, 2002

Flame retarded.

luminalflux posted:

This might be more of a networking question... I'm seeing a huge amount of dropped packets on the switch access ports my P4000's are connected to. The autoneg and speed are set correctly from what I can see. Flow control is enabled on those ports, but not on the trunk link between the two switches they're connected to.

Will bad things happen if I turn flow-control on the trunk? Will this alleviate my dropped packets from the SAN?

If the ports are trunked, make sure the switch and array are both configured correctly for the trunk. We had a ton of dropped packets on one array because the port channel on one of our switches wasn't configured and it caused the port to flap.

luminalflux
May 27, 2005



I had a brain fart, why did I write trunk? I meant the 10G link between the 2 switches (all ESXi hosts and P4000 nodes are connected to both switches for redundancy).

As for trunking/bonding, we used to use LACP for trunking the iSCSI ports between the switch and the P4000s (and let VMware handle NIC teaming on its own), but that didn't sit well with them. We switched to using ALB, which was a lot smoother, and then moved from having all iSCSI on one switch to splitting it over two switches connected by a 10G link.

Intraveinous
Oct 2, 2001

Legion of Rainy-Day Buddhists
Has anyone had any experience with using consumer SSDs in servers? Things started moving that way for me when HP was putting a 6 week hold on any server order with hard drives late last year and early this year due to the nastiness in Thailand. At first, I just ordered them with no HDD and booted from SAN, but for some things, it makes more sense to have DAS for your boot volume at least. I ordered a few Intel 520 series earlier this week to do some testing. I like the fact that Intel puts write statistics into their SMART status, so I should be able to keep an eye on write amplification and NAND wear.

Other than this, my only SSDs in the data center are for a small but read heavy Oracle DB server. I put an array of 4 60GB Samsung SLC (HP OEM) drives in that, but they cost me like $800/each. I'm thinking that running an MLC drive that allows watching the statistics, and costs 75% less, would be a better use of the money. Even if I have to replace the consumer drives every year or two, I'd still be way ahead money wise vs the SLC or eMLC drives.
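If anyone wants to watch the same counters without Intel's own toolbox, smartmontools will pull them (a sketch; the device name is made up and the attribute IDs vary by model and firmware, so check what your drive actually reports before scripting around it):
pre:
# Dump all SMART attributes from the drive
smartctl -A /dev/sda

# On the Intel drives the interesting ones are roughly:
#   233 Media_Wearout_Indicator  (normalized NAND wear, counts down from 100)
#   241 Total_LBAs_Written       (host writes; diff it over time for wear rate)
# Comparing host writes against NAND writes, where the drive reports both,
# gives a rough write amplification figure.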

Internet Explorer
Jun 1, 2005





As long as it's not in a critical server and won't cause any huge headache when it goes down, sure.

Intraveinous
Oct 2, 2001

Legion of Rainy-Day Buddhists
I figured that by doing a RAID1 and replacing them proactively as they get closer to used up, I shouldn't have much downtime to worry about. I was thinking that at around 50% life I'd swap in a new drive and put the old one in one of the admin laptops or something.

evil_bunnY
Apr 2, 2003

Internet Explorer posted:

As long as it's not in a critical server and won't cause any huge headache when it goes down, sure.
Basically this. At a past customer we had a white-box SSD iSCSI box for continuous integration, and at my current employer I'm going to have another one for testing, because dcpromos taking 10 minutes is something I'd rather not deal with.

KS
Jun 10, 2003
Outrageous Lumpwad
Just curious, do they fit in the 2.5" bays on DL-series servers?

bort
Mar 13, 2003

luminalflux posted:

I'm seeing errors on the switch, they're showing up in the Tx Err column. Not sure if I can see errors on the P4000 easily (unless it's well hidden in CMC). Setting flow-control on the switch trunk didn't help. The ports are configured Auto on the switch and 1000/Auto on the SAN. Might try forcing them to fixed 1000 to see what happens.
Assuming these are copper links, 1000Base-T is supposed to autonegotiate and not be set statically. It also uses all four pairs in the cable, not just two like earlier ethernet standards. If you're sure those counters are actually incrementing, I'd first suspect a cable. If you were having trouble with port aggregation and that's now fixed, those errors might be old and your counters might not be increasing.

e: \/ \/ I only bring it up because "set it to 1000/full statically!" seems to be a troubleshooting step that either doesn't work or makes things even worse, at least in my experience.

bort fucked around with this message at 13:14 on Mar 9, 2012

luminalflux
May 27, 2005



bort posted:

Assuming these are copper links, 1000Base-T is supposed to autonegotiate and not be set statically. It also uses all four pairs in the cable, not just two like earlier ethernet standards. If you're sure those counters are actually incrementing, I'd first suspect a cable. If you were having trouble with port aggregation and that's now fixed, those errors might be old and your counters might not be increasing.

I'm well aware of autoneg and gigabit; my previous gig was writing control software for ethernet network elements. What autoneg should mean and what manufacturers think it could mean are two very different things. The counters are definitely increasing (I've got Munin tracking them); they took off a few weeks ago, right around when I upgraded SAN/iQ. Not sure if it's cable related: they're (hopefully) good-quality Cat6 patch cables, the cabling has stayed the same throughout, and I haven't touched the hardware for a few months.

Intraveinous
Oct 2, 2001

Legion of Rainy-Day Buddhists

KS posted:

Just curious, do they fit in the 2.5" bays on DL-series servers?

The standard 2.5" drive for a server has a 15mm Z, most consumer level SSDs are either 7mm or 9.5mm. If you're screwing into it from the bottom, you'll probably need to shim it up so the SATA connecter is in the right place, but if you're screwing it in from the sides, things *should* line up.

The Intel 520s are mostly 7mm drives with a 2.5mm plastic shim attached, as far as I can tell. I ordered some of them earlier this week and they'll be going in a DL360, so I'll let you know my experience when they arrive.

Wonder_Bread
Dec 21, 2006
Fresh Baked Goodness!
Anyone have any suggestions for an affordable (preferably sub-$3k without drives) 6+ bay rackmount NAS that supports NFS?

Running PHDvirtual backups in my environment but the new version of the software apparently hates the CIFS share I am forced to use with our existing Drobo, and I need to replace it with something that natively supports NFS... and as cheaply as possible. Looked at a Drobo B800i but I'd rather not use iSCSI if possible.

Internet Explorer
Jun 1, 2005





Our Synology DS1010+ has NFS. Haven't used the NFS features on it but we've been pretty happy with it overall. Ours isn't rack mounted but I think they sell rack mountable models.

Wonder_Bread
Dec 21, 2006
Fresh Baked Goodness!
I was looking at the RS2211+ model, it seems like it might fit the bill.

Has anyone here used the NFS features of a Synology NAS?

bort
Mar 13, 2003

Wonder_Bread posted:

Has anyone here used the NFS features of a Synology NAS?
Only at home, but it's pretty impressively simple. Any folder you designate as a share can be shared over CIFS/NFS and whatever Apple uses. I set up a bunch of iSCSI LUNs before discovering that and then wished I'd used NFS but :effort:. Their newest software version has some cool new features, too, but I've only marveled at the look and feel.
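(For the PHDvirtual case above: mounting one of those exports is a one-liner from either a Linux box or an ESXi 5 host. A sketch with made-up hostnames and share names; Synology puts shares under /volumeN.)
pre:
# From a Linux box
mkdir -p /mnt/backups
mount -t nfs synology.example.local:/volume1/backups /mnt/backups

# From an ESXi 5 host, as an NFS datastore
esxcli storage nfs add --host=synology.example.local \
    --share=/volume1/backups --volume-name=phd-backups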

HalloKitty
Sep 30, 2005

Adjust the bass and let the Alpine blast

FISHMANPET posted:

Two vdevs (each RAIDZ set is a vdev) are faster than one vdev, even within the same pool. The way RAIDZ works, each vdev is only as fast as its slowest member, because every read has to touch every drive in the vdev. When you have two vdevs the data is striped across both (essentially a RAID 0), so it can read twice as fast.

The only thing you lose is data integrity. With one RAIDZ2 you can lose any two disks and still be fine; with two RAIDZs you can lose two disks, but if they're from the same vdev you lose everything (remember, it's a stripe).

So, depending on how good your backups are and how bad downtime would be, I would say make one pool with two vdevs.

Thanks!

FlyingZygote
Oct 18, 2004
I'm using VMware's I/O Analyzer virtual appliance (http://labs.vmware.com/flings/io-analyzer) to do some benchmarking on our new 12-disk NetApp FAS 2040.

When I run Iometer on the datastore from two hosts, the total performance is less than when running Iometer from one host. Going in, I expected the total from two hosts to be higher than that of one host, but it's around 16% lower for all tests.

Here's something to illustrate my results:
code:
Hosts			IOPS	MBPS
1 host			3500	100
Total from 2 hosts	2900	84
I'm using the Iometer configuration files from this thread: http://communities.vmware.com/thread/197844?start=0&tstart=0

Are the results I'm getting to be expected? If not, what could be the problem?

evil_bunnY
Apr 2, 2003

Heads have to seek. Run your streams to 2 different disk aggregates if you want your numbers to go up.

FlyingZygote
Oct 18, 2004
So... what I received is to be expected?

Internet Explorer
Jun 1, 2005





FlyingZygote posted:

So... what I received is to be expected?

What type of disks? It actually sounds very high to me, even for 15k SAS drives. I imagine you're hitting cache somehow.

http://en.wikipedia.org/wiki/IOPS

    7,200 rpm SATA drives HDD ~75-100 IOPS[2] SATA 3 Gb/s
    10,000 rpm SATA drives HDD ~125-150 IOPS[2] SATA 3 Gb/s
    10,000 rpm SAS drives HDD ~140 IOPS [2] SAS
    15,000 rpm SAS drives HDD ~175-210 IOPS [2] SAS

210 x 12 = 2520

FlyingZygote
Oct 18, 2004
The numbers were fudged a bit to make them easier to look at.

What I'm actually getting for MaxThroughput-100%Read:
code:
Hosts			IOPS	MBPS
1 host			3479	108
Total from 2 hosts	2833	88
You might be right about the cache. The datastore I'm hitting is NFS configured for thin provisioning. I should probably set up a datastore that is not thin provisioned/deduped.

madsushi
Apr 19, 2009

Baller.
#essereFerrari

FlyingZygote posted:

The numbers were fudged a bit to make them easier to look at.

What I'm actually getting for MaxThroughput-100%Read:
code:
Hosts			IOPS	MBPS
1 host			3479	108
Total from 2 hosts	2833	88
You might be right about the cache. The datastore I'm hitting is NFS configured for thin provisioning. I should probably set up a datastore that is not thin provisioned/deduped.

But but but why would you run the test again if you're seeing good performance, regardless of hitting the cache? Also typically iometer uses random data so it's not in the cache.

Muslim Wookie
Jul 6, 2005
Thin provisioning and dedupe are not performance hindrances on NetApp filers.

Also, use sio from now.netapp.com; it's in the utility toolchest. It will give you the most accurate IO readings across CIFS, NFS, iSCSI, FC, etc. The numbers you've got seem high and lead me to distrust the VMware tool you're using.

FlyingZygote
Oct 18, 2004
NFS is thin by default (link). Since the numbers do seem high, I'll start some new benchmarks with sio (link).

Thanks!

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

madsushi posted:

But but but why would you run the test again if you're seeing good performance, regardless of hitting the cache? Also typically iometer uses random data so it's not in the cache.

Based on what he posted it looks like this was the Max-Throughput, 100% read test, which would make it sequential reads. That explains the high 3500 IOPS number, since he's a) getting data from cache and b) only performing a minimal number of disk seeks, so the high seek latency of 7.2K disks is minimized.

To the poster's original question regarding performance on his 2040: he should look at the "sysstat -x" output on the filer while running his benchmarks if he wants to see his cache hit rate and how high his actual disk utilization is.
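Something like this on the filer console while the benchmark runs (7-mode; just a sketch):
pre:
sysstat -x 1
# Watch the "Cache hit" and "Disk util" columns: a high cache hit rate with
# low disk utilization means the benchmark is mostly exercising cache, not spindles.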

ZombieReagan posted:

A FAS2040 isn't going to be up to handling multiple VMs pounding the poo poo out of it with a ton of random IO and not suffer some performance degradation. A FAS3210 wouldn't have been that much more expensive, and you have the option of getting PAM cards as well. Even if you didn't need them today, it's nice to have some cards to play that don't involve buying another controller when you're up against the wall, especially since adding more disks to a raid group or aggregate won't improve read performance until WAFL has time to balance things back out.

PAM isn't supported on the 3210 on 8.1 and I'm not sure it ever will be, so buying 3210s with PAM is a bad idea for future supportability. I don't think sales is even supposed to provide the option.

YOLOsubmarine fucked around with this message at 04:51 on Mar 14, 2012

Muslim Wookie
Jul 6, 2005

NippleFloss posted:

PAM isn't supported on the 3210 on 8.1 and I'm not sure it ever will be, so buying 3210s with PAM is a bad idea for future supportability. I don't think sales is even supposed to provide the option.

Yeah, this PAM and 3210 issue is annoying; plenty of them were deployed before this configuration was disallowed.

Just to be clear, PAM isn't supported on 3210s at all at the moment...

evil_bunnY
Apr 2, 2003

marketingman posted:

Yeah, this PAM and 3210 issue is annoying, there are many deployed before this configuration was disallowed.
Why was it disallowed? I got quoted one just a month ago, and the NetApp website still lists a pair of 512GB cards as a supported config, with the little gotcha of not being able to run 8.1.

evil_bunnY fucked around with this message at 09:28 on Mar 14, 2012


Alctel
Jan 16, 2004

I love snails


Hey

So we just moved to ESXi 5 and I'm taking this opportunity to redo all our datastores and LUNs, because they are horrible and messy.

Anyone know any good resources for array/LUN sizing with VMware, or have any suggestions?

We have an IBM DS3524 with 8 SAS drives (2.8TB usable) and 8 SATA drives (4.5TB usable), with around 20 VMs on two hosts.



On another note, I was looking at the new Storage DRS stuff and I think I poo poo myself with excitement.

Alctel fucked around with this message at 13:33 on Mar 14, 2012
