three
Aug 9, 2007

i fantasize about ndamukong suh licking my doodoo hole
Misogynist deleted all the files.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

three posted:

Misogynist deleted all the files.
I wish I could just delete everything; it would make my job a lot easier. We've had a sync running to our IBM SONAS so we can have twice as much of this delicious data!

Pile Of Garbage
May 28, 2007



Misogynist posted:

I wish I could just delete everything; it would make my job a lot easier. We've had a sync running to our IBM SONAS so we can have twice as much of this delicious data!

Unngghh, you lucky bastard! I've always wanted to work with an IBM SONAS setup. I went to a SONAS workshop hosted by IBM a while back and was quite impressed by the features and capabilities of GPFS. What do you think of it? Does it live up to all the hype?

Powdered Toast Man
Jan 25, 2005

TOAST-A-RIFIC!!!
I've been working with a NetApp FAS3240 at my new job and I have to say that I love it. I was pleasantly surprised by how easy the management tools are to use.

Snapmanager for Exchange, on the other hand, can kiss the darkest part of my rear end in a top hat.

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

Powdered Toast Man posted:

I've been working with a NetApp FAS3240 at my new job and I have to say that I love it. I was pleasantly surprised by how easy the management tools are to use.

Snapmanager for Exchange, on the other hand, can kiss the darkest part of my rear end in a top hat.

What issues are you having with SME? It generally works pretty well once it's set up unless you've got a huge DAG with a lot of servers separated by WAN links. PM me and I can assist.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

NippleFloss posted:

What issues are you having with SME? It generally works pretty well once it's set up unless you've got a huge DAG with a lot of servers separated by WAN links. PM me and I can assist.
I have to agree. Every snapmanager product simply works. There was a period of time when SMVI was kind of crappy, but those days are now over. The products just work.

namaste friends
Sep 18, 2004

by Smythe
The first thing to check if your SnapManager for <some Windows product> is failing is whether one of your volumes has run out of space or is about to.

Binskin
Dec 15, 2005
Better this than 'Stupid Newbie'.
Anyone had much experience with V7000 Unified CIFS? I'm in the process of migrating my user shares across from our existing NetApp filer, and all was running swimmingly until I tried to kick off Lotus Notes clients (7.0.3) under my published Citrix desktop (XenApp 5 on 2003 R2), with the Notes data directories redirected to said home drives.

Now this process isn't an issue on the NetApp, but for the love of god I can't get the Lotus client the permissions it is apparently looking for under the Unified; we just get the Lotus splash screen and then nothing. Supplying a known-working notes.ini (one you can browse to and edit as the user) prompts for the profile you want to work with, then a 'browse to Notes dir' window; once you select the Notes data dir it simply bombs with 'file system not ready' or words to that effect.

I originally thought it might be the filer A/V having a spit at scanning larger NSF files, so I tested on a share set up under a filesystem that doesn't have A/V scanning configured yet. No dice.

I reconfigured the secondary test share I had set up with share permissions of Everyone (all) and NTFS permissions of Domain Users (Full) on the root, and set the test user account to Full on the home drive directory. Still no dice.

I completed an additional test by setting up the same share/NTFS permissions on a 2008 R2 box: no problems.

The NetApp (an IBM-badged N3600) had the capability to turn on auditing, but I can't seem to find the option on the Unified, so instead I opted to try running Sysinternals Process Monitor to catch any glaring DENIED DENIED DENIED errors. Nothing particularly stood out... stumped.

Any thoughts?

Mr. Fossey
Mar 31, 2003

Fresh bananas for the whole crew!
We are looking to replace our aging and slow MSA2012i. Basically we have 7TB of rarely accessed engineering data, 300GB of Exchange 2010 across 150 users, and 10 VMs with very little usage. Our expected growth over the next 24 months is an additional 7TB of data, which is a marginal candidate for compression but not a candidate for dedup.

Right now we are looking at a NetApp 2240 w/ 6x100GB SSD, 18x1TB SATA, and SnapRestore. The idea is to let the NetApp serve the bulk of the files over CIFS with the VMs connecting over iSCSI. Is this a) a reasonable build, b) something manageable by non-SAN folks, and c) the right tool for the job? Are there other vendors we should be looking at?

On a secondary note, how important are the installation services? They are coming in at 30% of the cost of the hardware itself.

Docjowles
Apr 9, 2009

Probably worth at least looking at EqualLogic's offerings. They've changed their lineup all around since the last time I evaluated them, but something like the PS6100X might be a good candidate. That or the EMC VNXe, depending on how important the "point and click, idiot proof" requirement is.

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

Mr. Fossey posted:

We are looking to replace our aging and slow MSA2012i. Basically we have 7TB of rarely accessed engineering data, 300GB of Exchange 2010 across 150 users, and 10 VMs with very little usage. Our expected growth over the next 24 months is an additional 7TB of data, which is a marginal candidate for compression but not a candidate for dedup.

Right now we are looking at a NetApp 2240 w/ 6x100GB SSD, 18x1TB SATA, and SnapRestore. The idea is to let the NetApp serve the bulk of the files over CIFS with the VMs connecting over iSCSI. Is this a) a reasonable build, b) something manageable by non-SAN folks, and c) the right tool for the job? Are there other vendors we should be looking at?

On a secondary note, how important are the installation services? They are coming in at 30% of the cost of the hardware itself.

The 2240s are plenty powerful to run everything you want, but I'm not sure they are giving you enough disk. Make sure they show you the actual RAID layout they are proposing, to prove that you will get to your required usable storage amount. You're going to lose disks to spares and to RAID-DP, and the SSDs will be configured as caching for the SATA disks so you won't see the space from those. So if you lose 4 disks to RAID-DP (you need at least two RAID groups, one for each controller) plus 1 spare, you're down to 13 1TB disks that are really only around 850GB usable each once they are rightsized. That won't get you to the ~14TB usable you're asking for, though it will get you pretty close.
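To put rough numbers on that, here's a quick back-of-the-envelope sketch (the disk counts and the ~850GB rightsized figure are the ones above; filesystem and aggregate overhead are ignored, so treat it as an upper bound):

```python
# Back-of-the-envelope capacity math for the proposed 18x1TB SATA config.
# Disk counts and the ~850GB rightsized figure are from the paragraph above;
# filesystem/aggregate overhead is ignored, so this is an optimistic estimate.

total_disks = 18
parity_disks = 4        # RAID-DP across two RAID groups (one per controller)
spares = 1
rightsized_gb = 850     # approximate usable space on a "1TB" SATA disk

data_disks = total_disks - parity_disks - spares
usable_tb = data_disks * rightsized_gb / 1000

print(f"data disks: {data_disks}")           # 13
print(f"usable:     ~{usable_tb:.1f} TB")    # ~11 TB, short of the ~14 TB target
```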

Your configuration is fine and reasonable. Getting as much CIFS data as possible shared directly from the NetApp provides a lot of benefits. For VMware I'd recommend NFS instead of iSCSI for ease of management, and because certain abilities like constant-time single-file cloning become much more powerful when you're using native WAFL files rather than VMFS-formatted LUNs.

Capacity numbers aside, NetApp is pretty easy to use even for non-storage-centric folks. Any decent admin can find their way around the systems pretty well. There's also a lot of good information on the support site and communities to help you get up to speed. If you have any additional questions about NetApp specifically feel free to PM me.

madsushi
Apr 19, 2009

Baller.
#essereFerrari

Mr. Fossey posted:

We are looking to replace our aging and slow MSA2012i. Basically we have 7TB of rarely accessed engineering data, 300GB of Exchange 2010 across 150 users, and 10 VMs with very little usage. Our expected growth over the next 24 months is an additional 7TB of data, which is a marginal candidate for compression but not a candidate for dedup.

Right now we are looking at a NetApp 2240 w/ 6x100GB SSD, 18x1TB SATA, and SnapRestore. The idea is to let the NetApp serve the bulk of the files over CIFS with the VMs connecting over iSCSI. Is this a) a reasonable build, b) something manageable by non-SAN folks, and c) the right tool for the job? Are there other vendors we should be looking at?

On a secondary note, how important are the installation services? They are coming in at 30% of the cost of the hardware itself.

1) SSDs are going to be a waste for you. They won't help the engineering data, Exchange 2010 was designed to run on SATA disks and users won't notice, and low-usage VMs aren't important enough to warrant the cost.

2) I would look at bigger disks based on your use cases. You can get the 2240 with 2TB SATA disks. If you're going to do an HA configuration you need to burn at least 3 disks for that; even after 1 spare and 2-4 disks for RAID-DP on the main aggregate, that would still net you over 30TB and you'd have exactly the same performance as with the 1TB SATA.

3) Use NFS for VMware, it will be easier to manage.

Essentially, there's a GB-per-IOPS ratio for each disk type (1TB disks = 13 GB/IOPS, 2TB disks = 26 GB/IOPS, etc.), and from what you describe you sound like a pretty low-IOPS shop. You can easily run Exchange + CIFS + 10 VMs on 24 disks.
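Here's what that rule of thumb looks like in numbers (quick sketch; the 13/26 GB-per-IOPS ratios are mine from above, the implied ~75-80 IOPS per 7.2k spindle is an assumption, and spares/parity are ignored):

```python
# Sketch of the GB-per-IOPS rule of thumb: a 7.2k SATA spindle delivers roughly
# the same IOPS whether it's 1TB or 2TB, so bigger disks buy capacity, not
# performance. Spares and parity are ignored for simplicity.

def estimate(disks: int, size_tb: float, gb_per_iops: float):
    """Return (raw TB, rough random IOPS) for a set of spindles."""
    raw_tb = disks * size_tb
    iops = raw_tb * 1000 / gb_per_iops
    return raw_tb, iops

for disks, size_tb, ratio in [(18, 1.0, 13.0), (18, 2.0, 26.0)]:
    raw_tb, iops = estimate(disks, size_tb, ratio)
    print(f"{disks} x {size_tb:.0f}TB SATA: {raw_tb:.0f} TB raw, ~{iops:.0f} IOPS")

# Both configs land around ~1400 IOPS; the 2TB disks just double the capacity,
# which is why they make sense for a low-IOPS shop like the one described.
```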

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

madsushi posted:

You can easily run Exchange + CIFS + 10 VMs on 24 disks.
I agree, I would run 2 aggregates of 11 disks each with 2 spares. This will get you HA and you can split your logs/data if you choose.

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

madsushi posted:

1) SSDs are going to be a waste for you. They won't help the engineering data, Exchange 2010 was designed to run on SATA disks and users won't notice, and low-usage VMs aren't important enough to warrant the cost.

I'm sure the SSDs were included to be used as part of a Flash Pool aggregate. Whether that's necessary for what they're currently running is debatable, but Exchange 2010 and VMware will certainly see a benefit. Exchange 2010 can live happily with 20ms or less of latency, but SATA disk latencies are 8 to 10ms in the best of circumstances and ramp up pretty drastically as utilization increases, so having cache to take some load off is never a bad thing, and it provides some future-proofing.

I do agree that they need to look at 2TB disks, though; there isn't enough capacity in the original quote to meet their growth requirements.

Powdered Toast Man
Jan 25, 2005

TOAST-A-RIFIC!!!
The biggest issue I've had with SME is that it seems if you make any sort of change to your mailbox server (change the name of a database, create a new database, move a database, etc), you have to run the configuration wizard again and you have to throw out your backup job and create a whole new one. If I'm mistaken about this, please enlighten me, but NetApp has a knowledgebase article that says pretty much exactly what I just said.

madsushi
Apr 19, 2009

Baller.
#essereFerrari

Powdered Toast Man posted:

The biggest issue I've had with SME is that it seems if you make any sort of change to your mailbox server (change the name of a database, create a new database, move a database, etc), you have to run the configuration wizard again and you have to throw out your backup job and create a whole new one. If I'm mistaken about this, please enlighten me, but NetApp has a knowledgebase article that says pretty much exactly what I just said.

I think the idea is to set it up once and then forget it.

You can actually edit the Scheduled Task and just add in/edit the database name if you keep the syntax straight.

Powdered Toast Man
Jan 25, 2005

TOAST-A-RIFIC!!!

madsushi posted:

I think the idea is to set it up once and then forget it.

You can actually edit the Scheduled Task and just add in/edit the database name if you keep the syntax straight.

In theory you are supposed to be able to set it up and forget it. In practice, if you make any of the changes I mentioned, it breaks the backup job, even if it is only one DB out of several that doesn't work. It still breaks the entire job. The error is something about the path not matching when it attempts to do a VSS copy, and when I gave that information to NetApp they sent me to the aforementioned KB article...which says that you have to run the config wizard again, and if it is an existing DB and you changed the name, you have to move it to another LUN and then move it back. Yeah.

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

Yes, if you make changes to the database layout the Config Wizard needs to be run. It's creating a catalog of objects to be backed up, and where to back them up (snapinfo locations), so when those things change it needs to be updated. That said, re-running the config wizard is generally a minute or two long process if everything is already properly located on NetApp storage. It just creates a few snapinfo directories or moves some existing ones around.

Regarding backup jobs, you can create those manually in such a way that they don't need to be re-created when DB names change or locations change. You can set up a scheduled backup that backs up everything on the server as a default, so newly added DBs are picked up automatically ONCE you run the config wizard to get them added to the configuration.

All SME does to run a backup is call a wrapper to PowerShell with the SME cmdlets loaded, and then run the new-backup cmdlet. If you're using Exchange 2010 then you can remove the -Database flag and all of the explicitly listed databases from the scheduled job. If you don't specify specific databases then it will back up all of them on that server. You can also specify DBs by activation preference, or back up only the active or passive copies, without having to explicitly call out the DBs. The new-backup cmdlet is a lot more flexible than the GUI implies and I'd suggest playing around with it a bit and tuning your backup jobs manually to get them to do what you want.
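To illustrate, here's a rough sketch of that "no -Database flag" style of job (the server name and the snapin-loading step are placeholders, not what SME actually generates; check an existing SME-created scheduled task for the exact wrapper your version uses):

```python
# Builds the sort of PowerShell one-liner described above: load the SME cmdlets,
# then run new-backup without a -Database list so every DB on the server is
# backed up and new/renamed DBs are picked up automatically (once the config
# wizard has registered them). Names marked as placeholders are assumptions.

server = "MBX01"  # placeholder mailbox server name

ps_command = (
    "Add-PSSnapin <SME snapin name>; "   # placeholder: use whatever your SME-generated task loads
    f"new-backup -Server {server}"       # no -Database flag: back up all databases on this server
)

print(f'powershell.exe -NoProfile -Command "{ps_command}"')
```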

madsushi
Apr 19, 2009

Baller.
#essereFerrari

Powdered Toast Man posted:

In theory you are supposed to be able to set it up and forget it. In practice, if you make any of the changes I mentioned, it breaks the backup job, even if it is only one DB out of several that doesn't work. It still breaks the entire job. The error is something about the path not matching when it attempts to do a VSS copy, and when I gave that information to NetApp they sent me to the aforementioned KB article...which says that you have to run the config wizard again, and if it is an existing DB and you changed the name, you have to move it to another LUN and then move it back. Yeah.

No, I meant you're supposed to set up Exchange once, and then forget it. I can count the number of times I've renamed an Exchange database on one hand. And, with Exchange 2010, and especially on a NetApp, you want to aim for fewer/bigger databases, so creating new ones should happen pretty rarely.

e: so I don't double-post:

I have been doing some NetApp vs Nimble comparisons lately, and it seems like there is one feature on Nimble that I don't quite understand. Nimble claims that their method of coalescing random writes into sequential stripes is somehow much faster than NetApp, and in fact Nimble claims that their write methods are up to 100x faster than others. I don't really see how this is possible. Can anyone with Nimble experience/knowledge add any insight?

Powdered Toast Man
Jan 25, 2005

TOAST-A-RIFIC!!!
I'm using Exchange 2007, and we don't have the budget to upgrade (probably for the entire year), so all that lovely stuff that 2010 adds doesn't apply to me.

I'm also running my mailbox server on Server 2003R2, not sure if that makes a difference.

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

madsushi posted:

I have been doing some NetApp vs Nimble comparisons lately, and it seems like there is one feature on Nimble that I don't quite understand. Nimble claims that their method of coalescing random writes into sequential stripes is somehow much faster than NetApp, and in fact Nimble claims that their write methods are up to 100x faster than others. I don't really see how this is possible. Can anyone with Nimble experience/knowledge add any insight?

This very likely stems from their claim that they never have to "fill holes" in a RAID stripe. On NetApp you may write full stripes initially, but over time you end up with holes punched in those stripes as blocks within the stripe are freed by overwrites or dedupe. Eventually you end up with an aggregate full of incomplete RAID stripes and very little contiguous disk space left to write full stripes into. At that point you have to start doing partial stripe writes and filling those holes in, which makes flushing to disk much slower.

Nimble has a constantly running background scanner that looks for fragmented stripes and re-lays them out as complete stripes. This (in theory) prevents them from ever having to do partial writes, since they are constantly freeing up new space for writes. It is dependent on having the space available to re-arrange the data, and on the scanner freeing up full stripes for writing faster than new writes are coming in. I don't know enough about Nimble to know how, or if, they can guarantee those things. Starting with 8.1 NetApp included Continuous Segment Cleaning, which is basically a background scanner that serves the same purpose as the one I described. Which works better is anyone's guess, because they aren't going to share those details, but they are very similar in purpose.
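A toy I/O-count model of why hole-filling hurts (single parity and a 4-block stripe for simplicity; real aggregates are RAID-DP with much wider stripes, so the exact numbers are illustrative only):

```python
# Toy cost model: a full-stripe write needs no reads (parity is computed from
# the data in hand), while dropping blocks into holes in existing stripes forces
# a read-modify-write of old data/parity first. Single parity, 4-block stripes.

STRIPE_WIDTH = 4  # data blocks per stripe (toy value)

def full_stripe_ios(blocks: int) -> int:
    """Brand-new full stripes: one write per data block plus one parity write per stripe."""
    stripes = -(-blocks // STRIPE_WIDTH)  # ceiling division
    return blocks + stripes

def hole_fill_ios(blocks: int) -> int:
    """Each block dropped into a hole: read old block + old parity, write new block + new parity."""
    return blocks * 4

for n in (4, 64, 1024):
    print(f"{n:5d} blocks: full-stripe {full_stripe_ios(n):5d} IOs, hole-fill {hole_fill_ios(n):5d} IOs")
```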

Of course, talking about how fast they can write to the disk on the back-end is a bit of a red herring. Writes on both Nimble and NetApp are acknowledged once they hit battery-backed NVRAM, long before they are ever sent to spinning disk. So it really only needs to write fast enough to get to disk before the next time you need to flush your NVRAM. You might see problems on very full, very aged NetApp filesystems, but for most users, if they are seeing write delays it's going to be due to running out of controller headroom, running out of disk throughput, or doing a ton of misaligned I/O from VMware.

Nomex
Jul 17, 2002

Flame retarded.

madsushi posted:

I have been doing some NetApp vs Nimble comparisons lately, and it seems like there is one feature on Nimble that I don't quite understand. Nimble claims that their method of coalescing random writes into sequential stripes is somehow much faster than NetApp, and in fact Nimble claims that their write methods are up to 100x faster than others. I don't really see how this is possible. Can anyone with Nimble experience/knowledge add any insight?

With spinning disk, if you're making a long sequential write you get high bandwidth, but the second the head has to start jumping around the platter your performance tanks. A 3.5" 7200RPM SATA drive will do somewhere between about 80 and 130 MB/s on a sequential write, but will do somewhere around 500KB/s-1.5MB/s on 4k random writes. This is due to rotational latency introduced when the head has to wait for data to come around again. The Nimble array would still have to do uncached reads from the disk, which would bring the performance down somewhat, but aligning all the writes into sequential order would definitely boost write performance.
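The arithmetic behind those random-write numbers, roughly (the seek time below is an assumed typical figure for a 7.2k SATA drive, so take the output as order-of-magnitude only):

```python
# At 7200 RPM a full revolution takes 60/7200 s (~8.3 ms), so the head waits
# ~4.2 ms on average for the right sector to come around, plus several ms of
# seek time. The seek figure is an assumption; the point is the rough scale.

rpm = 7200
avg_rotational_ms = (60_000 / rpm) / 2   # ~4.2 ms: half a revolution on average
avg_seek_ms = 8.0                        # assumed average seek for a 3.5" 7.2k SATA drive
io_size_kb = 4

service_time_ms = avg_rotational_ms + avg_seek_ms
iops = 1000 / service_time_ms
throughput_kb_s = iops * io_size_kb

print(f"~{iops:.0f} IOPS -> ~{throughput_kb_s:.0f} KB/s of 4k random writes")
# versus ~80,000-130,000 KB/s sequential: hundreds of times slower, which is
# exactly the gap that coalescing random writes into sequential stripes closes.
```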

NetApp Flash Pool caches overwrites though, so if you have that it really helps with write performance anyway.

evil_bunnY
Apr 2, 2003

ONTAP already aligns writes.

Demonachizer
Aug 7, 2004
Could someone give me some quick advice on how to proceed with converting a file server to a VM? Currently we are consolidating about 4-5 different file servers into a single VM. When deploying it, I am thinking of keeping the disk space of the VM small and presenting the storage to the server rather than provisioning a huge VMDK.

Does this make sense as the best way to do this? Also we have been running backups to tape from a mirrored file server using Data Protector Express. Is it sensible to continue doing it in this fashion?

evil_bunnY
Apr 2, 2003

It'll depend on your environment. Be mindful of vmfs config maximums.

Erwin
Feb 17, 2006

demonachizer posted:

Could someone give me some quick advice on how to proceed with converting a file server to a VM? Currently we are consolidating about 4-5 different file servers into a single VM. When deploying it, I am thinking of keeping the disk space of the VM small and presenting the storage to the server rather than provisioning a huge VMDK.

Does this make sense as the best way to do this? Also we have been running backups to tape from a mirrored file server using Data Protector Express. Is it sensible to continue doing it in this fashion?

There's no good reason not to move the file store to a VMDK, other than the work it entails (unless it's hundreds of terabytes or something). Once it's moved, though, you'll be in a more flexible position to move it, tier it, snapshot it, etc. You should keep the OS on a separate VMDK regardless.

If that backup scheme works for you, you might as well keep it going.

Demonachizer
Aug 7, 2004

evil_bunnY posted:

It'll depend on your environment. Be mindful of vmfs config maximums.

Do you mean the maximum RDM size? We are fine there as it is vSphere 5 and I think the filestore will end up being around 4-5TB.

Erwin posted:

There's no good reason not to move the file store to a VMDK, other than the work it entails (unless it's hundreds of terabytes or something). Once it's moved, though, you'll be in a more flexible position to move it, tier it, snapshot it, etc. You should keep the OS on a separate VMDK regardless.

If that backup scheme works for you, you might as well keep it going.

There are no issues with having a VMDK that is around 5TB? I was under the impression that an RDM worked out better. It would be great to just P2V the fucker and walk away.

Erwin
Feb 17, 2006

demonachizer posted:

There are no issues with having a VMDK that is around 5TB? I was under the impression that an RDM worked out better. It would be great to just P2V the fucker and walk away.
No, ignore me, I'm an idiot. The limit per VMDK is 2TB.

madsushi
Apr 19, 2009

Baller.
#essereFerrari

NippleFloss posted:

Of course, talking about how fast they can write to the disk on the back-end is a bit of a red herring. Writes on both Nimble and NetApp are acknowledged once they hit battery-backed NVRAM, long before they are ever sent to spinning disk. So it really only needs to write fast enough to get to disk before the next time you need to flush your NVRAM. You might see problems on very full, very aged NetApp filesystems, but for most users, if they are seeing write delays it's going to be due to running out of controller headroom, running out of disk throughput, or doing a ton of misaligned I/O from VMware.

Yeah, I'm familiar with the NetApp side, which is why I couldn't figure out how Nimble was fundamentally different, outside of the background sweeper (which we'll have to upgrade from 8.0.1 to 8.1 for).

We have a FAS2040 with 27 FC-attached SATA drives (DS14mk2 x2) in one aggregate. The aggregate is only about 40-50% full, and I ran volume-level reallocations on each volume when we added the 2nd shelf. No hot disks, idle disks, etc.

I can throw 25k IOPS at it, if we're talking about 4k writes, because it's all getting soaked up by NVRAM and then striped. What we're seeing is that when we run something, like say a month's worth of Windows Updates, on 10 servers on the SATA aggregate, we're getting slammed with back-to-back CPs and high-water mark CPs and the write latency on both our SAS and SATA aggregates skyrockets into triple digits. If I turn on IOMeter and set it to 8MB writes to SATA, I see the same effect. Running that test on SAS barely touches our latency.

So our team has been having to run Windows Updates on 2-3 servers at a time, and it's taking them forever to get through a maintenance period. I think there has been some overlap between the maintenance window, dedupe jobs, and our SnapVault replication from the SATA aggregate, and that overlap is what is causing our problems. I am going to do some more troubleshooting during our next window, because our logging tool always stops recording SNMP data during the slowdowns, so we've been without much good data. I am hoping that we can fix it just by adjusting scheduling.
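One thing I'm planning to try since the SNMP logging keeps dropping out: capture sysstat to a file for the whole window and tally the CP types afterwards. Rough sketch below; the column position and the 'B'/'b' back-to-back markers are from memory of 7-mode sysstat -x output, so verify against a sample line from your own capture before trusting the counts:

```python
# Tally the "CP ty" column from a captured `sysstat -x 1 > sysstat_window.txt`
# run, to see how much of the window was spent in back-to-back CPs.
# CP_TYPE_COL and the 'B'/'b' markers are assumptions -- check your own output.
from collections import Counter

CP_TYPE_COL = 13  # guess at the "CP ty" field index; adjust to match your capture

def tally_cp_types(path: str) -> Counter:
    counts = Counter()
    with open(path) as f:
        for line in f:
            fields = line.split()
            # data lines start with a CPU percentage like "45%"; skip headers/blanks
            if not fields or not fields[0].rstrip('%').isdigit():
                continue
            if len(fields) > CP_TYPE_COL:
                counts[fields[CP_TYPE_COL]] += 1  # 'B'/'b' indicate back-to-back CPs
    return counts

print(tally_cp_types("sysstat_window.txt"))
```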

My boss, however, is convinced that NetApp is handling flash/SATA poorly and that their solutions aren't elegant, and he is looking into getting a pair of Nimbles and adding them to the mix, removing all of the NetApp SATA shelves and using the Nimble for bulk SATA storage instead. My issue is that I really don't see how a CS240 differs from a 2240 with a Flash Pool (which would be another purchase option). It seems like both companies made all the same technology choices.

They both soak random writes to NVRAM.
They both serve random reads from SSD.
They both serve seq reads from disk.
They both write seq writes to disk, as fast as 8x 7.2k RPM drives will let you.

In fact, the Nimble only has 8 data drives, while a NetApp would have at least 12 in the primary aggregate. With the free space sweeper in 8.1, what does Nimble bring to the table? It seems like a really slick NetApp clone, with pros (no licensing poo poo to deal with) and cons (iSCSI only, SATA only), but no "killer app" that would make me want to switch. And I'm still not sure I trust compression, and it's not like NetApp doesn't have compression too. Nimble has even said that their box, with its 8 drives, could replace our whole 2040, which has 27 SATA and 32 SAS drives. I just don't see how it could ever match the back-end throughput when the rubber meets the road and we need to write a lot of data (storage vMotion, big file transfers, etc).

Goon Matchmaker
Oct 23, 2003

I play too much EVE-Online

madsushi posted:

Nimble has even said that their box, with its 8 drives, could replace our whole 2040, which has 27 SATA and 32 SAS drives.

I haven't messed around with Nimble or NetApp yet, but I have learned that when vendors make fantastical claims like that, they're selling you a load of poo poo knowing you'll have to come crawling back to buy more of their product. If it sounds too good to be true, it probably is.

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

madsushi posted:

Lots of stuff

Have you opened a case with NetApp to have them look at performance during the updates? If you take a perfstat during one of them I would be happy to look at it and see if anything jumps out, though usually a B2B CP is just a case of overloading either the disk or the loop (are these external or internal, SAS or FC?).

One problem with the way NetApp does writes is that each controller has only one CP process for all writes, whether they are going to SATA, SAS, or SSD. So if you saturate your SATA disks, writes to all other disks will suffer as well because they are caught in the CP bottleneck that the SATA is causing. Nimble doesn't have this issue because Nimble only does SATA.

They further benefit from default enabled compression lowering the amount that they actually have to write to disk. Yes, you can enable compression on NetApp as well, but you will pay a CPU penalty. Technically you would pay one on Nimble too, however it's enabled by default and their boxes are sized with the expectation that compression is on. They probably also have a more efficient compression algorithm than NetApp since they use variable sized blocks (this may also help them handle larger writes better than NetApp in some cases). And they have much more CPU and RAM than your 2040 box, which will provide additional headroom. They probably also have some more efficient multi-threading going, whereas the 2040 is a single-threaded system.

So yea, Nimble has some definite advantages due to being new to the game. They've been able to learn from the growing pains of WAFL and likely avoided some of those problems as they wrote their code. But that doesn't make it a better storage product, and you're correct that in the end the question is "can I write this much data out to this many spindles in this amount of time?" I'd be inclined to say no, not with 8 drives. We have plenty of NetApps running on SATA here and they do just fine up until the point where disk utilization gets above 70% or so. It's not an issue of how well the controller or software handles SATA; it's just the fact that SATA has a steep latency curve as utilization goes up, and utilization will go up if you add more load.

madsushi
Apr 19, 2009

Baller.
#essereFerrari

NippleFloss posted:

Have you opened a case with NetApp to have them look at performance during the updates? If you take a perfstat during one of them I would be happy to look at it and see if anything jumps out, though usually a B2B CP is just a case of overloading either the disk or the loop (are these external or internal, SAS or FC?).

I haven't opened a case yet; someone else had been working the issue up until this point because I had been too busy. I plan to have perfstat and sysstat data from the next update window, along with our own SNMP monitoring from the NetApp and the VMware hosts. The disks are in DS14mk2s, so they're external FC-connected SATA disks.

NippleFloss posted:

One problem with the way NetApp does writes is that each controller has only one CP process for all writes, whether they are going to SATA, SAS, or SSD. So if you saturate your SATA disks, writes to all other disks will suffer as well because they are caught in the CP bottleneck that the SATA is causing. Nimble doesn't have this issue because Nimble only does SATA.

Exactly, which is why our SAS write latency goes up during high SATA usage. I like the Active/Passive design style with the smaller NetApps, since it ensures you'll never outgrow your N+1 and makes it easier to manage. However, in this case, looking back, we probably should've put the SATA loop on the second controller (which is currently just holding on to 3 SAS disks and sitting idle) if only to isolate the CP congestion to our SATA volumes (which are non-customer facing/impacting).

NippleFloss posted:

So yea, Nimble has some definite advantages due to being new to the game. They've been able to learn from the growing pains of WAFL and likely avoided some of those problems as they wrote their code. But that doesn't make it a better storage product, and you're correct that in the end the question is "can I write this much data out to this many spindles in this amount of time?" I'd be inclined to say no, not with 8 drives. We have plenty of NetApps running on SATA here and they do just fine up until the point where disk utilization gets above 70% or so. It's not an issue of how well the controller or software handles SATA; it's just the fact that SATA has a steep latency curve as utilization goes up, and utilization will go up if you add more load.

This was my feeling as well, since it's not like our workload ever has issues on SATA outside of very high usage. The problem has just been high profile since it's been affecting all of our VMs. My guess is that it's just storage congestion due to dedupe / SnapVault / normal traffic / Windows Updates all trying to run at the same time, and the SATA can't handle it. Unfortunately we can't move the SATA aggregate to the other head without having a temporary shelf on the other head to move things to. I had even thought about just unplugging the SATA FC loop and plugging it in to the other head to see if it showed up, but that gets pretty risky.

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

madsushi posted:

Exactly, which is why our SAS write latency goes up during high SATA usage. I like the Active/Passive design style with the smaller NetApps, since it ensures you'll never outgrow your N+1 and makes it easier to manage. However, in this case, looking back, we probably should've put the SATA loop on the second controller (which is currently just holding on to 3 SAS disks and sitting idle) if only to isolate the CP congestion to our SATA volumes (which are non-customer facing/impacting).

I've taken to recommending that my customers split SAS onto one controller and SATA onto the other in scenarios like this for exactly that reason. It's something I didn't become aware of until we had issues with a single overloaded SATA aggregate causing write latencies across the whole controller.

madsushi posted:

This was my feeling as well, since it's not like our workload ever has issues on SATA outside of very high usage. The problem has just been high profile since it's been affecting all of our VMs. My guess is that it's just storage congestion due to dedupe / SnapVault / normal traffic / Windows Updates all trying to run at the same time, and the SATA can't handle it. Unfortunately we can't move the SATA aggregate to the other head without having a temporary shelf on the other head to move things to. I had even thought about just unplugging the SATA FC loop and plugging it in to the other head to see if it showed up, but that gets pretty risky.

A statit should help you see if disk utilization is peaking around that time and how much latency disk is adding. You can check the various lengths of the CP phases in wafl_susp -w output to get a pretty good idea of what's causing CPs to run long and, hence, go B2B.

Why isn't the SATA loop connected to the other head anyway, if they're in a cluster? They should both be able to see all disks. Swinging aggregates between controllers is pretty simple really: you just need to un-assign and re-assign all disks from the aggregate and it will show up as foreign, then you import it. I've moved entire shelves to move aggregates before without issue, and on a cluster where both controllers should see all disks it's as simple as an un-assign/re-assign (this is actually how storage failover works in clustered ONTAP, and it's pretty neat and much faster than CFO on 7-mode).

madsushi
Apr 19, 2009

Baller.
#essereFerrari

NippleFloss posted:

Why isn't the SATA loop connected to the other head anyway, if they're in a cluster? They should both be able to see all disks. Swinging aggregates between controllers is pretty simple really: you just need to un-assign and re-assign all disks from the aggregate and it will show up as foreign, then you import it. I've moved entire shelves to move aggregates before without issue, and on a cluster where both controllers should see all disks it's as simple as an un-assign/re-assign (this is actually how storage failover works in clustered ONTAP, and it's pretty neat and much faster than CFO on 7-mode).

The SATA disks are technically connected to both heads, but we were just thinking we'd need to connect another SATA shelf to migrate to first. I didn't realize you could move aggregates between controllers just by using unassign/reassign. I would just offline the volumes/aggregate, unassign from one head, and assign to the other? And the aggregate just shows up? I will see if I can find a TR or something explaining the process. That would save me a bunch of work/effort.

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

madsushi posted:

The SATA disks are technically connected to both heads, but we were just thinking we'd need to connect another SATA shelf to migrate to first. I didn't realize you could move aggregates between controllers just by using unassign/reassign. I would just offline the volumes/aggregate, unassign from one head, and assign to the other? And the aggregate just shows up? I will see if I can find a TR or something explaining the process. That would save me a bunch of work/effort.

Yes, you can do exactly that. Before offlining/un-assigning you'll want to make sure your vol0 (obviously) and mailbox aren't on the disks that you're moving. But otherwise it's a pretty basic un-assign/re-assign. There's a KB here: https://kb.netapp.com/support/index?page=content&id=1011651.

I think the old aggregate name may still show up on the original filer until a reboot, but obviously it won't have any disks or actually do anything.

Confused_Donkey
Mar 16, 2003
...
Reposting this from the HP support forum as I'm running out of ideas here.

We recently reactivated one of our older EVA SANs (a 5000) and are running into issues with cache batteries.

The EVA sat for a while, so of course the batteries died over the years. All 4 are currently marked as bad on both controllers.

I put a call in to HP to 2-day me some new batteries; however, after 2 weeks of "we don't know where they are" or "I'm sorry, I cannot update you on your order status" I said to hell with it and cracked them open.

Thankfully I noticed that the Hawker Energy 2V 5AH batteries are still made (albeit under the EnerSys brand now).

So we rebuilt two of the packs; the batteries are fully charged, nice and happy.

Slap one in and it's instantly marked as FAILED (the voltage reads perfectly, however). The second one came up as working, but the charger never kicked in and it died overnight.

I just rebuilt another cell, and once again it's marked as FAILED.

Looking closely at the PCB I noticed a freaking EEPROM, which I'm guessing tracks battery charging history (thanks!).

Does anyone have any ideas? This system is out of support, we are not buying another one, HP won't sell me the batteries without some long and drawn-out excuse as to why they cannot find them at the warehouse, and now it seems I cannot even replace the cells myself because of the way the boards are designed (assuming the onboard EEPROM is the issue).

I don't suppose HP hid an option somewhere on the controller to reset battery status and let me use my EVA that I paid for?

Mierdaan
Sep 14, 2004

Pillbug
Get Service Express or a similar aftermarket warranty company involved? They may rake you over the coals a bit on price, but they may have the batteries in stock.

gbeck
Jul 15, 2005
I can RIS that
Anyone here using a Unitrends backup appliance? Any opinions good, bad, or otherwise?


The long story:
Currently I am using BackupExec 2010 R3 to back up to a Data Domain 610, and I am using LTO2 tapes for long-term off-site storage. Due to a large number of servers that will be added this year, the plan was to purchase a new Data Domain. The two DD610s would be at the primary site and then replicate to the new appliance off-site.

I talked to Unitrends last year when researching, but compared to just upgrading one Data Domain the price was a bit higher than I was hoping to spend.

Fast forward to now: Unitrends has provided us with a simplified but aggressive quote. Basically they are offering a Recovery-833 and their disk archiving for roughly the price of the new Data Domain system. I would lose my online disk-based off-site backups for the time being. It would also mean using spinning disk for long-term off-site storage, and I am not sure how I feel about that. I work in healthcare, so we like to keep lots of backups for a long time.

On the bright side, I can get away from per-server agent licenses and having to keep track of them.

in a well actually
Jan 26, 2011

dude, you gotta end it on the rhyme

Confused_Donkey posted:

Reposting this from the HP support forum as I'm running out of ideas here.

We recently reactivated one of our older EVA SANs (a 5000) and are running into issues with cache batteries.

The EVA sat for a while, so of course the batteries died over the years. All 4 are currently marked as bad on both controllers.

I put a call in to HP to 2-day me some new batteries; however, after 2 weeks of "we don't know where they are" or "I'm sorry, I cannot update you on your order status" I said to hell with it and cracked them open.

Thankfully I noticed that the Hawker Energy 2V 5AH batteries are still made (albeit under the EnerSys brand now).

So we rebuilt two of the packs; the batteries are fully charged, nice and happy.

Slap one in and it's instantly marked as FAILED (the voltage reads perfectly, however). The second one came up as working, but the charger never kicked in and it died overnight.

I just rebuilt another cell, and once again it's marked as FAILED.

Looking closely at the PCB I noticed a freaking EEPROM, which I'm guessing tracks battery charging history (thanks!).

Does anyone have any ideas? This system is out of support, we are not buying another one, HP won't sell me the batteries without some long and drawn-out excuse as to why they cannot find them at the warehouse, and now it seems I cannot even replace the cells myself because of the way the boards are designed (assuming the onboard EEPROM is the issue).

I don't suppose HP hid an option somewhere on the controller to reset battery status and let me use my EVA that I paid for?

http://www.ebay.com/sch/i.html?_odkw=eva+5000&_osacat=0&_from=R40&_trksid=p2045573.m570.l1313&_nkw=eva+5000+battery&_sacat=0


Or one of the secondhand enterprise vendors like:
http://www.xsnet.com/

Confused_Donkey
Mar 16, 2003
...
Ended up calling around for HSV110 / EVA 5000 cache batteries; both places promised callbacks, but none were received. Called back and was sent to voicemail.

Sooo took the cells apart again, changed out the boards with 2 others, charged the batteries on a trickle charger, tossed them in, and they work great.

Bought a few off eBay as replacements, however, so I can have a few factory batteries on hand.
