madsushi
Apr 19, 2009

Baller.
#essereFerrari
I'm just sad that Vaughn Stewart was the gorilla. :(


madsushi
Apr 19, 2009

Baller.
#essereFerrari
I have a Nimble CS210 demo box, if anyone has any questions about it. From my first impressions, it's very similar to a NetApp.

Some notes:

Only sold in HA configuration (2 controllers)
Only active/passive (the 2nd controller is NOT SERVING DATA)
The OS/config is stored on an onboard USB stick (so no disks are consumed by the 2nd controller)
iSCSI-only


Some pros:
There are no "LUNs", just volumes, which makes provisioning space easier
Volume Groups are AWESOME, made setting stuff up way easier (manage the group instead of individual volumes)
Backups being baked-in to the controller are great
Easy to set up

Some cons:
iSCSI-only with awful VAAI support means it's not well-suited for VMware
You need to replace your SSDs every few years
Little to no tooling for recovering SQL/Exchange data; you're just mounting the snapshots and doing all the work yourself.
Needs like 12 goddamned IP addresses (shared mgmt, shared iscsi, controller A, controller B, and then EACH GODDAMN NIC needs its own IP)

I should have some more notes/info as I get more time to dive in.

madsushi
Apr 19, 2009

Baller.
#essereFerrari

complex posted:

Not sure why "iSCSI-only" makes it bad for VMware. As for "awful" VAAI support, which primitives are you missing that you wish you had? (Note: This is a test) I found VAAI support to be great.

Every SSD will need to be replaced every few years, perhaps the ones in the Nimble sooner than some other array where they are not a pass-through cache. But the failure of an SSD in the Nimble is a non-event. The cache is simply temporarily smaller and there is no interruption in service. Want to test it? Pull an SSD live.

I will definitely test pulling an SSD today. :parrot:

iSCSI-only isn't a problem with VMware if VAAI is present and supported. From my reading, Nimble's only VAAI primitive is Write-Same, which isn't what I was looking for.

What I do want to see, at a minimum, is Atomic Test & Set (ATS), which is critical for ensuring good iSCSI/LUN performance when you have multiple hosts/multiple VMs using the same LUN. I am specifically trying to avoid the LUN-locking overhead that plagues iSCSI deployments.
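
If you want to see exactly which primitives an array is advertising, you can ask the ESXi host directly. A minimal sketch, assuming ESXi 5.x-era esxcli output and SSH access to the host (the hostname is a placeholder):

code:

# Rough sketch: list which VAAI primitives (ATS, Clone, Zero, Delete) an ESXi
# host reports per device. Assumes ESXi 5.x-era esxcli syntax and SSH access;
# "esx01" is a placeholder hostname.
import subprocess

def vaai_status(esxi_host="esx01"):
    # 'esxcli storage core device vaai status get' prints one block per device
    # with lines like "ATS Status: supported" / "unsupported".
    out = subprocess.run(
        ["ssh", "root@" + esxi_host,
         "esxcli storage core device vaai status get"],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in out.splitlines():
        line = line.strip()
        if line.startswith(("ATS Status", "Clone Status",
                            "Zero Status", "Delete Status")):
            print(line)

if __name__ == "__main__":
    vaai_status()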

madsushi
Apr 19, 2009

Baller.
#essereFerrari

Internet Explorer posted:

We just bought 2 VNX 5300 and I've never even heard of MPFS. So... Ah.. Yeah.

MPFS is an EMC thing: an agent talks to your EMC SAN and requests a file, the SAN returns the blocks where that file lives, and the agent retrieves those blocks directly via iSCSI or FC. The idea is that you're serving up files (like with NFS/CIFS), but they're retrieved block-by-block via a block protocol, which has less overhead than NFS/CIFS.

madsushi
Apr 19, 2009

Baller.
#essereFerrari
If the source is still intact, you can use a tool like SetACL to mirror the permissions from the source to your destination.

madsushi
Apr 19, 2009

Baller.
#essereFerrari

NippleFloss posted:

This is part of your problem. Your company needs to spend some of that $700m a year on storage from an actual top tier storage vendor. When you go with a well known and widely used vendor you'll get much better support and have a broad community of users to go to with questions. Given the size and number of employees you're well above the SMB market that RelData was targeting. The saying "no one has ever gotten fired for buying IBM" is as true as ever (though you can easily replace IBM with any number of tier one storage providers).

It's no coincidence that IBM sells NetApp. :smug:

madsushi
Apr 19, 2009

Baller.
#essereFerrari

Nomex posted:

We're migrating exchange from 2k3 to 2k10. They decided to move 2k3 to the new Netapp array before doing the upgrade. 336 15k disks in the old exchange storage, 72 in the new. What could possibly go wrong?

How big of an Exchange environment? 72x 15k disks could handle about 3,000 IOPS.

madsushi
Apr 19, 2009

Baller.
#essereFerrari

Internet Explorer posted:

That sounds very low. Should be closer to 10,000 IOPS for 72 15k disks.

My bad, missed a 1, was supposed to be 13,000 IOPS. I always figure about 175 IOPS max for a 15k RPM drive, and 72x175 = 12,600.

For drive IOPS, I am usually referencing what I learned from a NetApp performance guy:

7.2K - 50-100 IOPS
10K - 100-150 IOPS
15K - 150-200 IOPS

Where the first number is "good to go" and the second number is "peak, latency will go up"
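
As a sanity check, the 72-drive example above falls straight out of those numbers (a throwaway sketch; the per-drive figures are just the rules of thumb quoted here):

code:

# Quick back-of-the-napkin spindle math using the rule-of-thumb numbers above.
# (low, high) = ("good to go", "peak, latency will go up") IOPS per drive.
RULES_OF_THUMB = {
    "7.2K": (50, 100),
    "10K": (100, 150),
    "15K": (150, 200),
}

def raw_iops(drive_count, drive_type):
    low, high = RULES_OF_THUMB[drive_type]
    return drive_count * low, drive_count * high

# 72 x 15k drives: 10,800 "comfortable" to 14,400 "peak" IOPS;
# 72 x 175 = 12,600 splits the difference, matching the estimate above.
print(raw_iops(72, "15K"))
print(72 * 175)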

madsushi
Apr 19, 2009

Baller.
#essereFerrari

Nomex posted:

I go by 180 for a 15k disk. We were able to get about 10,300 out of them before they maxed out. Suffice it to say 72 drives was woefully inadequate. We had to salvage 6 x DS14 15k disk shelves from one of our old filers to get things running smoother. It's still sub-optimal, but they've started moving mailboxes to 2010 now, so things are getting better every day.

Factoring in spares (1-3) and parity disks (8-10), 10,300 IOPS is actually pretty reasonable. However, you must have a massive Exchange environment if 10k IOPS wasn't adequate.

madsushi
Apr 19, 2009

Baller.
#essereFerrari

nuckingfuts posted:

Does anyone know what happens to a failed NetApp drive after it is returned? Is it sanitized / destroyed? We just had a drive fail and my boss asked, I haven't found an answer yet and thought someone here might know.

I know that they are sanitized; otherwise they'd retain some of the ownership info they had previously (which happens all the time when I buy 3rd-party NetApp drives). I heard somewhere (not officially) that the good ones are repaired/reused as spares/replacements and the bad ones are canned.

madsushi
Apr 19, 2009

Baller.
#essereFerrari

NippleFloss posted:

Protection manager could be used to do this somewhat trivially.

Protection Manager could be great... if it wasn't such a piece of poo poo.

Let me count the ways:

1) What the gently caress is up with requiring 130% space on your destination volumes? Sometimes I want my 100GB volume to SnapVault to another 100GB volume, and I really don't enjoy the idea of requiring the destination volume to be 130GB. I end up making all of my SnapVault relationships manually and then importing them to get around this (there's a sketch of that below)... but that's the OPPOSITE of what I want to be doing.

2) Speaking of, what's up with all of the arbitrary volume requirements? The language is different between my source and destination volumes, which doesn't matter at all for LUNs, but I guess that's a good enough reason to not let me set up a SnapVault relationship!

3) There needs to be a really simple SnapVault option in Protection Manager where PM goes and gets the last snapshot taken on the source and then copies it over to the destination. Requiring me to reconfigure every single SnapDrive and SnapManager instance is a huge task, whereas PM could EASILY be smart enough to grab the latest snapshot name to sync over.


I spoke with one of the OnCommand/PM project managers at Insight, and he was explaining how you could take 10 NetApps and put them into a big destination pool and let PM manage everything -- it would make all of the volumes 16TB and thin-provision everything. That sounds great... if you had 10 NetApps. If you're just trying to sync 1-2 NetApps to 1-2 other NetApps, PM simply doesn't give you the options or the flexibility (30% OVERHEAD REQUIRED) that I want. I am working on replacing the whole goddamn thing with a series of PowerShell scripts and calling it a day.
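
For what it's worth, the manual workaround (build the relationships yourself at 1:1, then import them into PM) is only a couple of 7-mode commands on the secondary. A rough sketch, assuming SSH access to the filers; the run_on helper and every filer/volume/qtree name here are placeholders:

code:

# Rough sketch of the manual SnapVault setup described above: create a 1:1-sized
# destination volume and baseline the qtree transfers yourself, then import the
# relationships into Protection Manager afterwards. Assumes Data ONTAP 7-mode
# CLI over SSH; filer names, volume names, and qtrees are placeholders.
import subprocess

def run_on(filer, command):
    """Hypothetical helper: run a single ONTAP 7-mode CLI command over SSH."""
    return subprocess.run(["ssh", "root@" + filer, command],
                          capture_output=True, text=True, check=True).stdout

SRC_FILER, DST_FILER = "prodfiler", "vaultfiler"
QTREES = ["q_sql01", "q_exch01"]

# 1:1 sizing: make the vault volume the same size as the source, not 1.3x.
run_on(DST_FILER, "vol create vault_vol1 aggr0 100g")
run_on(DST_FILER, "snap sched vault_vol1 0 0 0")   # SnapVault manages snapshots

# Baseline each qtree relationship from the secondary side.
for q in QTREES:
    run_on(DST_FILER,
           f"snapvault start -S {SRC_FILER}:/vol/vol1/{q} /vol/vault_vol1/{q}")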

madsushi
Apr 19, 2009

Baller.
#essereFerrari

NippleFloss posted:

It's definitely not a perfect product and the earlier iterations were basically unusable, but it has improved to the point where it is functional and possibly even useful if you spend some time getting familiar with it. Regarding your specific issues:

1) There isn't a one-to-one ratio of source-to-destination size for snapvaults, so it doesn't really make sense to size them at one-to-one. A vault destination will have a different number of snapshot copies than the source (generally more), and if dedupe is in use the data is initially re-inflated before being deduped on the destination. The extra size is accounting for that overhead. That said, it's not a strict 130%; the calculation is a bit more detailed than that, and differs depending on whether you're using 3.7 or 3.8 and up. There are some hidden options that can be changed to tune the calculation to provide more or less additional space. If you're interested I can provide them. Enabling Dynamic Secondary Sizing is probably the best way to go, provided you're on 3.8 or above.

2) Snapvault gets very unhappy when there are volume language mismatches between a source and destination. This isn't a Protection Manager issue, it's a WAFL issue, or, more generally, an issue with there not being a direct mapping of some characters from one language to another. If the destination volume doesn't support umlauts because of its language setting and there are files on the source that have umlauts, then it's going to fail.

3) The integration with SnapDrive and SnapManager is required because the vaults get cataloged in PM as being part of a SnapManager backup set. That allows you to do things like perform a restore from an archive transparently, or perform your validation on the secondary site. You can't do that if you don't have that catalog information, because you are performing your vaulting separately from your SM backups. Of course, for some people that would be just fine, and so the limitation sucks, but that's the rationale behind it.

1) I have never been able to make a volume smaller than 1.3x and still have Protection Manager accept it as a candidate for SnapVault. I opened a TAC case to see about reducing that down but never got anywhere. Sometimes my vault will be almost a mirror, sometimes I want the vault to store fewer snapshots than the source, etc. My destination filer has about 1.2x the space of my production filers, so making every volume start at 1.3x really doesn't work well. If you know of a way to get the minimum size under 1.3x, I am all ears and that would help quite a bit.

2) I get the volume language mismatch issue, but I am unhappy that there is 1) no override, 2) no button to "fix" it in PM, and 3) when I fix it myself, I have to wait 15-30 minutes before Protection Manager sees that I fixed the volume language manually.

3) Gotcha, restore from archive is actually a good point I did not think about.

I still use PM at several client sites simply because it's better than my batch files, but it feels like it's a lot of work/learning for small clients (1-2 NetApps) and there are so many little "gotchas" that make it difficult for me to teach others.

Here's a day in the life:

1) (SD install) Enable SnapDrive integration with Protection Manager.
2) (SM config wizard) Enable SnapManager integration with Protection Manager.
3) (NetApp Management Console) Attach the newly-created dataset with some destination volumes. It wants to make them 1.3x? OK, we'll make them manually.
4) (OnCommand System Manager) Make the volume, turn off snapshots, turn on manual dedupe, make qtree.
5) Wait 15 minutes for Protection Manager to rescan
6) (NetApp Management Console) Try to attach the new qtrees to the dataset, but the volume language is wrong.
7) (ONTAP CLI) Change the volume language
8) Wait 15 minutes for Protection Manager to rescan
9) (NetApp Management Console) Attach the new qtrees (finally), assign a policy, initialize the SnapVault
10) (SM backup wizard) Configure the backup jobs to archive

All of that is due to Protection Manager, not including the steps needed to set up SD/SM and MPIO and the application database migration in the first place. Right now I can make a volume in 10 minutes and hand it off to a non-storage admin who can use SnapDrive and SnapManager to get their application set up quickly. With Protection Manager, a smart storage admin needs to spend an hour in so many different consoles just to set up replication. This is in contrast to SnapMirror which is "mirror to this volume using OnCommand System Manager -- done".
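
At least the filer-side pieces of that workflow (steps 4 and 7) script cleanly. A throwaway sketch that just emits the 7-mode commands; the names, sizes, and language code are placeholders:

code:

# The filer-side steps in the list above (4 and 7) are easy to script. A sketch
# that just emits the 7-mode commands for a new vault volume; names, sizes, and
# the language code are placeholders.
def vault_volume_prep(vol="vault_sql01", aggr="aggr1", size="100g",
                      qtree="q_sql01", lang="en_US"):
    return [
        f"vol create {vol} {aggr} {size}",
        f"snap sched {vol} 0 0 0",      # turn off scheduled snapshots
        f"snap reserve {vol} 0",
        f"sis on /vol/{vol}",           # enable dedupe, run it manually later
        f"qtree create /vol/{vol}/{qtree}",
        f"vol lang {vol} {lang}",       # match the source volume's language
    ]

for cmd in vault_volume_prep():
    print(cmd)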

madsushi
Apr 19, 2009

Baller.
#essereFerrari

BnT posted:

Can somebody explain to me how active/active SAN processors work on the SCSI level in enterprise SANs? Does one controller/processor control some disks, or do they share a bus somehow? Right now it's magic to me.

Each disk is only owned by one controller. FilerA may control disks 1,2,3 while FilerB controls disks 4,5,6. This used to be controlled by hardware, but is now usually controlled by software. That is to say: both FilerA and FilerB have paths to all of the disks, but you assign ownership at a software level. You can obviously give/take disks back and forth if needed, as long as they're not part of a RAID group or anything.

When a failure happens, the other controller seizes ownership. For example, if FilerA fails, FilerB will take control of those disks and will start serving out the data on those RAID groups, etc. It's very important that FilerB is running the same version of code as FilerA, so that it will understand all of the data/metadata on the drives it seizes. Once FilerA has recovered, FilerB can gracefully give control over those drives back to FilerA.

I believe the software ownership piece is a small block of data that is written at the front of the drive. If FilerA sees that FilerB has put its signature on the drive, it knows that it can't take them unless told to take them forcefully.

Because each controller has its own set of disks, the storage is not shared. If you give FilerA 10 disks and only give FilerB 5 disks, then FilerB is going to have a smaller capacity. On a system like a NetApp, FilerB has to have at least 3 disks, because the OS actually lives on the disks. I have deployed several active-active installations where FilerA has all but 3 of the disks, and FilerB only has its minimum of 3 disks. Essentially it's a single filer with redundant controllers at that point, which makes it easier to manage than two separate systems. You also get the advantage of dealing with one big storage pool vs trying to manage two different pools. Finally, you can do a lot of cool things like move volumes between RAID groups if all of the disks are owned by one controller, whereas if each owns half of the disks, you can't seamlessly transfer data between FilerA and FilerB.
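
A toy model of the ownership mechanics described above, purely for illustration (this is not NetApp's actual on-disk format or failover code):

code:

# Toy model of software disk ownership: each disk carries an ownership
# signature, the partner only seizes disks on failover, and gives them back
# gracefully afterwards. Purely illustrative.
class Disk:
    def __init__(self, disk_id, owner):
        self.disk_id = disk_id
        self.owner = owner          # stand-in for the signature at the front of the drive

class Filer:
    def __init__(self, name, disks):
        self.name = name
        for d in disks:
            d.owner = name          # assign ownership in "software"
        self.disks = disks

    def takeover(self, partner):
        # On partner failure, seize its disks and serve their data.
        for d in partner.disks:
            d.owner = self.name
        self.disks += partner.disks
        partner.disks = []

shelf = [Disk(i, owner=None) for i in range(6)]
filer_a = Filer("FilerA", shelf[:3])
filer_b = Filer("FilerB", shelf[3:])
filer_a.takeover(filer_b)           # FilerB failed; FilerA now owns disks 0-5
print([d.owner for d in shelf])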

madsushi fucked around with this message at 18:41 on Aug 12, 2012

madsushi
Apr 19, 2009

Baller.
#essereFerrari

NippleFloss posted:

That's still active/passive (or, like 1000101 said, active/passive and passive/active at the same time). That's the situation in which you'd use ALUA to ensure that access is only happening on optimal paths under normal conditions. You don't want to use round robin in that scenario as the path through the cluster interconnects will still be slower than the paths directly to the LUN through the owning controller. Even if your interconnect had no latency and infinite bandwidth you still pay a latency penalty due to processing overhead on data being accessed through the suboptimal path.

What NippleFloss is trying to say is:

quote:

[hostname: scsitarget.partnerPath.misconfigured:error]: FCP Partner Path Misconfigured.

[hostname: scsitarget.partnerPath.misconfigured:error]: FCP Partner Path Misconfigured - Host I/O access through a non-primary and non-optimal path was detected.

Jesus, if I had a nickel for every AutoSupport I received for FCP Partner Path Misconfigured, I would retire.

madsushi
Apr 19, 2009

Baller.
#essereFerrari

three posted:

Do other arrays require this, and specifically state this requirement?

NetApp's various host utilities (SnapDrive, VSC) will set these timeout values for you automatically.

madsushi
Apr 19, 2009

Baller.
#essereFerrari
NetApp's DataMotion for Volumes is trash.

I was all excited about it, finally got all of my clients to upgrade to ONTAP 8+, but after actually trying to use it, here are a few... catches:

1) Doesn’t work with deduped volumes. Yes, seriously, after NetApp recommends that you turn on dedupe on EVERY volume, they pull this poo poo. I have heard Tom Georgens recommend multiple times to enable dedupe everywhere! No penalties! The volume will be deswizzled/reinflated on the destination, which means you will need to delete all of the snapshots to re-dedupe the volume at all.

2) Doesn’t work on CIFS/NFS-exported volumes. So, no VMware or CIFS/SMB volumes. You have to move these the old fashioned way, with plenty of downtime and SnapMirror.

3) Doesn’t work on SnapVault secondaries (still needs to be tested). When combined with the “no dedupe” rule, this will make it essentially worthless on any backup filer.

So if you have an iSCSI-only volume that doesn’t have dedupe enabled on your primary filer, well, now you have a good candidate. Unfortunately, with 99+% of my overall volumes being deduped and/or SnapVault destinations and/or CIFS/NFS, DataMotion is worthless.

Man, I am so mad. NetApp really needs to get their poo poo together w/r/t deduplication. It seems like most of their poo poo simply does not work with dedupe, and yet they want it on everywhere.

madsushi
Apr 19, 2009

Baller.
#essereFerrari

Misogynist posted:

Does not compute.

Sorry, I should've said: "if you need to do a flash-cut, then you're stuck with snapmirror then a downtime-causing cut to the new volume."

If you have:
*VMware enterprise plus licensing
*weeks/months to transfer your data (and wait for replication)
*twice the space available on your SnapVault destination

Then yeah, you can make a new source volume and use sMotion to move a few VMs into the new source, wait for replication via SnapVault, move a few more, repeat, then wait several weeks/months before deleting the old source/destination volumes because you need to maintain those snapshots in case of a restore request.

madsushi
Apr 19, 2009

Baller.
#essereFerrari

NippleFloss posted:

Vol move works fine with de-duplicated volumes. It just doesn't work when dedupe is actively running on them. You need to stop any ongoing scans before moving the volumes.

The fact that it only works with block protocols sucks. I haven't found any explanation for that but my guess is that they couldn't find an easy way to deal with stale file-handles since they basically have to un-export and re-export the filesystem as part of the cutover which NFS clients don't like.

It's probably possible, but most resources are devoted to cluster-mode these days.

I was basing the dedupe issue off of this quote:

DEDUPLICATION and COMPRESSION
*If deduplication is active on the source FlexVol volume, then it must be stopped for a successful cutover.
*DataMotion for Volumes does not move the fingerprint database and change logs of a deduplicated FlexVol volume. After the DataMotion for Volumes process is complete, users must execute sis start -s to rebuild the fingerprint DB at the destination.

"sis start -s" isn't going to work on a volume with snapshots, right? Or am I mistaken here?

madsushi
Apr 19, 2009

Baller.
#essereFerrari

adorai posted:

Why wouldn't it?

sis can't touch blocks that are locked by snapshots, which is why you always have to run dedupe before your regular snapshots. Same for enabling dedupe on a volume with snapshots - dedupe can only touch new blocks, not the blocks already locked.

madsushi
Apr 19, 2009

Baller.
#essereFerrari

Nomex posted:

You should check out the Netapp FAS2240 series. You can get them with 12 or 24 disks to start and 1-3 TB SATA disks. They support CIFS, NFS, iSCSI and FC, as well as AD integration and NDMP.

You're not going to see a FAS anywhere near $10k with that much storage.

Your best bet is a server with a decent RAID card (HP/Dell), a ton of disk, and Windows. Windows is obviously going to give you AD auth, Volume Shadow Copy, SMB/CIFS, etc., and it will be simple to manage (probably no training needed). If all you need is file shares, it's going to be very hard to beat a Windows server loaded with drives.

madsushi
Apr 19, 2009

Baller.
#essereFerrari

sanchez posted:

This is fun until a RAID controller dies. I'd try to get a real SAN if at all possible, that's a lot of data to be sitting on a single server. If it was just a few tb it'd be a different story.

That's why I recommended buying HP/Dell, because at least you know parts are going to be available for a long time. If you can find a decent SAN for $10k that fits the criteria, then that's an option too, I just don't see that happening easily.

madsushi
Apr 19, 2009

Baller.
#essereFerrari

Misogynist posted:

It appears you do not fully grasp what "single point of failure" means :(

But "single point of failure" was not in the original requirements. If you read the original request, what's needed is just a big NAS on a relatively tight budget.

madsushi
Apr 19, 2009

Baller.
#essereFerrari

FISHMANPET posted:

So our Compellent SAN has to be setup by a Compellent engineer, and they sent us a survey to fill out beforehand. Under the iSCSI section it says this:


I've not seen anything about running two separate iSCSI networks, and as far as I can tell, a bunch of the virtual port stuff that Compellent does wouldn't work if interfaces were on multiple subnets. What are we supposed to be doing here?

That's right, that's how you are supposed to do iSCSI MPIO. You have two separate NICs on your host, two separate switches, and two separate NICs on your SAN for full redundancy. A lot of people skimp and just make two VLANs on one switch, or put the whole thing on one VLAN on one switch and just use different IP addresses.
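
For concreteness, here's a purely hypothetical addressing plan for that layout; every subnet and address below is made up for illustration:

code:

# A purely hypothetical addressing plan for the layout described above: two
# physically separate switches/subnets, one host NIC and one SAN port on each.
ISCSI_FABRICS = {
    "iscsi-a": {                      # switch A, subnet A
        "subnet": "10.10.10.0/24",
        "host_nic1": "10.10.10.11",
        "san_controller_port1": "10.10.10.21",
    },
    "iscsi-b": {                      # switch B, subnet B
        "subnet": "10.10.20.0/24",
        "host_nic2": "10.10.20.11",
        "san_controller_port2": "10.10.20.21",
    },
}
# MPIO on the host then sees one path per fabric and can survive the loss of
# either switch, host NIC, or SAN port.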

madsushi
Apr 19, 2009

Baller.
#essereFerrari
Here are my thoughts:

1) You shouldn't be setting the native VLAN on the Ciscos. The native VLAN is still 1. I don't have that value set on any of my ether-channel configs.

2) Make sure your encapsulation type is dot1q, I am not sure if your Ciscos are defaulting to ISL or whatever.

3) You want "routed on" if you want to be able to use routes, so turn that on for the NetApp. Belay this order.

madsushi fucked around with this message at 18:48 on Sep 19, 2012

madsushi
Apr 19, 2009

Baller.
#essereFerrari

bort posted:

Do you prune VLAN 1 on the trunks? I typically always set the untagged/native VLAN on a trunk. This is because VLAN 1 has all kinds of control traffic on it, unconfigured ports end up on it and older switches could drop VLAN 1 traffic to the processor, slowing everything down.

I'd agree with your original approach, bunnY. You could try another VLAN for your native VLAN to ensure that you're tagging the traffic on 731, but I, at least, stay away from VLAN 1.

Yeah, I use "sw tru all vlan x-y" to exclude VLAN 1 from hitting the trunk.

madsushi
Apr 19, 2009

Baller.
#essereFerrari

Amandyke posted:

Not sure I'd bet my job on an ebay spare... The last thing you want is an incompatible firmware bringing down the array for who knows how long while you feverishly type, search and call your way to a fix after 24+ hours of unscheduled downtime during the week. That profit is what keeps folks like me in a job and new products rolling out the door.

No one ever got fired for letting a vendor do the work. CYA and all of that.

Wrong firmware is not the type of thing that happens with NetApp, though.

We use a lot of refurb/3rd party NetApp gear and it's great. The cost of a spare shelf is cheaper than the cost of maintenance on the shelf.

madsushi
Apr 19, 2009

Baller.
#essereFerrari
Read his post.

quote:

The host servers I specced out at around $6k, so I could potentially put about $14k towards storage if I just do one host.

So it's $20k total.

madsushi
Apr 19, 2009

Baller.
#essereFerrari

Misogynist posted:

Until you hit a firmware bug that trashes your filesystem and replicates the changes downwind.

That's why you use SnapVault, since it uses a separate file table/metadata!

madsushi
Apr 19, 2009

Baller.
#essereFerrari

NippleFloss posted:

I'm not really sold on the need for an all-flash array for 99.9% of consolidated storage customers. Is anyone using this? For what? What workloads out there do you have that need 1 million random read IOPs with a total capacity of like 5TB, at a price multiple of many times more than a traditional storage array with 10 times the capacity?

This is a sincere question. SSD doesn't provide much benefit over HDD for throughput-based workloads, if you already acknowledge writes in NVRAM then it won't accelerate writes, and most vendors have some sort of flash-based read caching that serves hot blocks or segments from flash... so where does an all-flash array fit?

I have been asking myself the same question. I think that your primary use-case is going to be someone with a relatively small but intensely used database that's too big to fit in a FlashCache card (multiple TB).

Essentially, when looking at your disk types, you're actually looking at a chart of IOPS/GB. Your 450GB 15K drive is going to give you about 0.38 IOPS/GB, while your 1TB 7.2K SATA drive will give you much less, around 0.05 IOPS/GB. SSD/flash arrays are going to give you a massive IOPS/GB value, but that has to be compared against the IOPS/GB that the app/environment requires. The only time I see IOPS/GB needing to scale past 0.4 is when you're looking at intensely used databases.
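
To make the comparison concrete, here's the same arithmetic in a tiny sketch; the drive numbers are the rule-of-thumb figures from earlier in the thread, and the workload figures are made up:

code:

# The IOPS/GB arithmetic from above. A workload only "needs" flash when its
# required IOPS/GB outruns what spinning disk can deliver; the workload figures
# below are made-up examples.
drives = {
    "450GB 15K": 175 / 450,      # ~0.39 IOPS/GB
    "1TB 7.2K SATA": 50 / 1000,  # ~0.05 IOPS/GB
}
workloads = {
    "file shares (example)": 0.02,
    "busy OLTP DB (example)": 2.0,
}
for name, need in workloads.items():
    fits = [d for d, have in drives.items() if have >= need]
    print(f"{name}: needs {need} IOPS/GB -> spinning disk OK: {fits or 'no, flash territory'}")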

madsushi
Apr 19, 2009

Baller.
#essereFerrari

NippleFloss posted:

Nimble won't do the throughput he's looking for at the price/density he wants. Nimble's caching approach is great for random I/O where the working set is a small portion of the total data set (like e-mail and some OLTP). It's not very good for high-throughput applications, since the density per controller is pretty low, your SSD cache layer does basically nothing, and you're limited to the aggregated throughput of your SATA drives.

At the end of the day, Nimble is really just a shelf of SATA disk. It's got your normal NVRAM for write caching and SSD for read caching, but your sustained writes are limited to your SATA disks. If your reads fit well into a read-cache situation (Nimble, NetApp's FlashCache or Flash Pool, etc.) then the SSDs will help your reads, but otherwise it's still just a shelf of SATA disk.

It's the same reason I'm always wary about Compellent: when the rubber meets the road, do you really want all of your production data on a small number of slow SATA disks? SAS disks are going to give you 6x the IOPS/GB of SATA.

madsushi
Apr 19, 2009

Baller.
#essereFerrari

FISHMANPET posted:

But you can get Compellent with SAS :confused:

Many of the Compellent installs I see are with 6 SAS (15k) and 12 SAS 7.2k drives. My overall point was that even though there's a fast "tier" there, if you are doing anything substantial you are still limited by the 7.2k drives. That's why tiering is a dangerous game to play: because once you are outside of its capabilities, you are limited by the slower drives.

madsushi
Apr 19, 2009

Baller.
#essereFerrari

adorai posted:

Which brings me to a nice question for nipplefloss: is it worth considering (or even possible at this point) to get a PAM for a 2050?

The 2050 is a dead box unfortunately; no ONTAP updates anymore.

In general, the 2xxx series doesn't have any PCIe expansion slots, so the 2020/2040/2050/2220/2240 can't support FlashCache (PAM). I don't think it has to do with the memory; it has to do with the fact that they don't have the right slot. In addition, there was an issue with some of the older 3xxx series that prevented FlashCache from working after you upgraded to 8+.

A 2240 isn't going to get you FlashCache, but it is going to be way faster than a 2050 is (in addition to all of the nifty 8+ features). 50% more RAM, faster CPUs, etc. I actually have a 2240 on my bench right now with a 10Gb card and it's very fast.

madsushi
Apr 19, 2009

Baller.
#essereFerrari

NippleFloss posted:

Systems with 4GB of system memory or less just can't handle the additional memory pressure that the cache table adds. It would tank the performance of the box in a lot of cases and cause WAFL panics due to out of memory conditions if it got bad enough.

Good information; makes sense. One of the reasons I was excited about ONTAP Edge was the thought of loading a VM with 32GB of RAM and letting it go wild. :)

madsushi
Apr 19, 2009

Baller.
#essereFerrari
I have a lot of VMware datastores that are sitting on NFS volumes on NetApp filers.

Sometimes, the snapshots from these datastores are reasonably sized, like 5-10GB per day. Sometimes, the snapshots from these datastores are unreasonably sized, like 90-100GB per day.

I am trying to find the best way to isolate the VMs that are causing the largest deltas. I have used the built-in VMware monitoring to look for VMs with high write OPS, but haven't found much correlation to snapshot size.

Are there any useful tools (on the VMware or the NetApp side) for monitoring individual VM writes? I just want to figure out which VMs are dumping SQL backups or running defrag and screwing up my replication windows. This is probably a good question for the VMware thread, but I love you all more.

e: and please don't suggest moving VMs around into temporary datastores; I have a LOT of datastores to manage, and since I have to wait 2 days after moving VMs to see if the snapshot size budged, it's really not feasible.

madsushi
Apr 19, 2009

Baller.
#essereFerrari

Mr. Fossey posted:

We are looking to replace our aging and slow MSA2012i. Basically we have 7TB of rarely accessed engineering data, 300GB of exchange 2010 over 150 users, and 10VMs with very little usage. Our expected growth in the next 24mo is an additional 7TB of data with the data being a marginal candidate for compression but not a candidate for dedup.

Right now we are looking at a Netapp 2240 w/ 6x100GB SSD and 18x1TB SATA and SnapRestore. The idea is to let the netapp serve the bulk of the files over CIFS with the VMs connecting over iSCSI. Is this a) a reasonable build b) something manageable by non SAN folks c) The right tool for the job. Are there other vendors we should be looking at?

On a secondary note, how important are the installation services? They are coming in at 30% of the cost of the hardware itself.

1) SSDs are going to be a waste for you. They won't help the engineering data, Exchange 2010 was designed to run on SATA disks and users won't notice, and low-usage VMs aren't important enough to warrant the cost.

2) I would look at bigger disks based on your use cases. You can get the 2240 with 2TB SATA disks. If you're going to do an HA configuration, you need to burn at least 3 disks for that; with 1 spare and 2-4 parity (DP) disks on the main aggregate, that would still net you over 30TB, and you'd have exactly the same performance as with the 1TB SATA.

3) Use NFS for VMware, it will be easier to manage.

Essentially, there's a GB-per-IOPS scale for each disk type (1TB disks = 13 GB/IOPS, 2TB disks = 26 GB/IOPS, etc.), and the fact is you sound like a pretty low-IOPS shop as you describe it. You can easily run Exchange + CIFS + 10 VMs on 24 disks.
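
For the 24x 2TB option, the rough math looks like this (a sketch; "raw" TB ignores right-sizing and WAFL overhead, and the disk counts are the ones quoted above):

code:

# Rough sizing check for the 24 x 2TB SATA option described above.
total_disks = 24
root_disks = 3        # second controller's root aggregate in an HA pair
spare_disks = 1
parity_disks = 4      # RAID-DP parity across the main aggregate (2-4 quoted)
data_disks = total_disks - root_disks - spare_disks - parity_disks
print(data_disks, "data disks ->", data_disks * 2, "TB raw")   # 16 -> 32 TB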

madsushi
Apr 19, 2009

Baller.
#essereFerrari

Powdered Toast Man posted:

The biggest issue I've had with SME is that it seems if you make any sort of change to your mailbox server (change the name of a database, create a new database, move a database, etc), you have to run the configuration wizard again and you have to throw out your backup job and create a whole new one. If I'm mistaken about this, please enlighten me, but NetApp has a knowledgebase article that says pretty much exactly what I just said.

I think the idea is to set it up once and then forget it.

You can actually edit the Scheduled Task and just add in/edit the database name if you keep the syntax straight.

madsushi
Apr 19, 2009

Baller.
#essereFerrari

Powdered Toast Man posted:

In theory you are supposed to be able to set it up and forget it. In practice, if you make any of the changes I mentioned, it breaks the backup job, even if it is only one DB out of several that doesn't work. It still breaks the entire job. The error is something about the path not matching when it attempts to do a VSS copy, and when I gave that information to NetApp they sent me to the aforementioned KB article...which says that you have to run the config wizard again, and if it is an existing DB and you changed the name, you have to move it to another LUN and then move it back. Yeah.

No, I meant you're supposed to set up Exchange once, and then forget it. I can count the number of times I've renamed an Exchange database on one hand. And, with Exchange 2010, and especially on a NetApp, you want to aim for fewer/bigger databases, so creating new ones should happen pretty rarely.

e: so I don't double-post:

I have been doing some NetApp vs Nimble comparisons lately, and it seems like there is one feature on Nimble that I don't quite understand. Nimble claims that their method of coalescing random writes into sequential stripes is somehow much faster than NetApp, and in fact Nimble claims that their write methods are up to 100x faster than others. I don't really see how this is possible. Can anyone with Nimble experience/knowledge add any insight?

madsushi fucked around with this message at 21:09 on Jan 28, 2013

madsushi
Apr 19, 2009

Baller.
#essereFerrari

NippleFloss posted:

Of course, talking about how fast they can write to the disk on the back-end is a bit of a red herring. Writes on both Nimble and NetApp are acknowledged once they hit battery-backed NVRAM, long before they are ever sent to spinning disk. So it really only needs to write fast enough to get to disk before the next time you need to flush your NVRAM. You might see problems on very full, very aged NetApp filesystems, but for most users, if they are seeing write delays it's going to be due to running out of controller headroom, running out of disk throughput, or doing a ton of misaligned I/O from VMware.

Yeah, I'm familiar with the NetApp side, which is why I couldn't figure out how Nimble was fundamentally different, outside of the background sweeper (which we'll have to upgrade from 8.0.1 to 8.1 for).

We have a FAS2040 with 27 FC-attached SATA drives (DS14mk2 x2) in one aggregate. The aggregate is only about 40-50% full, and I ran volume-level reallocations on each volume when we added the 2nd shelf. No hot disks, idle disks, etc.

I can throw 25k IOPS at it, if we're talking about 4k writes, because it's all getting soaked up by NVRAM and then striped. What we're seeing is that when we run something, like say a month's worth of Windows Updates, on 10 servers on the SATA aggregate, we're getting slammed with back-to-back CPs and high-water mark CPs and the write latency on both our SAS and SATA aggregates skyrockets into triple digits. If I turn on IOMeter and set it to 8MB writes to SATA, I see the same effect. Running that test on SAS barely touches our latency.

So, our team has been having to run Windows Updates on 2-3 servers at a time, and it's taking them forever to get through a maintenance period. I think there has been some overlap between the maintenance window, dedupe jobs, and our SnapVault replication from the SATA aggregate, and so I think that is what is causing our problems. I am going to do some more troubleshooting during our next window, because our logging tool always stops recording SNMP data during the slowdowns so we've been without much good data. I am hoping that we can fix it just by adjusting scheduling.
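
Since the SNMP collector keeps dropping out during the slowdowns, one low-tech option is to just capture sysstat from the filer for the length of the window and look at the CP and disk-utilization columns afterwards. A minimal sketch, assuming 7-mode sysstat and SSH access; the filer name is a placeholder:

code:

# Dumb data capture for the next maintenance window: stream `sysstat -x 1`
# from the filer to a local file and eyeball the CP type / disk util columns
# afterwards. Assumes 7-mode `sysstat` and SSH access; "fas2040" is a
# placeholder filer name.
import subprocess, time

def capture_sysstat(filer="fas2040", seconds=3600, outfile="sysstat.log"):
    with open(outfile, "w") as f:
        proc = subprocess.Popen(["ssh", "root@" + filer, "sysstat -x 1"],
                                stdout=f, stderr=subprocess.STDOUT)
        try:
            time.sleep(seconds)       # run for the length of the window
        finally:
            proc.terminate()          # stop the local capture when done

if __name__ == "__main__":
    capture_sysstat()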

My boss, however, is convinced that NetApp is handling flash/SATA poorly and that their solutions aren't elegant, and he is looking into getting a pair of Nimbles and adding them to the mix, removing all of the NetApp SATA shelves and using the Nimble for bulk SATA storage instead. My issue is that I really don't see how a CS240 differs from a 2240 with a Flash Pool (which would be another purchase option). It seems like both companies made all the same technology choices.

They both soak random writes to NVRAM.
They both serve random reads from SSD.
They both serve seq reads from disk.
They both write seq writes to disk, as fast as 8x 7.2k RPM drives will let you.

In fact, the Nimble only has 8 data drives, while a NetApp would have at least 12 in the primary aggregate. With the free-space sweeper in 8.1, what does Nimble bring to the table? It seems like a really slick NetApp clone, with pros (no licensing poo poo to deal with) and cons (iSCSI only, SATA only), but no "killer app" that would make me want to switch. And I'm still not sure I trust compression, and it's not like NetApp doesn't have compression too. Nimble has even said that their box, with its 8 drives, could replace our whole 2040, which has 27 SATA and 32 SAS drives. I just don't see how it could ever match the back-end throughput when the rubber meets the road and we need to write a lot of data (storage vMotion, big file transfers, etc).

madsushi fucked around with this message at 20:22 on Jan 30, 2013

madsushi
Apr 19, 2009

Baller.
#essereFerrari

NippleFloss posted:

Have you opened a case with NetApp to have them look at performance during the updates? If you take a perfstat during one of them I would be happy to look at it and see if anything jumps out, though usually a B2B CP is just a case of overloading either the disk or the loop (are these external or internal, SAS or FC?).

I haven't opened a case yet because someone else had been working the issue up to this point while I was too busy. I plan to have perfstat and sysstat data from the next update window, along with our own SNMP monitoring from the NetApp and the VMware hosts. The disks are in DS14mk2s, so they're external, FC-connected SATA disks.

NippleFloss posted:

One problem with the way NetApp does writes is that each controller has only one CP process for all writes, whether they are going to SATA, SAS, or SSD. So if you saturate your SATA disks, writes to all other disks will suffer as well because they are caught in the CP bottleneck that the SATA is causing. Nimble doesn't have this issue because Nimble only does SATA.

Exactly, which is why our SAS write latency goes up during high SATA usage. I like the Active/Passive design style with the smaller NetApps, since it ensures you'll never outgrow your N+1 and makes it easier to manage. However, in this case, looking back, we probably should've put the SATA loop on the second controller (which is currently just holding on to 3 SAS disks and sitting idle) if only to isolate the CP congestion to our SATA volumes (which are non-customer facing/impacting).

NippleFloss posted:

So yeah, Nimble has some definite advantages due to being new to the game. They've been able to learn from the growing pains of WAFL and likely avoided some of those problems as they wrote their code. But that doesn't make it a better storage product, and you're correct that in the end the question is "can I write this much data out to this many spindles in this amount of time?". I'd be inclined to say no, not with 8 drives. We have plenty of NetApps running on SATA here and they do just fine up until the point where disk utilization gets above 70% or so. It's not an issue of how well the controller or software handles SATA; it's just the fact that SATA has a steep latency curve as utilization goes up, and utilization will go up if you add more load.

This was my feeling as well, since our workload never has issues on SATA outside of very high usage. The problem has just been high profile since it's been affecting all of our VMs. My guess is that it's just storage congestion due to dedupe / SnapVault / normal traffic / Windows Updates all trying to run at the same time, and the SATA can't handle it. Unfortunately we can't move the SATA aggregate to the other head without having a temporary shelf on the other head to move things to. I had even thought about just unplugging the SATA FC loop and plugging it into the other head to see if it showed up, but that gets pretty risky.


madsushi
Apr 19, 2009

Baller.
#essereFerrari

NippleFloss posted:

Why isn't the SATA loop connected to the other head anyway, if they're in a cluster? They should both be able to see all disks. Swinging aggregates between controllers is pretty simple, really: you just need to un-assign and re-assign all disks from the aggregate and it will show up as foreign, then you import it. I've moved entire shelves to move aggregates before without issue, and on a cluster where both controllers should see all disks it's as simple as an un-assign/re-assign (this is actually how storage failover works in clustered ONTAP, and it's pretty neat and much faster than CFO on 7-mode).

The SATA disks are technically connected to both heads, but we were just thinking we'd need to connect another SATA shelf to migrate to first. I didn't realize you could move aggregates between controllers just by using unassign/reassign. I would just offline the volumes/aggregate, unassign from one head, and assign to the other? And the aggregate just shows up? I will see if I can find a TR or something explaining the process. That would save me a bunch of work/effort.
