chutwig
May 28, 2001

BURLAP SATCHEL OF CRACKERJACKS

bull3964 posted:

Do you have any more details? How long ago was it, did they identify the bug, was there total data loss, and what models and scale of implementation?

One of the things attracting us to them right now is that one of our partners (which happens to be driving our data needs) currently uses them.

I mean, I'm sure someone has a bad story about every product that's out there (I remember DSLReports badmouthing EqualLogic when an array crashed). This is just a pretty sizable purchase for us, and if this implementation is anything less than a success, there will be blood.

We use Pure Storage right now and were an early beta site (I came on shortly after the product went GA). It's actually been quite stable for us; our main issue with Pure is that they want to be the Apple of enterprise SSD arrays, and they tend to treat their customers accordingly. "Oh, you have a problem? Just upgrade to Purity x.y.z!" is not really an acceptable answer in the enterprise storage world. Other than that, we've had a few hiccups: one of the controllers at our backup data center crashed, and issues with the way they calculate space utilization meant Purity was severely latency-throttling the array while we couldn't figure out why, because the UI indicated 59% used out of 10 TB when we were actually at 102% of 8 TB.

chutwig
May 28, 2001

BURLAP SATCHEL OF CRACKERJACKS

NippleFloss posted:

I'd be interested to hear how your experiences with failover have been. Has the process been non-disruptive for applications?

Controlled failover during Purity upgrades has always been unnoticeable from an application standpoint. Things got a little wacky once when one of the controllers crashed and a bug in Purity caused the surviving controller to refuse to take over the failed controller's disks immediately, but that was kind of an exceptional circumstance.

bull3964 posted:

Our primary use case for this initially is 3TB of MongoDB data. It's super compressible data, but there's no native solution for it right now, so inline compression on the storage seems like the way to go. It's a replica set of 3 nodes, so the dedupe is very attractive as well. At least on paper, it seems like we could not only greatly shrink the size of a single datastore but also see a great reduction in total data size from the nodes deduping against each other.

Our principal use case is hosting a pre-compressed 4+TB PostgreSQL database. There are also two replicas of this database, which compress/dedupe a lot less than I'd expect given that PostgreSQL replication is block-level rather than logical, to the point where the replica volumes sometimes exceed the size of the master volume. I don't know whether the database actually is deduped completely and the UI is distributing the complete size of a single copy among the three volumes, or whether the dedupe just isn't that great under certain circumstances. I do know that about once a month I do housekeeping on the replicas by deleting their data volumes and re-cloning from the master, and doing so usually frees up several hundred gigabytes on the array that gradually gets used up again over the course of the month.
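
For the curious, the monthly housekeeping amounts to something like the sketch below. It uses Pure's purestorage Python REST client; the volume names are made up, the exact call signatures are from memory, and the PostgreSQL stop/unmount/resync steps on the hosts are omitted, so treat it as a sketch rather than a runbook.

code:

import purestorage

# Connect to the array (hostname and token are placeholders).
array = purestorage.FlashArray("pure01.example.com", api_token="API_TOKEN")

MASTER = "pg-master-data"                       # hypothetical volume names
REPLICAS = ["pg-replica1-data", "pg-replica2-data"]

for replica in REPLICAS:
    # Stop PostgreSQL on the replica host and unmount the volume first.
    # Destroy and eradicate the old replica volume so its blocks are
    # actually reclaimed instead of sitting in the 24-hour pending bin.
    array.destroy_volume(replica)
    array.eradicate_volume(replica)
    # Re-clone from the master; the fresh clone shares all of its blocks
    # with the master volume, which is where the freed space comes from.
    array.copy_volume(MASTER, replica)
    # Rescan, mount, and let PostgreSQL catch up from WAL on the host.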

tl;dr: sometimes the dedupe doesn't act like you expect but we still have about 31TB provisioned that's deduped down to 5TB, so overall it works pretty well.

chutwig fucked around with this message at 20:04 on Jul 26, 2014

chutwig
May 28, 2001

BURLAP SATCHEL OF CRACKERJACKS

Anyone here use Tegile? We are looking to consolidate our Pure Storage + NetApp installations, and on paper the T3400/T3800 have everything we're looking for in a single platform: inline dedupe+compression, replication, NFS. We'll likely arrange to get a test unit to evaluate, but does anyone have any good/bad things to say about Tegile in particular?

chutwig
May 28, 2001

BURLAP SATCHEL OF CRACKERJACKS

Kaddish posted:

I'm seriously considering consolidating our entire VMware environment (about 45TB) to a Pure FA-420. I can get 60TB usable (assuming 5:1 compression) for about 240k. Anyone have any first hand experience? It seems like a solid product and perfect for VMware.

This is some reply necromancy, but we just replaced our FA-320s with FA-420s and added another 12TB shelf to take us to 23TB raw capacity. We do see compression ratios of around 6:1 over the entire array, but I don't recall what it is on the VMFS LUNs specifically.
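
The sizing arithmetic is worth sanity-checking yourself; here's the back-of-envelope version in Python, with the compression ratios explicitly treated as assumptions rather than guarantees.

code:

# Effective capacity is just raw flash times the data-reduction ratio.
raw_tb = 23                # our FA-420s after the extra 12TB shelf
observed_ratio = 6.0       # roughly what we see across the whole array
print(f"{raw_tb} TB raw at {observed_ratio}:1 ~ {raw_tb * observed_ratio:.0f} TB effective")

# Kaddish's quote works backwards: 60 TB usable at an assumed 5:1
# ratio implies 12 TB of raw flash behind it.
usable_tb = 60
assumed_ratio = 5.0
print(f"{usable_tb} TB usable at {assumed_ratio}:1 implies {usable_tb / assumed_ratio:.0f} TB raw")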

chutwig
May 28, 2001

BURLAP SATCHEL OF CRACKERJACKS

As a longtime Pure customer, here are my positives and negatives about them.

Positives
  • If you make enough noise, they will pay attention. After a particularly unfortunate incident with a huge bug in Purity (no data loss, but a pretty gross performance impact), we ended up with a dedicated engineer who personally handles everything but routine support stuff, and last month they sent over their chief architect and a couple of other high-level engineers to lay out their roadmap for Purity 4 and beyond. We also had an incident recently that required an extremely rapid upgrade of our controllers plus another shelf, and within about two days we had the new controllers and shelf on-site, installed, and running.
  • Aside from the aforementioned highly regrettable bug encounter, which I will detail below, Purity has been stable and has not given me too many reasons to notice it, which is the most you can ask for from a storage system.
  • The dedupe+compression are pretty magical and we typically see ratios of 6:1 or better over the full array.

Negatives
  • They really really really want(ed) to be the Apple of enterprise storage, by which I mean their game plan was to sell you a box of SSDs, manage it for you, and have you only ever interact with it to make LUNs and add WWNs or iSCSI targets. Obviously this is a pipe dream, and I have done my utmost to disabuse them of the notion that it's a feasible strategy in the realm of enterprise storage. poo poo goes wrong, poo poo acts weird, and responding every time with "upgrade to the latest version of Purity and then we'll start actually diagnosing the issue", regardless of whether there's anything in the newer Purity that might actually address the problem, sits poorly with an audience that prioritizes predictability and stability above all else.
  • They manage it for you and you don't get to. If you're used to putting on the shoulder-length glove and digging around in the bowels of Data ONTAP, like I occasionally have to do with our NetApps, it will come as an unpleasant shock. Just about everything requires a support ticket and a support connection from the array. Maybe this doesn't bother you, and there are probably customers of theirs who love it, but it still irritates me.
  • They are very late to the NAS game. They recognize that they need to provide NFS/CIFS at some point but have no idea when, and their current best recommendation is to front a LUN with Windows Server 2012 for both CIFS and NFSv4 (and here I thought Microsoft had killed Services for UNIX).
  • They are also late to the replication game. Purity 4 has some rudimentary replication that I think operates on a 15-minute fixed schedule and then you need to manually copy the snapshot LUN to form a mountable LUN, which is obviously a far cry from SnapMirror or Tegile's ZFS-based replication. They know it's not great and are working on it but I don't know that it will ever be able to fully replace what SnapMirror currently provides for me.
  • The incident with the rapid controller replacement I referenced above: if you exceed the capabilities of the controllers for even a little while, you can quickly enter a death spiral where the dedupe/compression processes can't keep up, the system partition where pre-deduped data is kept blows up like Mr Creosote, you exceed 100% utilization on the user space and start intruding into reserved space, the array begins aggressively latency-throttling you to try to get you to remove data, and then it's just game over. Even with our upgraded controllers we still had to have their support make the dedupe/compression jobs more aggressive in order to keep up, and we do not push the array very hard. I am currently evaluating our LUN layout to see if we can make changes that will help the dedupe processes; our engineer recommended avoiding LUNs larger than 2TB where possible, so I'm going to try to slice up some of our larger VMFS LUNs (see the sketch after this list).
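
Here is roughly what the slicing exercise looks like, again sketched with the purestorage client; the LUN names and sizes are hypothetical, and you still have to connect the new volumes to the hosts and Storage vMotion the VMs over yourself.

code:

import purestorage

array = purestorage.FlashArray("pure01.example.com", api_token="API_TOKEN")

OLD_LUN_TB = 8        # hypothetical oversized VMFS LUN
SLICE_TB = 2          # stay at or under the recommended 2TB per LUN
num_slices = -(-OLD_LUN_TB // SLICE_TB)       # ceiling division: 4 slices

for i in range(num_slices):
    # Size strings with a T suffix are what the REST API expects, as far
    # as I recall; verify against your client version before running.
    array.create_volume(f"vmfs-slice-{i:02d}", f"{SLICE_TB}T")
    # Host/hostgroup connections omitted: connect each slice to the ESXi
    # cluster, format VMFS, then Storage vMotion VMs off the old LUN.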

The negatives list looks pretty ominous, but the salient point to take away is that 80% is the magic number: don't exceed 80% of raw usage, which is what the recalibration turned into the new 100%. In old versions of Purity, 80% was the point at which you would begin occupying reserved system space and the controller would start throttling like hell. Somewhere around Purity 3.2 or 3.3.something, they recalibrated that to be the new 100% and hid the remaining space from view. You can still technically intrude into it and see things like 102% utilization, but the array will not be happy with you. That is the Purity bug that bit us: we started getting throttled very aggressively at an indicated 59% of our 11TB array. Support immediately told us to upgrade Purity but wouldn't tell us why, and as soon as we upgraded to a version where the utilization percentage was recalibrated, it turned out we were actually at 102% of 8.8TB usable user space and had been right on the edge for some time. That led to a lot of angry discussions, but ultimately they've become more forthcoming and we haven't jumped ship to Tegile yet, sooooo...
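
If you want the recalibration spelled out, the arithmetic is trivial; the array sizes below are ours, and everything else follows from the old 80% becoming the new 100%.

code:

# The 80%-becomes-100% recalibration, with our array's numbers.
raw_tb = 11.0
usable_tb = 0.8 * raw_tb          # 8.8 TB: the old 80% line is the new 100%
reserve_tb = raw_tb - usable_tb   # 2.2 TB hidden from view, still intrudable

# An indicated 102% on the new scale means you are already in the reserve.
used_tb = 1.02 * usable_tb        # ~8.98 TB actually consumed
print(f"usable={usable_tb:.1f} TB reserve={reserve_tb:.1f} TB used~{used_tb:.2f} TB")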

My ultimate takeaway is basically what NippleFloss said: it's a block-device SAN that's fast but doesn't really have wow features. All the same, I'm happy to give up good wows in exchange for no bad wows when it comes to storage, and they're making progress toward getting rid of the bad wows.

chutwig fucked around with this message at 06:48 on Dec 31, 2014
