Amandyke
Nov 27, 2004

A wha?

doomisland posted:

We just got an Isilon cluster and I was going to be there while the tech set it up in the rack (this is at a remote site across the country). They shipped the wrong InfiniBand cables, so not only did they overnight more cables, they're also paying for me to stay out longer than I needed to. The new cables were the same as the first batch and were wrong again. Is this typical?

Did they ship ones that were too long or too short? I may be going in for an interview at Isilon tomorrow and would happily look into things for you.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

doomisland posted:

We just got an Isilon cluster and I was going to be there while the tech set it up in the rack (this is at a remote site across the country). They shipped the wrong InfiniBand cables, so not only did they overnight more cables, they're also paying for me to stay out longer than I needed to. The new cables were the same as the first batch and were wrong again. Is this typical?
It happens with vendors sometimes. EMC/Isilon doesn't do their own global logistics (most vendors don't); they outsource it to Unisys. All kinds of crazy poo poo happens with them, or any logistics company.

I had an issue with IBM once where they sent the wrong model DIMM for one of our SONAS interface nodes. Later that month, they did the exact same thing, sending us the exact same wrong replacement for the exact same failed DIMM. Took like 4 hours to get a replacement each time, but still.

three
Aug 9, 2007

i fantasize about ndamukong suh licking my doodoo hole
I think Unisys hires hobos and turns them into technicians.

doomisland
Oct 5, 2004

Amandyke posted:

Did they ship ones that were too long or too short? I may be going in for an interview at Isilon tomorrow and would happily look into things for you.

They were the wrong connector at one end. I'm not familiar with InfiniBand, but basically one end was what I assume is the normal connector and the other was the kind that needs to be inserted into the switch like you would an SFP. The cables they sent had the normal connector at both ends.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

doomisland posted:

They were the wrong connector at one end. I'm not familiar with InfiniBand, but basically one end was what I assume is the normal connector and the other was the kind that needs to be inserted into the switch like you would an SFP. The cables they sent had the normal connector at both ends.
Why on earth are they using QSFP to CX4 now instead of using the same interconnect on both ends :confused:

Dilbert As FUCK
Sep 8, 2007

by Cowcaster
Pillbug
Anyone here using 3PAR? I'm :allears: to people who have experience with it. Mostly looking for complaints or impressions of the equipment; my new job mentioned they work a lot with 3PAR, and are looking at Nimble as well.

doomisland
Oct 5, 2004

Misogynist posted:

Why on earth are they using QSFP to CX4 now instead of using the same interconnect on both ends :confused:

No idea, I wasn't involved in the decision/purchase/planning of it. I was going to the site anyways for stuff, so they shipped off the boxes as soon as they came in for me to install with the Unisys dude. We have the correct cables at work now and yeah, they're QSFP to CX4. We were sent CX4-to-CX4 twice~

madsushi
Apr 19, 2009

Baller.
#essereFerrari

Corvettefisher posted:

Anyone here using 3PAR? I'm :allears: to people who have experience with it. Mostly looking for complaints or impressions of the equipment; my new job mentioned they work a lot with 3PAR, and are looking at Nimble as well.

Depends on the unit. I actually just wrote an email to a client that was looking at 3PAR and Nimble. My experience with the 3PAR 7-series is that it's mostly just dumb disk in a RAID with snapshots, no real features that I'd buy it for. I hear that there's some scale-out capabilities with the bigger boxes, but I don't have any experience with that. If I were buying a small-to-medium sized SAN, I'd look at Nimble or NetApp based on the requirements. But, some clients just want some dumb disk on the back-end, and 3PAR does that pretty well. I also HATE tiering and I am glad that the industry is moving away from it.

I have a hard time looking at SANs that don't include an SSD/Flash-based read cache these days. The 3PAR (and Compellent, etc.) tiering isn't real-time and isn't going to get you anywhere near the same performance boost.

Syano
Jul 13, 2005
People on the low end should always include Dell in their discussions. Their PowerVault line is getting pretty darn good for cheap.

hackedaccount
Sep 28, 2009

madsushi posted:

I also HATE tiering and I am glad that the industry is moving away from it.

I have a hard time looking at SANs that don't include an SSD/Flash-based read cache these days. The 3PAR (and Compellent, etc.) tiering isn't real-time and isn't going to get you anywhere near the same performance boost.

What's the context of the first line there? The traditional cache -> flash -> fast HD -> slow HD -> tape type tiering or something else? What's up with the industry moving away from it?

What do you mean about how the 3PAR tiering isn't "real time"?

madsushi
Apr 19, 2009

Baller.
#essereFerrari

hackedaccount posted:

What's the context of the first line there? The traditional cache -> flash -> fast HD -> slow HD -> tape type tiering or something else? What's up with the industry moving away from it?

What do you mean about how the 3PAR tiering isn't "real time"?

Tiering, in my mind, is when you're talking about moving your data between disk speed tiers (SATA -> SAS -> SSD). The quintessential tiering story was Compellent. The idea was that you buy a handful of fast disk for your "hot data" and a lot of slow disk for your "cold data" and you let their magic decide where the bits should go. The problem with the approach (both 3PAR's and Compellent's) is that the tiering is scheduled: the system figures out all the hot blocks during the day, then moves those hot blocks to your fast disk at night, because the migration itself impacts performance. If you have a different set of hot blocks tomorrow, then your fast disk is wasted. A 24-hour window for discovering and migrating hot blocks is too slow for many applications and workloads. Plus, the sizing is often done with some very wishful thinking in mind, like 5 SAS disks trying to support 50 SATA disks. Also, I like knowing exactly where my data ACTUALLY is at any time.

NetApp and Nimble do things differently. Your data, at all times, is on your disk. The read caching layer is just a cache and your data is never there permanently. The read cache is populated in real time (and even during writes sometimes) and is intelligently refreshed and managed. If you read the same block twice, the second read will be cached. I find that your cache hit ratio with this model is far, far higher than anything I've ever seen from a tiered storage model like Compellent. You get a larger benefit out of the read cache, and you don't have to worry about things like "did I buy enough fast disks?" because you're just throwing a big fuckall SSD or PCI-E Flash card at it. Five SAS disks aren't going to outperform 50 SATA disks if your data is laid out properly in the first place.
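
(For illustration only, a toy Python sketch of the read-cache model described above: data always lives on disk, the cache just holds copies of recently read blocks, and a block read twice is served from cache the second time. The LRU policy, capacity, and block IDs are invented for the example; real arrays are much smarter about what they keep.)

code:

from collections import OrderedDict

class ReadCache:
    """Toy LRU read cache: data is never moved off disk, the cache only
    keeps copies of recently read blocks (no scheduled migrations)."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()  # block_id -> data

    def read(self, block_id, read_from_disk):
        if block_id in self.blocks:           # cache hit
            self.blocks.move_to_end(block_id)
            return self.blocks[block_id], "cache"
        data = read_from_disk(block_id)       # cache miss: go to disk
        self.blocks[block_id] = data          # populate in real time
        if len(self.blocks) > self.capacity:  # evict least recently used
            self.blocks.popitem(last=False)
        return data, "disk"

cache = ReadCache(capacity_blocks=1024)
cache.read(42, lambda b: b"...")  # first read comes from disk
cache.read(42, lambda b: b"...")  # second read is served from cache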

The tiered storage model was beautiful for about 6 months, which is when Compellent went big and Dell spent $$$ on them. Then SSD and flash prices came down to reasonable levels, and Compellent has been withering ever since.

hackedaccount
Sep 28, 2009
Ok, gotcha. So it isn't an issue with automated storage tiering itself; it's that some vendors don't monitor the data in real time, and because they only move data to different tiers once per day it's super inefficient vs. real-time tiering.

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

hackedaccount posted:

Ok, gotcha. So it isn't an issue with automated storage tiering itself; it's that some vendors don't monitor the data in real time, and because they only move data to different tiers once per day it's super inefficient vs. real-time tiering.

Eh, automated storage tiering is really never going to be real time because it's too intrusive to disk activity. Every time you move a block you have to read it from one tier, write it to another tier, and then delete it from the original tier. When moving up a tier that isn't as bad, because you've likely already read the block if it is hot data, but you still have a write/delete cycle to process that ties up a disk subsystem that should be busy serving data, not moving data. If you're destaging from the fast tier then it's likely because it is cold data, which means you'd need to maintain a real-time heat map of all data in your cold tier and constantly read/write/delete cold blocks, again tying up your fast disk tier with accesses that are competing with user access but aren't actually getting things there any faster. It's just way too inefficient to do in real time, which is why these systems generally work by having the system pre-stage data to the hot tier before it is actually accessed. That's pretty clumsy though, and it only works if your access patterns are incredibly predictable and non-overlapping.
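
(Rough, made-up numbers, but they show the IO amplification being described: every promoted or demoted block costs a read on one tier and a write on the other before you've served a single user request.)

code:

def migration_io(blocks_promoted, blocks_demoted):
    """Count the extra back-end IOs caused by tier migrations.
    Promotion: read from the slow tier + write to the fast tier.
    Demotion:  read from the fast tier + write to the slow tier.
    (Frees and metadata updates are ignored; real arrays do more, not less.)"""
    return blocks_promoted * 2 + blocks_demoted * 2

# Moving just 1 GiB of hot data up and 1 GiB of cold data down in
# 4 KiB blocks costs about a million IOs that compete with user IO:
blocks = (1 * 1024**3) // 4096
print(migration_io(blocks, blocks))  # 1048576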

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

NippleFloss posted:

Eh, automated storage tiering is really never going to be real time because it's too intrusive to disk activity. Every time you move a block you have to read it from one tier, write it to another tier, and then delete it from the original tier. When moving up a tier that isn't as bad, because you've likely already read the block if it is hot data, but you still have a write/delete cycle to process that ties up a disk subsystem that should be busy serving data, not moving data. If you're destaging from the fast tier then it's likely because it is cold data, which means you'd need to maintain a real-time heat map of all data in your cold tier and constantly read/write/delete cold blocks, again tying up your fast disk tier with accesses that are competing with user access but aren't actually getting things there any faster. It's just way too inefficient to do in real time, which is why these systems generally work by having the system pre-stage data to the hot tier before it is actually accessed. That's pretty clumsy though, and it only works if your access patterns are incredibly predictable and non-overlapping.
Eh, this is kind of a cop-out. This is exactly why QoS exists, and why every storage vendor in the world has a sliding scale that allows administrators to prioritize foreground user traffic versus background administrative stuff. Even my bargain-basement IBM DS SANs have a knob that allows you to determine the priority given to array resizes and things of that nature.

I get that it's hard, and nobody does it with tiering, but to say it's never going to happen is extraordinarily naive and short-sighted, I think.

Vulture Culture fucked around with this message at 21:03 on Apr 11, 2013

madsushi
Apr 19, 2009

Baller.
#essereFerrari

Misogynist posted:

Eh, this is kind of a cop-out. This is exactly why QoS exists, and why every storage vendor in the world has a sliding scale that allows administrators to prioritize foreground user traffic versus background administrative stuff. Even my bargain-basement IBM DS SANs have a knob that allows you to determine the priority given to array resizes and things of that nature.

I get that it's hard, and nobody does it with tiering, but to say it's never going to happen is extraordinarily naive and short-sighted, I think.

I just see it as wasted IOPS right now. Why bother moving data between SAS/SATA when you can just buy a big flash/SSD cache and do it that way? And if you're at the scale where flash/SSD don't give you enough space, you probably want all SAS anyway.

Why do companies buy SATA? Because the $/GB is cheaper than SAS. At some point, it's probably easier to manage what data goes to SAS and what goes to SATA manually, rather than trying to play the guessing/prediction game. We might get to the point where you can get REALLY good at guessing, but it will still not be as good as a flash/SSD solution. When SAS was orders of magnitude cheaper per GB than SSD, tiering made sense.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

madsushi posted:

Why do companies buy SATA? Because the $/GB is cheaper than SAS. At some point, it's probably easier to manage what data goes to SAS and what goes to SATA manually, rather than trying to play the guessing/prediction game. We might get to the point where you can get REALLY good at guessing, but it will still not be as good as a flash/SSD solution. When SAS was orders of magnitude cheaper per GB than SSD, tiering made sense.
We barely buy NLSAS as it is. We have a big demand for archival HSM storage on LTO (1/4 the price of NLSAS), and a big demand for fast storage for high-performance computing, but we're seeing a huge depletion of everything in the middle.

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

Misogynist posted:

Eh, this is kind of a cop-out. This is exactly why QoS exists, and why every storage vendor in the world has a sliding scale that allows administrators to prioritize foreground user traffic versus background administrative stuff. Even my bargain-basement IBM DS SANs have a knob that allows you to determine the priority given to array resizes and things of that nature.

I get that it's hard, and nobody does it with tiering, but to say it's never going to happen is extraordinarily naive and short-sighted, I think.

The problem with using a throttle mechanism to protect user IO is that you'd WANT your automated tiering to kick in in the exact situation where a throttle would prevent it from happening. If I have a big IO dump hit something on slow disk, that's when I'd hope my automated tiering would protect me by ensuring that it is on my fast tier where performance isn't abysmal, but because the slow disk is so busy serving user IO there is no guarantee that you'll have enough free cycles to actually perform the tiering operation. It may complete once disk IO dies down, but does that really do you any good? You can EITHER use the disk IO for tiering or you can use it to serve user data, but if your system is hit hard enough that you want the data on fast disk then you probably don't have enough spare IO to perform the tiering.

Could it be made to work, kinda-sorta, in some circumstances? Sure. Is it worth the development time when huge, fast cache is rapidly becoming available? Not really.

I could certainly be proven wrong, but it seems like with the rush to flash the storage industry is largely passing these sorts of tiering mechanisms by, like madsushi said.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

NippleFloss posted:

The problem with using a throttle mechanism to protect user IO is that you'd WANT your automated tiering to kick in in the exact situation where a throttle would prevent it from happening. If I have a big IO dump hit something on slow disk, that's when I'd hope my automated tiering would protect me by ensuring that it is on my fast tier where performance isn't abysmal, but because the slow disk is so busy serving user IO there is no guarantee that you'll have enough free cycles to actually perform the tiering operation. It may complete once disk IO dies down, but does that really do you any good? You can EITHER use the disk IO for tiering or you can use it to serve user data, but if your system is hit hard enough that you want the data on fast disk then you probably don't have enough spare IO to perform the tiering.

Could it be made to work, kinda-sorta, in some circumstances? Sure. Is it worth the development time when huge, fast cache is rapidly becoming available? Not really.

I could certainly be proven wrong, but it seems like with the rush to flash the storage industry is largely passing these sorts of tiering mechanisms by, like madsushi said.
The same principles apply to flash as they do to higher-performance spinning disk. Vendors like to pretend flash is special because flash is really expensive. When flash is not so expensive, we'll be right back to where we are today in the context of this discussion.

Why couldn't it be made to write the data inline on a higher tier as it's being read off of the lower tier already?

1000101
May 14, 2003

BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY FRUITCAKE!

madsushi posted:


I have a hard time looking at SANs that don't include an SSD/Flash-based read cache these days. The 3PAR (and Compellent, etc.) tiering isn't real-time and isn't going to get you anywhere near the same performance boost.

It's not realtime, but you can default writes to land in the highest tier and eventually move down as the data gets cold. Most data tends to be that way anyway, and when you're talking north of 100TB of storage it becomes cost-prohibitive to put everything on flash or even 15k RPM SAS. Tiering in some form or another will always be here until we start seeing people being amazed that we used to store data on spinning ceramic platters.

I'd suspect what we'll actually see in the future is something more along the lines of policy-based data management built into the array: this application always lives here, and that application is archival so put it on cheap lovely disks. A sort of "manual" control that can be managed transparently to end hosts.
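
(A sketch of what that could look like, with the policy expressed as a simple table the array evaluates; every application name and tier here is invented for illustration.)

code:

# Hypothetical per-application placement policy, evaluated by the array
# rather than by each host. All names and tiers are made up.
PLACEMENT_POLICY = {
    "oltp-db": {"tier": "ssd",  "reason": "latency sensitive"},
    "vm-boot": {"tier": "sas",  "reason": "steady mixed IO"},
    "archive": {"tier": "sata", "reason": "cold, capacity driven"},
}

def place(app_name, default_tier="sas"):
    """Return the tier a new volume for this application should land on."""
    return PLACEMENT_POLICY.get(app_name, {}).get("tier", default_tier)

print(place("archive"))  # sata
print(place("unknown"))  # sas (fallback)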

Either that or scale-out storage just becomes the flavor of the week (something like this: http://www.yellow-bricks.com/2012/09/04/inf-sto2192-tech-preview-of-vcloud-distributed-storage/). One big happy tier capable of delivering as much IO and bandwidth as you need.

KS
Jun 10, 2003
Outrageous Lumpwad
I read the negative descriptions of tiered disk in this thread and wonder, because the reality for us has been so different over the last 2 years we've had Compellent arrays. We bought in just a few weeks before Dell bought the company.

Our biggest system is around 64TB usable, made up of 60 15k 3.5" disks and only 24 7k 3.5" disks. It's essentially sized so that live data stays on 15k and 7k is used for replays. Our peak period is about 2 hours of 475 MB/sec transfers and 9000 IOPS. The array handles the load just fine.

Maybe there are partners out there that do sizing differently, and I could definitely see a disaster if it's sized improperly, but it's worked out well for us. All writes go to the fast disk, period. The tiering works outside of peak hours, and the system has been solid. The ecosystem (Replay Manager, PowerShell scripting, etc.) is the best I've worked with.

At the same time, I'm a NetApp fan from way back. In 2012 I tried really hard to replace both of our arrays with NetApps during a planned DC move. NetApp has the advantage of a flash cache, but the lack of tiering seems like a disadvantage -- they quoted a system with 144 15k drives to meet the combination of size and throughput requirements, and obviously couldn't come close on price.

From my experience, tiering has few drawbacks if sized right. It's cheaper than an all-15k system, and it doesn't require any additional management. That's not a bad thing.

Now would I get a Compellent again today? Absolutely not. The fact that a system they were selling in 2011 doesn't have VAAI support is either an embarrassment or a bad joke. The benefits of flash from a latency perspective are too big to ignore, and Compellent has lagged behind in that department.

KS fucked around with this message at 22:55 on Apr 11, 2013

FISHMANPET
Mar 3, 2007

Sweet 'N Sour
Can't
Melt
Steel Beams
They support VAAI now. I don't know which parts exactly, as I don't work with the SAN, but I know it supports at least some of the APIs.

KS
Jun 10, 2003
Outrageous Lumpwad
Right, but only in the 6.x code branch, which is only on the series 40 and 8000 controllers. Series 30 wasn't EOL until like mid-2011. They've been promising a 5.6 branch with VAAI for the series 30s for at least a year.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer
I am reporting in to say that I am extremely happy with our decision to augment our netapp storage with an oracle zfs appliance.

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

Misogynist posted:

The same principles apply to flash as they do to higher-performance spinning disk. Vendors like to pretend flash is special because flash is really expensive. When flash is not so expensive, we'll be right back to where we are today in the context of this discussion.

Why couldn't it be made to write the data inline on a higher tier as it's being read off of the lower tier already?

You could write it inline as it's being read from the lower tier, but presumably you're also busy destaging stuff from the higher tier to make room, since space on the higher tier must be constrained, otherwise you would run everything there all the time. That means you're adding load to the high tier as you try to read to destage while you're also attempting to write to it, and you're possibly adding load to the low tier to write that destaged data as you also try to read it to service user reads. It's extra IO any way you slice it and you've got to fit it in somewhere.

Regarding flash, it is sort of special in the sense that it doesn't behave like spinning disk. It doesn't care if your IO is random or sequential, and it doesn't care if your data is contiguous or not, which provides a lot more flexibility regarding how it's used by the filesystem. Its overall latency is hugely lower and the latency curve is better behaved. I mean, I'm sure "how can I make it faster" will never stop being a discussion point, but a handful of flash disks provide enough theoretical random IO to support just about any normal workload, so we are a long way off from talking about how to get around our slow, slow SSDs.


1000101 posted:

It's not realtime, but you can default writes to land in the highest tier and eventually move down as the data gets cold. Most data tends to be that way anyway, and when you're talking north of 100TB of storage it becomes cost-prohibitive to put everything on flash or even 15k RPM SAS. Tiering in some form or another will always be here until we start seeing people being amazed that we used to store data on spinning ceramic platters.

This is basically what vendors do right now with battery-backed NVRAM and things of that nature. It's a write cache, but the problem with a write cache is that eventually it's all going to hit the back-end disk. If you have a very bursty workload then it works okay, because you can cache it all and then destage at a quiet time, but if you have any sustained throughput you are going to be constantly destaging from fast to slow disk to make room for new incoming writes. You're writing everything twice and not seeing much benefit.


1000101 posted:

I'd suspect what we'll actually see in the future is something more along the lines of policy-based data management built into the array: this application always lives here, and that application is archival so put it on cheap lovely disks. A sort of "manual" control that can be managed transparently to end hosts.

I'd like to see storage that is smart enough to tell me when a workload has bottlenecked, or when two workloads are competing and crowding each other out, along with providing the tools to transparently re-distribute those workloads to new sets of disk or controllers or whatever. A global pool with QOS would also be neat, though it's tougher to do with something like disk than it is with say CPU or memory shares.


KS posted:

I read the negative descriptions of tiered disk in this thread and wonder, because the reality for us has been so different over the last 2 years we've had Compellent arrays. We bought in just a few weeks before Dell bought the company.

Hey, if it works for you then it works for you. I think it's tougher to engineer around than just throwing a bunch of cache at the problem, but it obviously works well for some people because it is being used happily by customers. I just don't think it will be the preferred method moving forward as flash becomes cheaper.

My experience with it has also been lackluster, because my previous company bought Hitachi storage with dynamic tiering to run ZFS on top of, which is just utter nonsense given the way ZFS works.

adorai posted:


zfs appliance


What are your takeover/giveback times like on the zfs appliance?

madsushi
Apr 19, 2009

Baller.
#essereFerrari

1000101 posted:

It's not realtime, but you can default writes to land in the highest tier and eventually move down as the data gets cold. Most data tends to be that way anyway, and when you're talking north of 100TB of storage it becomes cost-prohibitive to put everything on flash or even 15k RPM SAS.

I'm not talking about putting all of your data on SSD/SAS, I'm talking about using that as your read cache.

Storage is all about ratios.

You have 4 types of traffic: small random write, small random read, large seq write, large seq read.

You need to have answers for all 4 types of traffic: NVRAM write cache, SSD/Flash read cache, lots of disk, and lots of disk.

The NVRAM and the read cache take enough of the small/random load off of your disk so that your disk can do what it REALLY excels at: large/sequential read/write.

The magic of SAN performance is sizing everything properly. You have many elements to consider: capacity, NVRAM, read cache, raw throughput, and "bad" applications. A "bad" application is one that overloads your NVRAM or read cache with specific access patterns and forces you to hit disk.

The ratios here depend on what your app/workload is. Let's take a Nimble CS220:

12 TB Raw
320/640/1,200 GB of SSD
2 GB of NVRAM

So you're looking at 3-10% read cache and 2 GB of NVRAM, which will handle most workloads. The vendor will tell you to up your read cache or your NVRAM (bigger head) if your particular workload falls outside of the base ratio. Min/maxing your ratios is the key to SAN performance without tiering. You need enough NVRAM and read cache to absorb most small/random workloads so that your disk can handle the large/seq workloads. Now you can buy lots of SATA disks because you've absorbed the random/small stuff for it.
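
(Quick sanity check on those ratios, assuming decimal units (1 TB = 1,000 GB): the three SSD options on the 12 TB CS220 above work out to roughly the 3-10% of raw capacity being described.)

code:

def cache_ratio(raw_tb, ssd_gb):
    """Read cache as a fraction of raw capacity (decimal units assumed)."""
    return ssd_gb / (raw_tb * 1000)

# Nimble CS220 options from above: 12 TB raw, 320/640/1200 GB of SSD
for ssd_gb in (320, 640, 1200):
    print(f"{ssd_gb} GB SSD -> {cache_ratio(12, ssd_gb):.1%} of raw")
# 320 GB SSD -> 2.7% of raw
# 640 GB SSD -> 5.3% of raw
# 1200 GB SSD -> 10.0% of raw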

A SATA disk can easily push 100MB/s of sequential read, so even your small SAN with 12 disks is capable of pushing multi-gig throughput if the disk can just sit there and stream data. Where your performance on SATA goes to poo poo is when you make your disk constantly switch operations: seq read -> random read -> seq read. Now, instead of being able to just read a whole bunch of blocks in a row, you pay your seek latency for EVERY single request. Absorb those random/small blocks via NVRAM and read cache and now your "slow" SATA disk can just spit out data closer to its max speed rather than its min speed.
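
(Back-of-the-envelope version of that, assuming 4 KB random reads and the ~100 random IOPS per 7.2k drive used elsewhere in the thread; the exact numbers vary, but the gap between streaming and seeking is a couple of orders of magnitude.)

code:

def sata_random_mb_s(iops, block_kb=4):
    """Throughput of a SATA disk doing purely random reads: every request
    pays a full seek, so throughput is just IOPS * block size."""
    return iops * block_kb / 1024

sequential_mb_s = 100                                 # streaming, per the post
random_mb_s = sata_random_mb_s(iops=100, block_kb=4)
print(random_mb_s)                    # ~0.39 MB/s
print(sequential_mb_s / random_mb_s)  # ~256x difference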

If your Compellent was sized with more SAS than SATA, then you are sized properly. I wouldn't even call that storage tiering. You're offloading your backups (replays) to slow disk, which is more than acceptable. Most NetApp customers do the same: SAS on production and SATA at DR. I'm talking about SATA being used as the main storage area with SAS reserved for hot blocks, and the SAS portion being woefully undersized. Obviously sizing is a big player here, and it's not like a Compellent (or 3PAR, or IBM, or ...) full of SAS isn't going to do just fine.

SAS is generally fast enough that you can throw any workload at it and you're good to go. SATA is not. SATA will fall down if you put high I/O workloads on it, especially ones that combine large/seq and small/random at the same time, since SATA is so bad at context switching. The "new" storage paradigm revolves around finding a way to use SATA to store the ever-growing amount of data that companies are generating while also leveraging flash/SSD/NVRAM to ensure good performance. The SSD/SATA combo is very potent when executed correctly. If your workload doesn't meet the criteria (more than 10% or so of hot data, or TONS of sequential writes a la video streaming) then you're going with SAS anyway and it's moot.

KS
Jun 10, 2003
Outrageous Lumpwad

NippleFloss posted:

You could write it inline as it's being read from the lower tier, but presumably you're also busy destaging stuff from the higher tier to make room, since space on the higher tier must be constrained, otherwise you would run everything there all the time.

At least in the Compellent world, if you fill your top tier to 100% you're either sized very wrong or have <5% free space left on the whole array. Data Progression works to keep enough space available on the top tier to handle writes -- the lower-tier disks fill first. A healthy array has zero space on the lower-tier disks allocated to writes. Not sure if this holds with other vendors.

evil_bunnY
Apr 2, 2003

adorai posted:

I am reporting in to say that I am extremely happy with our decision to augment our netapp storage with an oracle zfs appliance.
Has Oracle added *anything* to ZFS lately?

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

NippleFloss posted:

What are your takeover/giveback times like on the zfs appliance?
It seems to work a lot differently than it does on our NetApp, but I would say under 5 seconds.

evil_bunnY posted:

Has Oracle added *anything* to ZFS lately?
I bought it for what it already has, which is a fuckton of IOPS and zero-penalty snapshots. I don't really need any new features.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.
I'm curious, do any of you other guys in the tiering discussion work in an environment with petabytes of raw data? SSD caching is great for applications with repeatable hotspots, but it's going to be a long time before it can play with the HPC kids.

madsushi
Apr 19, 2009

Baller.
#essereFerrari

Misogynist posted:

I'm curious, do any of you other guys in the tiering discussion work in an environment with petabytes of raw data? SSD caching is great for applications with repeatable hotspots, but it's going to be a long time before it can play with the HPC kids.

My biggest client system is about 100 TB on production. That's what I was saying about ratios: at some point you can't buy 5% of your space in SSD and you're going to have to have the raw disks to back your workload. Many SANs do fall into that range though, especially since 1 TB of flash could back a 20 TB SAN pretty effectively.

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

Misogynist posted:

I'm curious, do any of you other guys in the tiering discussion work in an environment with petabytes of raw data? SSD caching is great for applications with repeatable hotspots, but it's going to be a long time before it can play with the HPC kids.

The customer I work with has about 3.5PB of data on disk, but very, very little of that requires anything approaching high performance. There is a lot of VMware, a lot of Exchange 2010, some SQL and SharePoint and Oracle. The demanding stuff is mostly OLTP or virtualized server workloads that benefit significantly from caching, because the working set sizes are usually manageable or the latency requirements just aren't that strict. We have some GIS users with imaging and mapping server farms that push something more akin to HPC, and our only option for those guys is generally to sell them a lot of fast disk, because it's too much data to fit in cache and too much data to quickly move between tiers.

If your main workload is HPC then I can see where you wouldn't see the benefit; you will want hardware tailored to those distinct requirements. But most customers are running applications that fit nicely into the SSD cache model we are discussing: things like virtualized server OS disks, OLTP databases, email, etc.

Just to provide some perspective on how effective read caching can be on some typical workloads, here is the breakdown for where read IOs are serviced on a couple of our NetApp boxes.

This array is predominantly Exchange 2010. Cache is RAM, ext_cache is FlashCache (none in here right now), and disk in this case is 7.2K SATA.

wafl:wafl:read_io_type.cache:96%
wafl:wafl:read_io_type.ext_cache:0%
wafl:wafl:read_io_type.disk:3%

We barely even touch the disk to do reads because so much gets caught in RAM which acts as a very small read IO cache on this device. A small cache goes a long way on Exchange 2010 due to the way it's structured.

This is a general-purpose VMware workload:

wafl:wafl:read_io_type.cache:66%
wafl:wafl:read_io_type.ext_cache:16%
wafl:wafl:read_io_type.disk:16%

We often see the cache percentage up near 80% for this, but even in this case we are going to disk very, very little to service reads, with RAM providing the most benefit, followed by external cache picking up things that aren't still held in RAM. We aren't hitting the disk very often at all, which is good, because it is slow.

We are replacing a few thousand disk reads every second out of cache:

Instance     Blocks    GB  Usage    Hit  Metadata   Miss  Hit  Evict  Invalidate  Insert  Reads Replaced
                             %       /s        /s     /s    %     /s          /s      /s              /s
ec0       268435456  1024     83  18070       748   6813   72      6        1112     249            3854
ec0       268435456  1024     83  20820       823  11892   63     22          16    1484            5416
ec0       268435456  1024     83  15106      1246  11613   56     29         440    2224            4504
ec0       268435456  1024     83   7749      1335   8306   48     23           9    1487            3275

FlashCache only replaces random read IO by design, because sequential IO is fast enough from disk already. So if we assume, very generously, that a 7.2K SATA disk can provide 100 IOPS, our flash-based cache is replacing around 40 or 50 SATA drives, or two entire shelves' worth of random IO capacity. That's a pretty good value proposition, especially when you consider the savings in power, cooling, and rack space.
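
(The arithmetic behind that claim, using the "Reads Replaced" column above and the same generous 100 random IOPS per 7.2K SATA spindle.)

code:

def disks_replaced(reads_replaced_per_s, sata_iops=100):
    """How many 7.2K SATA spindles' worth of random reads the cache absorbs."""
    return reads_replaced_per_s / sata_iops

# "Reads Replaced" samples from the counters above:
for rr in (3854, 5416, 4504, 3275):
    print(f"{rr}/s -> ~{disks_replaced(rr):.0f} disks")
# 3854/s -> ~39 disks
# 5416/s -> ~54 disks
# 4504/s -> ~45 disks
# 3275/s -> ~33 disks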

Moey
Oct 22, 2010

I LIKE TO MOVE IT
So we were gifted a single HP P4300 SAN. I brought it online today just to poke around the management of it and figured I would do some updates.

So far I am one hour into the update process (after the downloads), and it is just sitting like this. Anyone know if this is normal or if something on here poo poo the bed?

hackedaccount
Sep 28, 2009
Any need to get it done today or to power it off over the weekend? If not, let it sit and see what it looks like on Monday.

FISHMANPET
Mar 3, 2007

Sweet 'N Sour
Can't
Melt
Steel Beams
It's waiting for the SAN to come back from a reboot, I think; you should check the console on the machine. I remember it would happen a few times where I just needed to power-cycle the unit again.

Didn't leave me too confident in the system as a whole.

Bitch Stewie
Dec 17, 2011
From the bit that's next to the redacted bit, it looks like the login credentials aren't working for some reason?

Fire up the node from the console/iLO, check what's going on, and, assuming it's come up post-update(s), make sure your credentials still work.

Saikonate
Jun 23, 2007
Naysayer
Fun Shoe

NippleFloss posted:

A global pool with QOS would also be neat, though it's tougher to do with something like disk than it is with say CPU or memory shares.

SolidFire does this. Their big sell is setting guaranteed QoS per-volume; in fact, their entire architecture is built around it - you can adjust minimum, maximum, and "burstable" IOPS at the granularity of a single volume, and it all comes from one giant performance pool.

Saikonate fucked around with this message at 17:04 on Apr 13, 2013

madsushi
Apr 19, 2009

Baller.
#essereFerrari

Saikonate posted:

SolidFire does this. Their big sell is setting guaranteed QoS per-volume; in fact, their entire architecture is built around it - you can adjust minimum, maximum, and "burstable" IOPS at the granularity of a single volume, and it all comes from one giant performance pool.

I don't really think QoS even comes into play with an all-SSD environment. You can just say "gently caress it" because it's not like you're going to get any faster.

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

Saikonate posted:

SolidFire does this. Their big sell is setting guaranteed QoS per-volume; in fact, their entire architecture is built around it - you can adjust minimum, maximum, and "burstable" IOPS at the granularity of a single volume, and it all comes from one giant performance pool.

I know that's their pitch, but they are not nearly big enough yet to evaluate how well it actually works. I like the idea in theory though, since it makes tiering a virtual rather than physical operation, which makes it much more palatable.

Clustered Data ONTAP has some rudimentary QoS coming in 8.2, with more powerful QoS promised down the road. I'm interested to see how it turns out.

And to MadSushi: it is still technically possible to oversubscribe SSD, especially on sequential workloads, though you're right that it's much, much easier to do QoS when you have so many IOPS to work with.

YOLOsubmarine fucked around with this message at 19:41 on Apr 13, 2013

Saikonate
Jun 23, 2007
Naysayer
Fun Shoe

madsushi posted:

I don't really think QoS even comes into play with an all-SSD environment. You can just say "gently caress it" because it's not like you're going to get any faster.

Absolutely not true. The most trivial reason this is something you still need to give a poo poo about is that even if all the disks are fast, a QoS policy of "gently caress it" allows a single client to monopolize performance to the exclusion of others - one tenant performing a massive database query in an uncontrolled environment degrades performance for every other customer. At the very least, then, you want the ability to control rate maximums. SolidFire takes this much further by offering not only rate limits, but also guaranteed minimum levels of performance, and the ability to go above maximum for periods of time when the array is underutilized.
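
(Not SolidFire's implementation, just a toy token-bucket sketch of the max/burst half of that model; guaranteed minimums need array-wide scheduling and aren't modelled here, and all the numbers are invented.)

code:

import time

class VolumeQoS:
    """Toy per-volume IOPS limiter: a token bucket refilled at max_iops,
    with burst_iops of headroom for idle periods."""

    def __init__(self, max_iops, burst_iops):
        self.rate = max_iops
        self.capacity = burst_iops
        self.tokens = float(burst_iops)
        self.last = time.monotonic()

    def allow(self, n_ios=1):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n_ios:
            self.tokens -= n_ios
            return True   # admit the IO
        return False      # queue/defer it so one tenant can't crowd out the rest

# e.g. a volume capped at 5,000 IOPS that can burst to 8,000 when tokens allow:
vol = VolumeQoS(max_iops=5000, burst_iops=8000)
vol.allow()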

quote:

I know that's their pitch, but they are not nearly big enough yet to evaluate how well it actually works.

You mean in a "number of customers" sense (I'd buy that)?

Saikonate fucked around with this message at 19:20 on Apr 13, 2013

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

Saikonate posted:

You mean in a "number of customers" sense (I'd buy that)?

Yup. They've only been around for about 2 years and they are really still a startup without a large customer base. They are also targeting cloud service providers, not end users, so it's less likely that you'll see a lot of information about the problems they may have.

And I think Madsushi's point about QoS was that the aggregated IO from a big pile of SSD is so high that it would be very hard to legitimately oversubscribe it. A DB query from your average environment isn't going to drive the millions of IOPS required to crowd out other workloads (at the disk layer, anyway; the controller or transport layer is another question). Sequential IO like DB logging activity could definitely do it, though, and I've seen it happen.
