Mr Shiny Pants
Nov 12, 2012

FISHMANPET posted:

We've got a faculty member looking to buy 100-200TB, which to us is "big data." They're looking at some of those ridiculous SuperMicro servers with drives on both sides of the chassis, sold by a SuperMicro reseller we've worked with in the past (i.e. they would sell us a complete warrantied system). We also have a Compellent SAN, but we don't think getting a pile of trays is going to be cost effective (though it might be, we're still getting quotes).

Are there any good inexpensive SANs for big data? We don't need high performance or a lot of features because this will mostly be static data, we just want the system to be manageable and expandable.

Buy a refurbished Sun Thumper (X4500)?

Stuff it with 48 x 3-4 TB drives?
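
Rough numbers, as a sketch only (the RAID-Z2 layout and the ~10% overhead are my assumptions, not anything the reseller quoted):

code:
# Rough usable-capacity estimate for a 48-bay box filled with 4 TB drives.
# Assumes 8 x 6-disk RAID-Z2 vdevs; the ~10% slack covers TB->TiB and filesystem overhead.
DRIVES = 48
DRIVE_TB = 4.0
VDEVS = 8                        # 8 vdevs of 6 disks each
PARITY_PER_VDEV = 2              # RAID-Z2: two parity disks per vdev

data_disks = DRIVES - VDEVS * PARITY_PER_VDEV    # 32 data disks
raw_tb = DRIVES * DRIVE_TB                        # 192 TB raw
usable_tb = data_disks * DRIVE_TB * 0.9           # ~115 TB usable
print(f"raw: {raw_tb:.0f} TB, usable after parity/overhead: ~{usable_tb:.0f} TB")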


Mr Shiny Pants
Nov 12, 2012
We are looking at a NetApp MetroCluster for our VMware cluster and will be using Commvault for backups.

Any gotchas? Split-brain? Ditch Commvault and go for Veeam? We looked at 3PAR also, but we liked the NetApp more because of the ease of snapshotting and the like.

Any criticism is welcome, we haven't fully decided yet.

Mr Shiny Pants
Nov 12, 2012
A smaller FAS in a colo. What is wrong with Veeam? We were pretty impressed when they demoed it. The SharePoint and Exchange stuff was excellent.

Mr Shiny Pants fucked around with this message at 19:43 on Jan 30, 2014

Mr Shiny Pants
Nov 12, 2012

madsushi posted:

If you're going FAS-to-FAS, I don't know why you'd mess around with any VM backup software. Use the NetApp vCenter plugin (VSC - Virtual Storage Console) to take your snapshots and then use either SnapVault or SnapMirror to send them off-site.

That is the idea. We still might need the software for some other machines not on the filers.
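
For the VMs that live on the filers, the quoted approach boils down to something like this sketch; the hostnames, volume names and scheduling are placeholders, and in real life the VSC plugin quiesces the VMs before snapping:

code:
# Minimal sketch of the snapshot-then-replicate idea from the quote, driven over SSH
# against 7-Mode filers. Names are made up; VSC/SnapManager normally orchestrates this.
import subprocess, datetime

SRC_FILER = "filer1"                      # hypothetical source filer
DR_FILER = "drfiler1"                     # hypothetical destination filer
VOLUME = "vm_datastore01"                 # hypothetical datastore volume
snapname = "nightly_" + datetime.date.today().isoformat()

def filer_cmd(host, args):
    # Run a command on a filer over SSH (assumes key-based auth is already set up).
    subprocess.run(["ssh", host] + args, check=True)

# 1. Local snapshot on the source volume (what VSC would do, minus the VM quiescing).
filer_cmd(SRC_FILER, ["snap", "create", VOLUME, snapname])
# 2. Push the delta off-site; in 7-Mode, snapmirror update is run on the destination side.
filer_cmd(DR_FILER, ["snapmirror", "update", VOLUME])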

Mr Shiny Pants
Nov 12, 2012

OldPueblo posted:

A metrocluster is basically the same as a regular cluster, just stretched across fiber switches with a few minor extra rules (like in case of the split brain thing). Are you looking at metrocluster because you need the whole two separate sites ability? Also one thing to keep in mind is that new feature support sometimes lags a little behind the regular FAS products. For example you can mix shelves in a stack now (though not recommended), but you can't yet with metrocluster.

We have two datacentres that are close by and we run fiber to them. The Metrocluster gives us the ability to have a stretched VMware cluster on top. The idea is to have it physically separated but logically one cluster.

Mr Shiny Pants
Nov 12, 2012
Ceph looks rad. Too bad I don't have any hardware to test it with. A VM is not the same as two physical boxes running Ceph.

Mr Shiny Pants
Nov 12, 2012

Thanks, I was just wondering if it works as advertised. I've seen a lot of solutions over the years that over-promise and under-deliver. :)

Especially if it makes regular administration harder, or has some other gotchas that they won't tell you about until you start using it.

Mr Shiny Pants fucked around with this message at 10:15 on Feb 1, 2014

Mr Shiny Pants
Nov 12, 2012
We are not a big shop by American standards, I guess; we are buying a whole new infrastructure that has to last us at least 4-5 years.

Can you explain cDOT compared to 7-Mode some more? I don't want us to choose a solution and then need a forklift upgrade, need to buy new hardware later on, or end up with a dead-end solution.

We also looked at 3PAR and it is also pretty nice, especially the licensing. Need replication? You buy it for the array, and whether you do sync or async is totally up to you.

Much better than the IBM per-TB licensing.

Mr Shiny Pants
Nov 12, 2012

parid posted:

Awesome stuff.

Thanks man, that is exactly the information I am looking for. So if I understand correctly, MetroCluster is a 7-Mode feature, but 7-Mode is not actively developed anymore. cDOT is the future and has all the SMB3 and pNFS goodness. They can't say when the MetroCluster feature set will be added to cDOT.

drat, we really like the MetroCluster because it is an awesome fit for our situation. We like the idea of one logical system divided over two physical locations. What can cDOT do right now, and is it comparable?

Mr Shiny Pants
Nov 12, 2012

OldPueblo posted:

If metrocluster is the solution that fits your needs, I wouldn't hesitate on it specifically just because of the 7-mode/cDot transition. I know of huge corporations that rely on metrocluster solutions that are just refreshing theirs to newer 7-mode platforms right now. It's perfect for them and they're moving forward knowing the future roadmap and knowing that the future is cDot. That's anecdotal of course and it may not serve you best in your future, but as said above you can get information on the metrocluster roadmap by contacting your sales team. Find out what your options are then apply it to your personal timeline. I'm not trying to talk you into it, just trying to put it into perspective. It's not orphaned or in limbo really, it's just whether or not the transition fits your timeline.

True, it fits perfectly and we'll probably go with such a setup and a smaller FAS in our DR site for backups.

One other thing, does NetApp sell lemons? I mean models we should avoid because they are underpowered and the like?

Thanks guys, much appreciated.

Mr Shiny Pants
Nov 12, 2012

mattisacomputer posted:

Okay, I think I convinced him to kill the VMware VSAN idea. Moving forward, what would be the low cost version of something like an IBM SVC to virtualize/centralize all of this assorted FC storage?

Solaris?

Mr Shiny Pants
Nov 12, 2012

CrazyLittle posted:

What, exactly, is the problem with commodity kit for low-performance bulk storage solutions? Is there any real advantage to buying NL-SAS when you're just going to duplicate the data two or three times across multiple disks and storage hosts?

(I mean, isn't that the whole point of projects like backblaze?)

People are afraid they are going to be the one left holding the bag when the system goes tits up.

Take that as you will.

Mr Shiny Pants
Nov 12, 2012

NippleFloss posted:

There is a large OPEX cost associated with running applications without real support on commodity hardware with high failure rates. You're paying for more hardware to add extra redundancy to make up for the lack of reliability. You're paying more people to admin the systems because you don't have vendor backed technical support to troubleshoot problems, ship you replacement parts the same day, or perform parts replacements. You're paying for extra power and cooling and datacenter space because you needed extra hardware for more redundancy, and the hardware you bought likely isn't as dense or efficient as enterprise gear.


Well, to be honest, enterprise support sometimes isn't all that great. See the tales of woe in this thread about botched firmware and the like taking down storage systems. When people are breathing down your neck you get to say: "We pay them this ridiculous amount of money, they are working on it, I did everything I could."

Even with enterprise gear you buy everything twice, so the extra hardware is true for both scenarios.

As for the replacement parts: The idea is that you don't need the same day replacement parts because there is no SPOF in the system that warrants it. The cheaper parts also make it possible to have a couple of systems on the shelf should you need them.

There is something to be said for both solutions.

Mr Shiny Pants
Nov 12, 2012

cheese-cube posted:

Getting advanced replacement on consumer hardware is usually impossible so you'd have to be running with twice, maybe three times the number of hot-spares.

Which is possible if they cost 10 times less :)

Mr Shiny Pants
Nov 12, 2012

NippleFloss posted:

Enterprise support is sometimes not that great *considering the amount of money paid for it*. It is always worlds better than no support, particularly as enough complaining about significant problems will often result in free hardware magically appearing at your site. Relying on your employees to support everything means that you have nowhere to escalate to and no hope for recompense if the product does not perform as you had hoped. You can build targets into a purchase contract and sue the vendor if they fail to meet them. You can't do that when your vendor is Frys.

I'm also not sure why you think you'd buy everything twice with Enterprise gear? Unless you're running a fully redundant data center model you're not going to be buying a second SAN to just sit there in case your primary SAN fails. You're paying for built in redundancy so you don't have to try and layer it over the top.

It's incredibly hard to build a system out of consumer parts that truly has no SPOF without a significant investment in money and resources. Something like GPFS will do it, but that's not really suitable for general purpose storage use or even cheap and deep backup storage. If your boss came to you and said "please develop a storage system that can provide X number of these types of IOPs, and which has an uptime of Y 9's, and costs significantly less than the enterprise vendors" could you do it? Could you actually prove that it could meet those requirements? Would you stake your job on it?


Well, that's the crux of it, isn't it? Would you stake your job on it?

Everywhere I've looked, one storage array equals no storage array. There is usually a second one with async or synchronous replication. Same with switches: redundant switches, paths, etc. We've had storage arrays go down during rebuilds, controller failures, etc.

If I could get backing from management after explaining to them the scenarios of building it ourselves, and I were comfortable with the tech involved, I would certainly entertain the idea.

Would I want the extra responsibility? I don't know; it's nice to just close the door behind you and not have to care about storage you've built yourself. The tech is there though.

Mr Shiny Pants fucked around with this message at 19:02 on May 20, 2014

Mr Shiny Pants
Nov 12, 2012

NippleFloss posted:

Redundant arrays are for DR or BCP, not for small scale failures, and those arrays are housed in a separate location. You aren't buying extra hardware because the hardware is unreliable, you're buying it because no matter how reliable it is it can still catch on fire or get swept away in a flood. Nobody is running two VMAXes side by side in the same room in case one fails. You buy a VMAX BECAUSE it doesn't fail, and you buy a second one and put it somewhere else in case an earthquake swallows the first one.

This is in distinction to the "this commodity hardware is cheap and unreliable so we need to buy extra hardware to provide the required up-time for day to day operations."

Yes, you buy redundant ethernet switches, because redundancy is provided through things like VPCs which require two switches. If you purchase director class gear with multiple SPs and virtual segmentation you can certainly get by with one, much the same as you don't need TWO blade centers to provide adequate redundancy for your VMWare environment because the redundancy is built in to the platform. Very very risk averse engineers and organizations may quibble with this, but if they are that risk averse they probably aren't in the market for really super cheap roll your own storage.


Why would your management want to back you when you don't have any obligation to continue to provide support and their only recourse if you don't live up to your end of the bargain is to fire you? If you quit and go elsewhere they have to hope that you've documented what you've done well enough that whoever they hire can come in and continue to support it. If you're Google or Microsoft or Amazon that isn't a problem, because they've got no issues hiring smart people who can figure it out, and their entire business model is built around doing everything in house, so they've got significant resources devoted to QA and documentation. But most internal IT departments aren't going to have that luxury and they're better off outsourcing that expertise.

Sure it fails, everything fails. And I don't know about you, but if the SAN goes down we are looking at a day or two of downtime. So no, we have two storage arrays in an active-active configuration in different datacentres. This is for DR, but also in case the first one fails. Getting a tech onsite to fix our array takes a couple of hours, and checking that everything works and booting the whole infrastructure also takes a couple of hours. That is if they can find the issue right away. Our IBM SAN went down because of a second disk deciding not to fill in as a spare even though the array said it was a good drive. That was a fun night. It took us two days to get it running again, even with IBM support.

Now you can say the support was worth it, and it was, but let's not pretend storage arrays don't go down. They do, and usually spectacularly so.

As for the management backing: if you buy a storage array, the cost usually means you need management backing anyway, otherwise you don't get the funding. So during these talks building it yourself can be discussed (depends on company culture for sure), as well as the risks involved. If both parties feel it is worth it due to cost, flexibility or whatever, I don't see a reason why you wouldn't at least look at some solutions. IMHO.

It is also about fit. I won't roll my own to host my VMware cluster on, but for something like archival storage I would certainly look at Ceph or ZFS.

I mean, VMware VSAN is like rolling your own, and they are pushing it very hard.

Mr Shiny Pants
Nov 12, 2012

goobernoodles posted:

Anyone know how to get IBM on the phone without waiting for a callback for a SAN issue? loving waiting for a callback.

That way you have time to update to the latest firmware, like they will ask you to when they call :D

Mr Shiny Pants
Nov 12, 2012
We just did a storage refresh and bought NetApp; coming from IBM it is a breath of fresh air.

Looking at storage arrays, NetApp has been pretty much ahead of all the others with WAFL and the flexibility it gives you.

HP 3PAR: the software was clunky compared to NetApp and the integration with something like VMware was still lacking. Monitoring and the like was very rudimentary. The tech behind it is very cool, but the whole package felt a bit lacking.

IBM: they finally made a good user interface, which they ported from the XIV. Storage was alright, but for snapshot backups you need another box, and for real HA you would need an SVC on top of the V7000. TSM integration with the array was also a weak spot.

Did not look at EMC, the guys who did the offer pulled out.

It isn't all good though:

SnapProtect (Commvault) is a beast. And you get some weird stuff, like Exchange needing iSCSI whereas the rest will run on NFS. Thanks, MS.

You can't put too many VMs in a volume because of VM stun issues if you want them backed up consistently. But this is more of a VMware issue than a NetApp one.

These are little issues that crop up when doing an implementation but are still important when designing the whole solution.

Mr Shiny Pants
Nov 12, 2012

NippleFloss posted:


This should not be an issue. Are you using NFS datastores? Do you have VMWare snapshots disabled?


Yes.

Well, that is a bit of a problem: they are not totally consistent when they are not in VMware snapshot mode. It is like yanking the power cord of the VM.

It is not a slight at NetApp; all the other backup tools have the same problem. The downside is that snapshots are made at the volume level, requiring you to make more volumes than you might want.

Mr Shiny Pants
Nov 12, 2012
Me too. If you could shed some light on why snapshots can't be deduped, that would be awesome.

Mr Shiny Pants
Nov 12, 2012

NippleFloss posted:

I'm not a developer, so I could be wrong about some of this, but my understanding is that it was decided that it was a cost-benefit consideration and it fell on the side of not worth doing. Snapshots are embedded pretty deeply within the WAFL code and the major principle behind them is that once we take a snapshot all of those blocks are locked and cannot be modified until the snapshot is deleted. We can do things like move the PVBNs underneath to re-arrange the data at the physical layer, but we can't change the VVBN layout at all. When a data block is deduplicated the metadata that points to that data block is updated with the new location of the reference block. Doing this on snapshots won't work because those metadata blocks are locked by WAFL.

There are some cases where you will get deduplication between the active filesystem and a block locked in a snapshot though, just not between blocks already locked in snapshot.

There is some interesting work going on along those lines though, that I hope eventually makes it in to a product. A fully reference counted version of WAFL would allow for pretty cool stuff. If you're curious about WAFL internals you can find Dave Hitz's (NetApp co-founder) original paper on WAFL from 1995 here and an updated paper describing how FlexVols are implemented here.


I was under the impression that it already had a reference count on each block; you learn something new every day. Thanks.
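
For my own understanding, the locked-metadata point in a nutshell, as a toy model (nothing to do with real WAFL internals, just the concept):

code:
# Toy model: dedup may rewrite pointers in the active filesystem, but pointers frozen
# inside a snapshot can't be touched, so duplicates only referenced by snapshots stay.
active = {"fileA": "blk1", "fileB": "blk2"}            # live metadata: file -> block id
blocks = {"blk1": b"same data", "blk2": b"same data"}  # two blocks with identical contents
snapshot = dict(active)                                # a snapshot freezes a copy of the metadata

def dedupe(meta, blocks):
    seen = {}
    for name, blk in meta.items():
        data = blocks[blk]
        if data in seen:
            meta[name] = seen[data]    # rewrite the pointer to the first copy of this data
        else:
            seen[data] = blk

dedupe(active, blocks)      # allowed: active filesystem metadata can be rewritten
# dedupe(snapshot, blocks)  # not allowed: snapshot metadata is locked once taken
print(active)     # {'fileA': 'blk1', 'fileB': 'blk1'}  -> deduplicated
print(snapshot)   # {'fileA': 'blk1', 'fileB': 'blk2'}  -> blk2 stays pinned by the snapshot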

Mr Shiny Pants
Nov 12, 2012

gallop w/a boner posted:

My organisation (small to midsize law firm) is implementing a new Practice Management application. We are purchasing new virtualization hosts and a dedicated SAN for this application. I could do with some advice on the SAN.

Space-wise we need about 20TB usable. The software vendor uses the Microsoft SQLIO utility to size storage performance, and they have advised that we need a SAN that can meet the following benchmark, as observed by SQLIO:

64k test: 2625 random writes / 5500 random reads
8k test: 4465 random writes / 8250 random reads

I don't know if the use of SQLIO is unusual or not, but that is the data I have.

We've been to our normal suppliers for recommendations. One has suggested a NetApp FAS802A. The other has suggested a 3PAR Storeserv 7200. I am waiting on the full details and pricing. From the sounds of things, the HP will give us a lot more spindles and capacity for the price.

We don't use SAN based replication or backups (we use Zerto for replication and Dell Appassure for backups), so we don't have a particular requirement for any of these features, although we would like cross-shelf HA (might be wrong terminology).

Currently all our infrastructure is 1GbE. I know that 10GbE is becoming (?) popular but I don't know whether this is worth the cost premium.

Any input is appreciated.

SQLIO is a pretty good benchmark. With a software supplier that delivers a baseline, you are one step ahead of most software companies. Consider yourself lucky :)

As for a SAN, your requirements are not that weird. I would take a look at the NetApp; their SQL backup software is amazing. If you are not going to use the NetApp tooling, then pretty much any SAN will fit your needs. 3PAR is nice from what I have seen of it.

I would ask for a couple of demonstrations and see for yourself which kit you like best.

If you could swing it: for 20TB I would try an all-flash array like a Pure and never have to worry about IOPS again.
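
To put those SQLIO numbers in perspective, some very rough spindle math (the per-disk IOPS figures and the RAID write penalty are generic rules of thumb, not vendor specs):

code:
# Very rough spindle estimate for the 8k figures above (4465 random writes / 8250 random reads).
reads, writes = 8250, 4465
disk_iops = {"15k SAS": 180, "10k SAS": 140, "7.2k NL-SAS": 75}   # rule-of-thumb per-disk IOPS
raid_write_penalty = 2        # RAID-10; RAID-5 would be ~4, RAID-6 ~6

backend_iops = reads + writes * raid_write_penalty
for disk, iops in disk_iops.items():
    print(f"{disk}: ~{backend_iops / iops:.0f} spindles before any cache/flash helps")
# 15k SAS: ~95, 10k SAS: ~123, 7.2k NL-SAS: ~229 -- which is why all-flash stops sounding crazy.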

Mr Shiny Pants fucked around with this message at 17:33 on Aug 20, 2014

Mr Shiny Pants
Nov 12, 2012

Nitr0 posted:

and never worry about $110k out of your pocket.

I had no clue they were that expensive. IF he could swing it!

Would be well spent though :)

Mr Shiny Pants
Nov 12, 2012
Some benchmarks would be cool, especially if you could compare Solaris ZFS based workloads to Linux ZFS based workloads.

I am wondering if the ZFS performance differences between Linux and Solaris are really large.
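
Even something quick and dirty like this, run on both OSes against identically laid-out pools, would give a first impression (the path is a placeholder; real numbers should come from fio with matched recordsize and queue depth):

code:
# Quick-and-dirty 8k random-read timer against a file on the ZFS dataset under test.
# Use a pre-created file much larger than RAM, otherwise you are just benchmarking the ARC.
import os, random, time

PATH = "/tank/bench/testfile"   # placeholder path on the ZFS dataset
BLOCK = 8192
ITERATIONS = 20000

size = os.path.getsize(PATH)
fd = os.open(PATH, os.O_RDONLY)
start = time.time()
for _ in range(ITERATIONS):
    os.pread(fd, BLOCK, random.randrange(0, size - BLOCK))
os.close(fd)
elapsed = time.time() - start
print(f"{ITERATIONS / elapsed:.0f} random 8k reads/sec, avg {elapsed / ITERATIONS * 1000:.2f} ms")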

Mr Shiny Pants
Nov 12, 2012
NetApp has a PowerShell provider, right? Might be worth scripting.

Mr Shiny Pants
Nov 12, 2012

mayodreams posted:

Thanks. We use Nexenta and I guess I took CLI access to the filer for granted.

Hah, yesterday I needed some logs from a NetApp. That was a fun half hour.

It has \\filername\c$ though, for access to the log files.
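
Something like this is roughly what I ended up scripting; the share layout and log file name are from memory, so treat them as assumptions:

code:
# Grab the filer's message log over the administrative share (run from a Windows box
# with admin rights on the filer). Share layout is an assumption on my part.
import shutil, datetime

FILER = "filername"                                    # the filer's hostname
src = rf"\\{FILER}\c$\etc\messages"                    # 7-Mode keeps its logs under /etc on the root volume
dst = rf"C:\temp\{FILER}_messages_{datetime.date.today()}.log"

shutil.copyfile(src, dst)
print(f"copied {src} -> {dst}")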

Mr Shiny Pants
Nov 12, 2012
NetApp question: we have a MetroCluster running with one node running all SATA and one node running SAS shelves. We migrated our Exchange environment (DBs) to the SATA one, and since we made the NetApp the primary member in the DAG the filer seems to stall.

The filer serves RDMs to vSphere hosts via iSCSI.

All IO seems to drop to almost zero, disconnecting users and making our life miserable. iSCSI latency rises to 60 ms instead of the normal 1-4 ms.

Anyone ever seen something like this before?

The load is a constant 200 MB/sec on a filer that has about 30 SATA disks in its aggregate and 512GB of Flash Cache.

I've checked the EMS logs but can't find anything in them that would explain the behaviour we are seeing.

Mr Shiny Pants
Nov 12, 2012

OldPueblo posted:

It could really be a bunch of things. MetroCluster has a small automatic performance hit due to the distance factor; even though you'd think it's only writing to the local side, it's really writing to both plexes simultaneously, etc. I guess I'd start with where was it working better before, on the SAS? How fast are your ISL links, do you have two or four? Do the switch ports show excessive errors via porterrshow? Could be a network bottleneck, I've seen a MetroCluster overwhelm a network after a head upgrade resulting in tons of network discard packets on an older network infrastructure. I'd say maybe fire off some ASUPs and see what support has to say, or check My ASUP and System Manager for any recommendations.

The primary environment is now running on our 8-year-old DS4800 and the NetApp is functioning as a DAG member receiving the log changes. That works fine; it is when we switch to the NetApp as our primary that the problem seems to occur.

The switch ports are showing no errors; everything is connected through Nexus switches with 10 Gbit uplinks. No idea what an ISL is or why I would need two or four, please explain. Inter-shelf link? We have optical SAS cabling between the cluster nodes.

We already have a support case open; I was just wondering if any of you might have seen this before, as there are a few people visiting this thread that have NetApp experience.

Mr Shiny Pants fucked around with this message at 09:49 on Feb 19, 2015

Mr Shiny Pants
Nov 12, 2012

parid posted:

What does sysstat look like? Is it keeping up with consistency points? CPU usage? What does your RAID domain CPU usage (sysstat -M) look like while it's in the problem state? How old is the install? What version of ONTAP?

Do you have FMC-DC running and collecting logs? If not, you may want to start in case your support case goes that way and you need days of samples.


NippleFloss posted:

Grab a "sysstat -x 1" output during the issue and check for a B or b in the "CP Type" column and high disk utilization. What ONTAP version are you running? Also, are you seeing high read latency or write latency, and are you seeing it on the log volumes, the DB volumes, or both?

I am not in the office today; NetApp also asked for a sysstat output. So we'll plan a day when we can coordinate this with our users. Will let you know. Thanks for the input, guys.
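
In the meantime I put together a little scanner for the capture so we can spot the back-to-back CPs mentioned above; the column positions are guesses from memory, so check them against the actual header first:

code:
# Scan a captured "sysstat -x 1" log for back-to-back consistency points (a 'B'/'b' in
# the CP type column) and high disk utilization. Column indexes are assumptions.
import sys

CP_TYPE_COL = -3      # assumption: third column from the right is "CP ty"
DISK_UTIL_COL = -1    # assumption: last column is disk utilization (%)

def scan(path):
    with open(path) as f:
        for lineno, line in enumerate(f, 1):
            cols = line.split()
            if len(cols) < 10 or not cols[0].rstrip("%").isdigit():
                continue                      # skip headers, blanks and summary lines
            cp, util = cols[CP_TYPE_COL], cols[DISK_UTIL_COL].rstrip("%")
            if "B" in cp.upper() or (util.isdigit() and int(util) > 90):
                print(f"line {lineno}: CP type {cp}, disk util {util}% -> looks disk-bound")

if __name__ == "__main__":
    scan(sys.argv[1])    # e.g. python scan_sysstat.py sysstat_capture.txt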

Mr Shiny Pants
Nov 12, 2012

PCjr sidecar posted:

FC is great; if you've got the budget for diamonds why go with a polished turd.

If you have diamonds, why not go whole hog and get InfiniBand?

It is probably the best interconnect; too bad it gets glossed over.

Mr Shiny Pants
Nov 12, 2012

Gwaihir posted:

Filing this one under "Huh, I didn't really expect that to work at all."

These SAS enclosures (IBM 5886es that were providing DAS for our AS400) actually work just fine hanging off an LSI SAS3008 based controller card with Dell's I/T firmware on it in one of my old R710s. I can't even use the original IBM controllers since they're all PCI-X instead of PCI-e. Which is a shame since they were quad port models with 2 gigs of cache and the nice battery setup where you could swap it from the outside without even popping the cover on the machine.

Now what the gently caress can I use these things for? Home-grown Openfiler or FreeNAS install as a disk-based backup pool?

The power bill alone would be astronomical, I think. Usually it's cheaper to get new drives.

Mr Shiny Pants
Nov 12, 2012

Internet Explorer posted:

What storage are people getting excited about these days? I'm still happily chugging along with my EqualLogics, but my needs are fairly pedestrian. I like simple storage. Fancy storage has only been a pain for me.

I was just wondering this. I'd rather have a Toyota that trucks along than a Viper that breaks down every two miles...

Could just be me though.

Mr Shiny Pants
Nov 12, 2012

adorai posted:

Does veeam give me anything that a second SAN to replicate to doesn't?

A really nice way to get your data out and restore it? Per-object AD restores, per-record SQL, per-item Exchange restores, without needing agents?

Automatic testing of your replicated VMs?

It is very good.

Mr Shiny Pants
Nov 12, 2012

Walked posted:

Anyone worked with FusionIO cards?

I have a 1.2tb one that's not behaving as anticipated. It's for a lab and out of warranty but very very low use.

Is it a regular PCIe card? How much do you want for it? :)

Mr Shiny Pants
Nov 12, 2012

Walked posted:

It is! And it's going to find a home in my desktop if I can't get what I want out of it for a storage server.



Nice.

Mr Shiny Pants
Nov 12, 2012

evil_bunnY posted:

loving robocopy is steadfastly refusing to copy security info and it's making my job way harder than it has to be :<

You could try running it as LocalSystem.
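
Or force backup-mode copies, roughly like this sketch (paths are placeholders; run it elevated, or wrap it in psexec -s to get LocalSystem):

code:
# Rough sketch: kick robocopy off in backup mode with full security copying.
import subprocess

SRC = r"\\oldserver\share"     # placeholder source
DST = r"\\newserver\share"     # placeholder destination

subprocess.run([
    "robocopy", SRC, DST,
    "/E",          # recurse, including empty directories
    "/COPYALL",    # data, attributes, timestamps, security (ACLs), owner, auditing info
    "/B",          # backup mode: bypasses ACL checks (needs SeBackupPrivilege, e.g. LocalSystem)
    "/R:1", "/W:1",
], check=False)    # robocopy exit codes 1-7 just mean files were copied/skipped, so don't raise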


Mr Shiny Pants
Nov 12, 2012

Harry Lime posted:

I remember the first time I did one of these for a customer, it loving ruled. Blew my mind that we could get the controller swaps done in under an hour with no downtime.

What's weird is that this seems to be such a unique experience. I mean, no wonder companies are dropping their SANs in droves for the cloud if you need a PhD to update one. It should always have been this simple, and there is no reason it could not be.

Mr Shiny Pants fucked around with this message at 18:35 on Sep 8, 2019
