This is my first attempt at a megathread so try not to hate me for it. It was requested by a few people and I decided to take some initiative for once in my life.
What is a SAN?
A SAN is really any network you use to access storage. That storage could be CIFS shares or it could be a fibre channel array.
Some people interchange this term with storage array (including some manufacturers and sales types) which confuses things further.
When I think SAN, I consider the switch fabric that connects everything together, the HBAs on the host side, and of course the targets in whatever form they may be.
I do a lot of work specifically with NetApp right now, but I've also dealt with Compellent, Dot Hill, HDS, and others to a lesser degree (becoming much more familiar now). I'm going to talk a bit about NetApp and how many of these concepts relate to NetApp, mostly because my current project is looking at ~40TB of NetApp and ~200+ ESX servers and so my brain is stuck on it.
There are a number of tiers of storage with accepted vendors that I'll outline here. This isn't the definitive list by any means and I'm sure people will come in and say "I'm using vendor X for Y " so don't hang me for it.
I use NetApp mostly because it's one of the most flexible platforms you can buy. It can support fibre channel, iSCSI, NFS, and CIFS with a lot of great software options like snapshots, thin provisioning, de-duplication, and more.
NetApp also uses a very high performance RAID they call 'RAID DP', which is a dual parity algorithm that can sustain 2 drive failures before losing data. NetApp will call a RAID DP array an aggregate. On top of that aggregate sits WAFL, NetApp's most awesome filesystem, which is comparable to ZFS. When you create a fibre channel LUN, it's actually a file that sits on that filesystem, and it's going to be spread out across every disk in the aggregate.
NetApp is typically favored by a lot of Oracle shops for its superior NFS implementation and it also makes an excellent platform for VMware (it also happens to be supported by Site Recovery Manager.)
Pillar Data was born of Larry Ellison saying "I'm tired of NetApp owning all of our customers' NFS business" and dumping 100 million dollars into a new hobby: enterprise storage. The device is supremely over-engineered and I suspect it will eventually cost a lot more than anyone else's, but it's pretty damned cool.
It supports something they call 'Data QoS', where data with high performance needs is marked and placed on the outer tracks of the spindle, where the platter is moving fastest under the head.
In addition to the main storage processors; every shelf has its own RAID controller.
One of my customers is testing this platform and loves its faster-than-NetApp head failover, but apparently misses a few NetApp features. I didn't pry because I was there to learn them about VMware.
A pretty hot company in the mid-range market; they sell a lot of features that many would consider magic. Chiefly the automatically tiered data, block level RAID, and the usual suspects: snapshots and clever usage of snapshots (writable snapshots!)
This storage is great because you can buy a shelf of 15k RPM disks and a few shelves of SATA and probably get what can be perceived as tier1 spindle performance. It tracks block access and migrates them around spindles as needed, effectively and automatically archiving older data to SATA disks without additional software. I believe many people here have positive compellent experiences.
My only negative with these guys is that the UI gets kind of pokey on a loaded system and performance with OnStor is pretty abysmal; so we ended up doing a rip-n-replace at a client site with NetApp 3070s.
What features should I look for? What is important?
Since you're consolidating your storage, availability should become the first order of business. You'll typically want 2 storage controllers so that when one dies you don't have to stop business.
Performance is typically the next thing to worry about. A number of things will impact this:
1. Drives (number and speed)
2. Cache
3. RAID level
4. The network
Drives are a multi-part equation that relate directly to RAID level and are ultimately the most important thing to worry about. They are commonly referred to as 'spindles' in the storage world and come in many flavors:
SSD (new with EMC)
And many speeds, commonly: 7200 RPM, 10k RPM, and 15k RPM. A single drive is capable of only so many I/Os every second, or IOPS. There's a formula to calculate this out, but a decent rule of thumb is that a single 15k RPM disk can provide 120-150 IOPS. That said, if your application needs 1500 IOPS then you're going to need at least 10 drives to achieve that performance. There's more to it than that, especially when you factor in RAID and cache, but that's sort of the basics.
SSD disks are relatively new, and I believe EMC is currently shipping a storage product that makes use of them. Blazingly fast but pretty expensive; they solve many of the latency issues.
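The rule-of-thumb sizing above can be sketched in a few lines. This is a toy calculation, not any vendor's sizing tool, and `spindles_needed` is a made-up helper name; it deliberately ignores RAID write penalty and cache, which change the real answer.

```python
# Rough spindle-count sizing from the 120-150 IOPS-per-15k-disk rule of thumb.
# Ignores RAID write penalty and cache, so treat the result as a floor.
import math

def spindles_needed(required_iops, iops_per_disk=150):
    """Minimum number of disks to hit a raw IOPS target."""
    return math.ceil(required_iops / iops_per_disk)

print(spindles_needed(1500))       # -> 10 (the 1500 IOPS example above)
print(spindles_needed(1500, 120))  # -> 13 at the pessimistic end of the range
```

Run it both ways and you can see why the rule of thumb is quoted as a range: the same workload needs 10 or 13 spindles depending on which end you believe.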
Cache is another important piece of the puzzle. Some storage controllers/processors will have upwards of 4GB of cache, which basically gives you a 4GB window before you need to worry about disk performance impacting what you're doing. Some arrays support cache partitioning (HDS), where you can give specific applications a dedicated amount of cache, and others just use a whole lot of it.
Most storage arrays support RAID 0, RAID 1, RAID 5, RAID 10 (or 0+1, which is similar but rebuilds differently), and some form of RAID 6. Each one has performance and availability trade-offs that should be considered as you're planning your rollout.
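To make those performance trade-offs concrete, here's a back-of-the-envelope sketch using the commonly cited write-penalty figures (RAID 0 = 1, RAID 1/10 = 2, RAID 5 = 4, RAID 6 = 6). Those penalty numbers are general practice, not from this thread, and the function name is made up for illustration.

```python
# Effective front-end IOPS for a RAID group, given the read/write mix.
# Write penalties: each host write costs this many disk I/Os.
WRITE_PENALTY = {"raid0": 1, "raid1": 2, "raid10": 2, "raid5": 4, "raid6": 6}

def effective_iops(disks, iops_per_disk, raid, write_fraction):
    raw = disks * iops_per_disk
    penalty = WRITE_PENALTY[raid]
    # Reads cost 1 back-end I/O, writes cost `penalty` back-end I/Os.
    return raw / ((1 - write_fraction) + write_fraction * penalty)

# 10 x 15k disks (150 IOPS each) at a 30% write mix:
print(round(effective_iops(10, 150, "raid10", 0.3)))  # -> 1154
print(round(effective_iops(10, 150, "raid5", 0.3)))   # -> 789
```

Same ten spindles, and RAID 5 serves roughly two-thirds the front-end IOPS of RAID 10 at that write mix, which is exactly the kind of trade-off you're weighing against RAID 5's better usable capacity.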
The network is just as it sounds. This could be your traditional IP network or it could be a dedicated fibre channel network. Either way, it's the last piece of the puzzle, and if you've only got 600mbps of bandwidth available then that is the best performance you can hope for.
So how do I decide what is best?
It's 100% dependent on the application you're running. On a typical EMC array I've seen several RAID 10 RAID groups configured alongside a number of RAID 5 RAID groups, depending on whether the disks were handed off to a file server, a database server, or a web server.
In short; there is no one real answer to this question unless you have a truly unlimited budget. Even then I'd say there is no real answer to this question.
I heard you need fibre channel to have a real SAN? (or I heard fibre channel was best)
PROVE IT! In most environments I've worked on in the last year, iSCSI and NFS were plenty fast for the storage needs. I would say in most cases they're probably best for your business, with a few exceptions. If you're in one of those exceptions then this thread is probably pretty boring to you in the first place, because you already know this poo poo.
If anyone is really interested; I'd be happy to outline an environment in which fibre channel is clearly and absolutely superior to IP storage solutions even factoring in 10gigE.
What is fibre channel anyway?
It's certainly not restricted to fibre optic cables, and I can't seem to get people to wrap their minds around that. In fact, Brocade's 256gbps FC uplinks on the DCX series are copper.
FC is a protocol that is akin to ethernet in its functionality. It's a very low latency protocol that provides a lot of speed. In an FC SAN, a host hands a SCSI command to an HBA which will then pack that SCSI command into an FCP frame which is then sprayed down the wire to a target; which effectively unpacks that FCP frame and executes the SCSI command.
It differs from ethernet in that it will automatically aggregate multiple links without requiring a separate load balancing protocol or having to worry about anything arriving out of order. That means if I've got two switches linked together with 4 8gbps uplinks it's going to 'spray frames' down each link providing ~32gbps of bandwidth.
Another key difference (except on cisco MDS switches) is that every switchport is capable of 100% utilization at the same time.
Some switch vendors would be Brocade, Cisco, and QLogic. I'll post more on these shortly.
Sounds expensive; how bout iSCSI?
iSCSI is great because it's a block level protocol that is carried over traditional IP networks and can typically be managed well enough by IP administrators. In most cases performance is comparable to fibre channel, though there are occasional instances where it's not going to be suitable. Those are rare though. To put it in perspective, I have two clients off the top of my head running 1500+ user Exchange databases in excess of ~800GB on iSCSI, one of which is also running a 6 host VMware cluster on the same storage array (EMC Celerra if you're curious).
It's anecdotal but I'd challenge anyone to take an FC device and an iSCSI device that are comparable and try to find a performance difference.
Moving on to NAS or in this case; NFS
Many of us have come to hate NFS, having known it well on Linux or Solaris or in some cases IRIX :angry:
An interesting thing about NFS is that it performs as well as iSCSI in many cases and in fact has the blessing of Oracle to share out very large databases to multiple servers.
NetApp NFS is particularly good (though it's a pay-for license, unlike iSCSI), and some of the NFS optimizations can sometimes make it outperform iSCSI. Since it's not a block level protocol, it's also ideal for VMware datastores: an NFS datastore doesn't sit on VMFS, so you won't need SCSI reservations every time filesystem metadata changes.
Up next; what are these bad rear end features that make these expensive storage arrays worth so god damned much anyway?
Talking to rage-saq a thought occurred to me so I re-arranged things a bit.
1000101 fucked around with this message at 06:00 on Aug 29, 2008
|# ¿ Aug 29, 2008 05:38|
So many of these storage arrays come with things like FlexClone or Data Instant Replay or de-duplication; but what does all this poo poo do and how does it work?
We'll talk about the most common feature available in most storage arrays from just about every vendor first. Mostly because this is the foundation for most other features anyway.
We've all heard the term and it almost sounds like magic! How the gently caress can I instantly copy 20TB of data by snapshotting it?
Well you technically aren't actually copying anything. There's another pretty standard feature called replication that we'll get to in a bit.
Snapshotting works like this:
You take one, and the storage immediately stops writing to that section of the disk; it marks it read-only and continues writing elsewhere.
What does this mean? If I have 50GB of data and I snapshot it and write another 5GB of data then I have 55GB of data. Now if I delete all of that data and write 45GB of data, I'll actually see 45GB of usage in my operating system but on the storage array I'm going to see 100GB.
Why? because the old data I snapshotted is still physically on the disk; it's just a different part of the disk. I can restore that snapshot and be right back at that point of time with a few mouseclicks in many cases.
What if you just change a file? Let's presume you have a document on your snapshotted volume and you decide to change the font from 'comic sans' to 'arial'. What will happen is the storage will write your change to a new section of the disk but reference the rest of the data from the snapshot until it changes. If you restore your snapshot and look at your file, you'll notice 'arial' is back to 'comic sans'.
This of course is why in the previous example that we eat up 100GB of disk space.
Think of it as spawning a new timeline of data:
You might be asking yourself: won't this eat up a lot of disk space? Yes and no. As your data volume grows, yes, you will certainly eat up more disk space. The thing you have to consider though is change rate. In most environments you have a very low change rate, probably less than 20%, which is why 20% is considered the acceptable "overhead" for snapshots when figuring up storage capacity. i.e. if I need 1TB of usable storage and I want to use snapshots, then I should really get 1.2TB of usable storage.
If you think about it, it starts to line up. If you're working on a page in InDesign, it could easily be a 40MB document, for example. You're changing positioning of various elements or fonts or colors. Otherwise minor changes in the grand scheme of things, and in the end you might have less than 5MB of actual change on the blocks; so with snapshots your 40MB file eats up 45MB of actual disk space.
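The 20% rule of thumb above works out like this. A toy calculation only; the 20% figure is the thread's rule of thumb and the second case just replays the InDesign example:

```python
# Size up usable capacity by the change rate you expect snapshots to pin.
def with_snap_reserve(usable, change_rate=0.20):
    """Capacity to buy so snapshots have room for the expected change."""
    return usable * (1 + change_rate)

print(with_snap_reserve(1.0))        # 1TB usable -> buy ~1.2TB
print(with_snap_reserve(40, 0.125))  # 40MB file, 5MB changed -> ~45MB on disk
```

The same formula covers both scales; only the change rate differs, which is why knowing your environment's change rate matters more than knowing its raw capacity.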
So whats the drawback? Disks are cheap why the hell not give everyone snapshots?
Out of the box, most snapshots are crash consistent. What does that mean? The data is in as usable a state as data on a disk where I just yanked the power. Transactions could be half committed, or data may not be current because it was still buffered, or whatever. This is especially critical for database-type applications like Exchange or SQL.
To combat this, a number of things have been done over the years. NetApp, for example, has released 'SnapManager for Exchange/SQL/VMware', which will actually quiesce your database (essentially letting it know it's about to be snapshotted so it can prepare) and make sure the snapshot is consistent and therefore usable when you need to restore from it.
So the next whiz bang thing we'll talk about is replication. Replication will typically take your snapshots and send them off to some other storage array, which is probably the same make/model.
Obviously enough, the first sync will take forever since you've got to get all the data over there in the first place. In those cases we tend to replicate the data locally, then ship the backup storage off site and send over the deltas.
What's a delta?
Essentially, the data changed from the time of the snapshot until now (or until the next snapshot). To sum it up further: all my changes.
Once that's done you should be able to figure out whats going on from there. Depending on change rate; you can have remote replication on as slow as a T1 link without negatively impacting your business. Of course; this depends on change rate and frequency of replication. Obviously if you're changing 100MB an hour then you're going to need enough bandwidth to move 100MB an hour (in truth a little faster).
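The "enough bandwidth to move 100MB an hour (in truth a little faster)" check above is easy to sketch. Hypothetical helper, illustrative figures; the headroom factor stands in for protocol overhead and retransmits:

```python
# Can the link move one interval's worth of deltas before the next interval?
def link_can_keep_up(delta_mb_per_hour, link_mbps, headroom=1.25):
    """True if the link can carry the hourly delta with some headroom."""
    delta_mbits = delta_mb_per_hour * 8          # MB -> megabits
    link_capacity_mbits = link_mbps * 3600       # link throughput per hour
    return delta_mbits * headroom <= link_capacity_mbits

# 100 MB of change per hour over a T1 (~1.5 Mbps): fine.
print(link_can_keep_up(100, 1.5))   # -> True
# 1 GB of change per hour over the same T1: the replica falls behind.
print(link_can_keep_up(1000, 1.5))  # -> False
```

That second case is the failure mode to watch for: once an interval's deltas take longer to ship than the interval itself, the replica falls progressively further behind.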
Past replication; snapshotting leads into other technologies like cloning. Cloning is not technically replication on most storage arrays and in many cases can be referred to as a 'writable snapshot.' Essentially you take a snapshot as before but you can then clone that snapshot which will effectively create another "timeline"
That clone can then be handed off to another server to do whatever with. A common application would be to clone a production database to hand to dev/QA teams to test an application against real data.
It'd look something like this:
Even if a developer goes apeshit and 'rm -rf's' his database it's not going to hurt the base data; just that developer's "timeline" so to speak.
Interestingly enough; a few vendors have taken this line of thinking and applied it to boot from SAN, VDI, or other nifty things. Imagine handing your 100 linux boxes with red hat ES 5 the same 30GB boot LUN off your storage and the only consumed space is essentially the configuration information for each server. Something like this:
Now we all know that between a number of otherwise identical linux servers, the hostname and IP settings occupy less than 1MB of storage. Imagine if you only needed to store the common data once, and the changes between boxes were the only extra overhead. Storage savings ahoy!
The last thing I'll talk about is de-duplication. This is relatively new, and some implementations actually tie directly to snapshots (hello FlexClone and Data Instant Replay!) and in a way it's basically cloning. I'll talk about NetApp's because it's free and one of my customers has a huge hard on for it.
NetApp calls it A-SIS, which I believe stands for 'advanced single instance storage' or some poo poo. What it does is scan all the blocks on a given volume searching for identical blocks. When it finds them, it replaces all instances of that block except one with a pointer to the real block. This is typically a scheduled task and is pretty CPU intensive. A simple way to look at it is as a block level "zip" that uncompresses on the fly as needed. There are performance implications to consider, though, since that one block is now probably being accessed about a zillion times more frequently.
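The find-identical-blocks-and-replace-with-pointers idea can be shown in miniature. This is a generic sketch of block-level dedup, not NetApp's actual implementation (A-SIS also verifies candidate matches rather than trusting a hash alone, and works against on-disk block fingerprints):

```python
# Toy block-level dedup: hash fixed-size blocks, keep one physical copy
# per unique hash, and record the logical layout as a list of pointers.
import hashlib

def dedup_blocks(data, block_size=4096):
    store, layout = {}, []
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # one physical copy per unique block
        layout.append(digest)            # logical view is all "pointers"
    return store, layout

data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096   # 4 blocks, 2 unique
store, layout = dedup_blocks(data)
print(len(layout), len(store))                    # -> 4 2
assert b"".join(store[d] for d in layout) == data # reads reconstruct the data
```

Four logical blocks collapse to two physical ones, and the reconstruction line is the "uncompresses on the fly" part: every read of a shared block lands on the same physical copy, which is exactly where the performance implication comes from.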
edit: my little graphs are hosed up; if anyone wants I'll try to work something up in visio.
1000101 fucked around with this message at 05:41 on Aug 29, 2008
So this poo poo all sounds fancy and expensive; how much does it cost man!?!?!
Not as much as it used to!
There are a number of reasons behind this and to sound like a sales snake I'm going to use the term "market drivers"
1. data storage needs are growing for small to mid sized businesses
2. technology is getting less expensive
3. a lot of manufacturers want your money
What cost 250 grand a few years ago is easily 50 thousand dollars now.
Depending on options you can consolidate all of your storage for <30k fairly easily. I think you can get 4TB of StorVault (low end netapp) for ~12-15k if not less.
The benefits of consolidation are pretty clear:
For one; you make your servers a disposable commodity. IBM has a great deal? Use that; get it with two small disks to hold the OS.
Not sure how much storage you need to grow? buy what you need and as you fill it up buy more disks!
The answer to many of your problems becomes 'buy more disks' but now these disks are available for more than just the box you're putting them in.
|# ¿ Aug 29, 2008 05:55|
Pillar is what you want if you run Oracle
Interestingly enough, Pillar was actually in the high end area, but I bumped it down to "upper-mid" to account for the fact that it's relatively new and hasn't been put through the same paces as an HDS or EMC Symmetrix yet.
When I'm not at work I'll post more.
|# ¿ Aug 29, 2008 17:18|
Catch 22 posted:
I collected some IOPS data for my environment, and we are looking at a average of 300, top end 600 IOPS, excluding when my backup is running. I am looking at a Equallogic SATA SAN 8TB now (3000 IOPS), and its 50K+ or 60K+ for SAS.
What software options are you looking at? Why are you buying 8TB of storage? Do you need 8TB? For 60k you can buy a lot of decent storage.
|# ¿ Sep 18, 2008 22:52|
As an alternative;
you get iSCSI and CIFS for free though.
Pay for options you might find useful:
Snap Manager for MS Exchange
Snap Manager for VMware
Snap Manager for MS SQL
EMC and NetApp both make fine products and it shouldn't be too difficult to get an eval.
Both should come in <60,000 USD. NetApp may be cheaper since RAID DP will give you sufficient performance without having to buy double your required capacity in disks. It also virtualizes storage on the backend making management a little less of a hassle for you.
|# ¿ Sep 19, 2008 22:34|
Catch 22 posted:
Your not kidding, I just got a quote on a EMC, for a AX4-5i DP with 1.7TB of SAS 15K drives and a second DAE with 12TB of SATA 7.2K including my SnapView, Navisphere, and SAN Copy for 30K.
That EMC box is a fine piece of equipment. I have a customer who just installed one and is pretty happy with it.
Your concerns about NetApp are certainly valid.
|# ¿ Sep 25, 2008 16:22|
I wanted to comment on your post Vanilla, mostly because it's just tossing out the same argument that EMC sales reps throw to my customers that largely gets ignored.
What EMC doesn't do is have the blanket virtualisation of a single array. If we look at the EVA it all sounds really great but there are some downsides as I spend my time working with the users of the EVA often, old and new EVA's. Many admins have come from a background of control and having this taken away by an array that wants to decide where to put things is uncomfortable. When you have lots on a box this is not good, especially without any form of dynamic tuning other than making new groups.
This assumes everyone wants a dedicated storage administrator managing their data.
I'm finding that even in large enterprises, most application owners and storage admins don't really give a poo poo where the data goes as long as:
1. performance is consistent
2. Availability is consistent
Even still, it's trivially easy on most every system that does block level virtualization to force data on specific disks to provide that isolation.
This feature becomes EXTREMELY important in the SMB space. That company with <1500 employees doesn't want to hire a dedicated storage administrator to keep track of what LUN is on what RAID group and what spares are assigned to what RAID group. It's a lot of extra overhead and quite frankly an unnecessary salary for a majority of the cases out there.
|# ¿ Oct 15, 2008 18:01|
Erg, I guess I had a mistype, it's not a 270, it's a Net Appliance NetApp F720 and it uses Eurologic NetApp XL501R Fibre Channel JBOD FC9 (I know our company has a bunch of FC7's laying around)
Scales up to 1.2TB!
When you used the word "monstrous" to describe it, I got suspicious that it was really a FAS270...
The 270 is 3u and the heads are integrated into the back of the first shelf.
here's some info on the F720.
|# ¿ Nov 11, 2008 17:48|
If you can find an FC or SCSI drive that fits the chassis, you can slot it in and it will work. NetApp will just cancel any support on the box.
If it's just a temp storage spot then it's not a huge deal.
|# ¿ Nov 12, 2008 20:26|
Ok, any decent array can do this and will have the ability to take crash consistent copies of things like Oracle and Exchange.
Just wanted to point this out to keep terminology sane.
Crash consistent means the data is consistent with a crash, and therefore it may not necessarily be what you want (i.e. it could be worthless).
You really want to take consistent snapshots and replicate those.
|# ¿ Dec 15, 2008 17:24|
An application owner should say "I want application consistent data" and not "I want crash consistent data."
In the NetApp world this is managed by way of the Snap Manager products to ensure data integrity at the time of the snap. It does the verification you speak of. NetApp replication is very much reliable, as are the snapshots it takes.
If EMC took only crash consistent snapshots (it doesn't, with the right licenses, I presume) then none of my enterprise customers would keep their Symmetrix systems. They would replace them with something that took consistent snaps.
Lucky for EMC, RecoverPoint provides this functionality and does so quite well.
To qualify: though my last 6 months have been knee deep in NetApp shops, I have worked with EMC technology before, and I've only encouraged one customer to swap to NetApp. That was because he absolutely hated his Celerra, and EMC support was costing him an arm and a leg for value he just didn't feel he was getting.
I just wanted to get straight the term "crash consistent" as we usually relate that to a bad thing. I guess if you're doing file servers then crash-consistent is okay, but hardly ideal.
Doesn't snapmanager for Exchange do this? I haven't used it, but I was under the impression it could handle this.
It does exactly this.
|# ¿ Dec 15, 2008 19:19|
The apps owners also say 'I want dedicated spindles, zero data loss and 2TB by this afternoon'
The app owners will be pretty livid in many cases if you give them a crash consistent state when you restore a snap. That was my only quibble. They're going to expect it to be consistent should they need to restore. You can't depend 100% on transaction logs or an 'fsck' to bring things back to sanity.
The reason for my comment above is that I've never seen anyone run eseutil on an Exchange snap. This is because eseutil places a massive amount of I/O on the Exchange DB, and if you're doing this on a snap that points back at production you're just passing that I/O on. Even worse is if you have many snaps all trying to complete eseutil and all hammering production!
Well, different environments do different things. You can't always assume that every customer will follow your best practices. That said, NetApp is pretty flexible with how it uses spindles. I can spread a LUN over say 40 spindles, making the hit from eseutil pretty much nil. I've seen this proven in MANY large environments (over iSCSI even).
I just want to clarify terminology. If EMC came in to a customer and said "we'll take crash consistent snapshots," then we'd steer that customer away, since they'd be getting about the same value out of a 3ware RAID box assembled from Supermicro parts.
Since they don't say such crazy things; they're normally in our top 3 vendor pick. Unfortunately for EMC, they're often passed over due to lack of decent manageability. This is also the number one reason I'm seeing them ripped and replaced. Nobody cares that LUN X is only on spindles Y-Z these days, especially if performance is comparable.
Separate volumes/RAID groups are an outdated concept that needs to find its way out the door in about 99% of use cases. I realize this is the EMC party line, but they are partying their way out the door of any organization with <5000 employees. Anyone buying into EMC now ends up regretting it as they grow, and replaces it with a Compellent or a filer or something anyway.
|# ¿ Dec 15, 2008 20:57|
I agree to some extent, but the opposite is also true. If you are completely reliant on an array to place everything for you, you can't really do anything when performance starts to suck apart from buying more disk. How do you guarantee IOPS? I'm not just talking about end users, I'm talking about the Cap Geminis and EDSes of the world who have to guarantee backend performance.
The awesome thing is, I can actually do that with one of these crazy virtualized storage backends as well. Regardless of whether I'm doing it EMC style or like the rest of the modern world, when performance sucks the solution is always to buy more disks. With everyone else's products, though, I may not have to quite as soon.
It is trivially easy for me to create a dedicated volume on a 40+ drive aggregate servicing a single LUN on NetApp.
It's trivially easy for me to do this with 3PAR chunklets, and it's trivially easy to do it with Compellent StorageCenter.
I'm sure most other vendors have this same level of functionality.
The truth of the matter is that most people don't have IO needs so specific as to require separate RAID groups. There is a pretty big market out there for <100 spindles. If they need less than 10,000 aggregate IOs/sec, then why do they need to worry about carving out dedicated RAID groups?
It's a lot easier for an IT guy to look at total aggregate IO/bandwidth requirements and make a decision that way, rather than figuring it out on a per RAID group basis, then discovering he dedicated too many spindles to one RAID group and having to migrate LUNs around.
Also keep in mind that I'm thinking about a market that spans from 50 employees on up to and including a good chunk of the 5,000 employee orgs. Anything larger than that, and people are more than happy to staff a bunch of excel experts to manage their storage. These guys are buying DMX for performance and lots of it.
I find overall capacity to be a useless metric in determining that sort of thing, given that one of my customers is <1000 employees and maintains about 50TB of NetApp (and about 12 of EMC). They like the flexibility and ease that NetApp provides, and only bought the EMC for a VMware project that ultimately ended up housed on NetApp NFS.
On a separate subject have any of the boxes you've worked on had flash drives? If so - opinions?
None yet, most of my customers who would need that level of IO already have 1000+ spindles that they bought prior to the SSD shelves offered by EMC. It will be a while before they validate the shelves and put them in production.
I'd love to see this sort of thing more often though. It would be a hell of a thing to leverage with automated tiered storage.
edit: Don't think I hate all things EMC, I'm a services guy and the product is great. I'm mostly passing on complaints from paying customers and their feeling on the matter. I have a hard time disagreeing in many cases.
|# ¿ Dec 15, 2008 21:53|
I can't see the report but I'd be curious to see the methodology.
Mostly because it doesn't line up with what I'm seeing in the real world. Did they completely omit thin provisioning from the NetApp side, or the fact that you don't have to carve out RAID groups and waste NEARLY as many disks as you do with EMC?
I'm guessing no.
People buy NetApp to save money and EMC to maximize performance. At least among my customer base (banks/FSI, media shops, ASPs).
I would also argue it's flawed in that no one with a 10TB storage need is going to buy a CX3-80/6080. That said, the comparison was made; I just need to understand the methodology that reached that result.
I'd also like to point out that the 6070 doesn't exist anymore.
There are other details as well that we should cover to understand why this isn't an apples to apples comparison.
First and foremost, the 6080 is about twice the system the CX3-80 is (making the nearly twice the cost per gig slightly less surprising). Side by side, the NetApp supports almost twice the disk capacity, has nearly 4 times the RAM, and a lot more expandability. We're talking 480 spindles backed by 16GB of RAM (two SPs) vs 1100+ spindles backed by 64GB of RAM (two heads).
If you want a real apples to apples comparison, use the NetApp FAS3070 or its replacement the 3170 (the 3140 is closer still).
We're talking a HUGE difference in price here.
This also ignores the fact that NetApp gives you iSCSI, NFS, CIFS, and FCP all in one box.
1000101 fucked around with this message at 02:21 on Dec 17, 2008
|# ¿ Dec 16, 2008 23:35|
Oh wait you're right. Replaced by the 6040 and 6080.
Doesn't invalidate the rest of my post though.
|# ¿ Dec 17, 2008 02:20|
What would it look like if you were to compare across multiple product lines?
Say a CX3-10 vs a FAS2050? Maybe some others? Be curious to see the numbers.
Doing some digging on Google and asking around, it looks like Gartner doesn't factor in the entire TCO of a given solution (just the initial solution cost), and the 10TB figure is a raw number, not actual usable storage. How well that gets utilized (which directly affects overall cost per gig) depends on a number of factors, from the type of data being stored to how knowledgeable the guy managing the storage is (or how much time he wants to spend managing it).
Granted, it's a great metric to start with, but it doesn't paint the whole picture.
It's good to talk about, and I figure at worst you're going to have some more ammo to fire at your customers who are looking at competing solutions.
|# ¿ Dec 17, 2008 11:00|
Ok, I am a bit confused here. Unless you do a dedicated aggregate, you are sharing disk I/O across multiple raid groups on NetApp. At least as far as I know. Is there a way to enforce a volume/LUN to sit on a particular raid group, separate from everything in an aggregate? If so, that would be awesome.
You have it, as far as I know you'd basically create say a 20 drive aggregate, throw one volume on it with one LUN and be done with it.
This doesn't make sense to do because you're burning a whole lot of space; but this is effectively what you're doing with EMC anyway, so I guess it all lines up.
I'm not sure that you can map a LUN to specific disks any other way; but it is trivially easy to create an aggregate and just put one LUN on it.
|# ¿ Dec 19, 2008 19:45|
Any huge red flags about a NetApp filer? HP guys in town are trying to tell me NetApp is the devil.
Repeating everything Oblomov stated, but I also wanted to say that you may want to avoid the 2020. Its growth options are a little limited, so a better entry point might be the 2050.
How much capacity do you need? What applications and data are going to be using the storage? Do you have other needs (like remote replication for example)?
I can't sell you a box but I figure we can arm you with the right questions to ask your resellers.
|# ¿ Dec 21, 2008 03:40|
You won't; but at the same time if your head goes you lose access to all of your data.
Why didn't your VAR pick up on this? Are they an authorized NetApp reseller? I might question them as to why they didn't go through PCMall's parts list with a fine-toothed comb.
That said, why the 2020 instead of the 2050?
Anyway, a second head will cost you ~7500ish if I recall. Chat with your VAR and maybe they can cut you a "pity" discount but don't count on it.
Worst comes to worst, contact NetApp directly and complain. Explain to them that you explicitly laid out your requirements and were sold something different.
In the future, don't buy storage from the internet equivalent of Best Buy. Even if you don't hire a consultant, find a VAR that's authorized by NetApp and, if need be, take the quote back to NetApp to make sure you're getting what you expected.
|# ¿ Jan 29, 2009 18:28|
I don't think the 2020 can do clustering; you have to pony up for the 2050 for that. At least that's what I recall from when we last got a few of each for remote offices. The 2050 is almost 2x the size and has space for 2 internal controllers and 20 drives.
It supports clustering. We had to upgrade one of our client's 2020s to support it, but it does work. The only fault with the 2020 is that it's basically a dead-end platform. The 2050 is generally a much better fit for people and has a lot more expandability.
|# ¿ Jan 31, 2009 10:33|
There's a second slot in the back for a second head in a 2020 as well.
|# ¿ Feb 1, 2009 21:13|
I just saw an article about Pillar Data laying off 30% of their workforce.. and here I am with 100k to spend on a SAN and can't even get them to return my phone call. Anyone using Lefthand VSA in production? It sounds very cool and scary at the same time.
Oblomov has posted some pretty positive stuff about Lefthand in this thread.
That said, make sure you look at a couple of manufacturers to make sure you get exactly what you want. Just because you have 100 large to spend, it doesn't necessarily mean you should spend all of it.
|# ¿ Feb 4, 2009 17:15|
You can in fact use a NetApp as a gateway in front of whoever.
|# ¿ Feb 15, 2009 21:45|
How well do these V-filers work? Haven't tried them yet and we were thinking of trying to front some EMC and Hitachi storage with it.
I have a customer that's front-ending HDS USPs with it and he is pretty happy about it. That was actually my first and only experience with it: a series of AMS arrays fronted by a USP, which in turn has the NetApp in front of it. They're using iSCSI for their ESX project.
Two of my colleagues at work seem to think pretty highly of it though.
|# ¿ Feb 16, 2009 17:57|
I think unless you put a Celerra in front of your EMC, EMC won't support you, period.
Another option is OnStor though.
|# ¿ Feb 18, 2009 00:57|
What kind of performance hit will deduplication incur?
It can perform very well. I have a customer with de-duped webservers (several hundred of them per filer) and he's seeing a huge savings on disk space. You can reduce the negative performance impact with extra cache on the filer.
I can see both sides of it: if I'm just reading the same block all the time (say, a shared object in Linux or a DLL in Windows), then if that block is deduped I'll be winning. But let's say I modify that block; then the storage array has to pull that block out and start keeping a second copy of it, and managing that slows the array down.
It's not so much the managing of that second block as it is too many servers trying to access that block when it isn't cached.
|# ¿ Mar 9, 2009 19:50|
Bumping this in the hopes that some EMC guys can tell me about RecoverPoint and how it can give me application-consistent replication/handle quiescing of snapshots.
Are you still watching?
|# ¿ Mar 18, 2009 05:24|
Catch 22 posted:
You would get an app-consistent snap first (using SnapView to manage and set this up), then RecoverPoint replicates the LUN at the block level (clones). Flat files would not need the snap first.
You happen to have any links to EMC papers that talk about SnapView integration and how it works to quiesce data stored on a VMFS volume?
Google has me coming up clean and all the VMware integration papers have pretty much this to say:
"SnapView makes clones!"
But it doesn't tell me how it handles application quiescing.
Something else has pointed me to yet another product (Replication Manager) but I want to make sure there isn't something else I need.
|# ¿ Mar 18, 2009 20:44|
So what RecoverPoint does, as far as I'm aware, is use Replication Manager as the tool that makes the snaps and clones. Replication Manager has been around for years and is a standalone product, but it's integrated/bundled when RecoverPoint is sold.
So what it's looking like is that if I want to take consistent snapshots of VMFS volumes, I need to buy Replication Manager, correct?
I presume it works by telling vCenter to take vmware snapshots (kicking off VSS in the guests) and when that completes it fires off an array based snapshot? (This is how SnapManager on NetApp works).
The EMC engineer assigned to this account is pretty worthless when it comes to volunteering information, and English isn't his first language, which only makes things worse.
edit: Many thanks for the blog entry. I see there is a Celerra VSA there, which will make my life about 1000 times easier.
|# ¿ Mar 18, 2009 22:15|
It's a great blog for anything EMC and VMware. I'm really getting into the VDI stuff personally.
I'm currently at one of the largest retailers in the world doing a VMware design that specifically will be leveraging VDI and EMC storage. I want this to be as big a win as humanly possible for VMware and EMC since it will give us no end of clout with both organizations.
It's also going to look great for me professionally.
As such, I'm trying to dig up every word of information I can cram into my skull regarding EMC DR/BC, SRM integration, VMware integration, etc.
My biggest complaint is that the guy who wrote the VMware whitepaper (h1416-emc-clariion-intgtn-vmware-wp) is either brain dead or has no idea how VMware works or how people are implementing VMware in their environments. I can't give this paper to a customer in the state it's currently in.
Now that I know roughly what I'm looking for, I'm finding a decent amount of information though. I'll probably have Replication Manager running in my lab next week.
|# ¿ Mar 18, 2009 23:52|
Does BMC have an SMTP connector? All autosupport does is use standard SMTP to deliver mails to wherever.
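For what it's worth, pointing autosupport at your own mail relay is just a couple of options on the filer. A 7-mode sketch, with placeholder hostnames and addresses:

```shell
# Hypothetical 7-mode autosupport mail settings (hostnames are placeholders)
options autosupport.mailhost smtp.example.com       # your internal SMTP relay
options autosupport.to storage-admins@example.com   # where the mails land
options autosupport.enable on
```

Whatever is on the receiving end just has to parse ordinary email.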
For NetApp, I'm really worried about the management aspect. I've read enough things about FilerView and its limitations. I'm certainly not scared to use a CLI, but it seems like there is a large learning curve involved, which would be a potential barrier given my availability. I also don't like having to tack on licenses for features I might need down the road. That being said, once the initial investment is completed, it's much more affordable to add more storage.
Honestly, as ugly as FilerView is, once you've poked around in it for about a week you can pretty much figure out everything you could possibly want.
I've helped a lot of smallish businesses make these sorts of decisions and so far, of all the guys that picked NetApp, none were dissatisfied.
Most love the management because it becomes a single point for them. The NetApp serves as their CIFS, iSCSI, NFS, and/or FCP box all in one. All of their data is an equal citizen in the NetApp world, and they've got one tool to manage their storage.
I bolded a particularly important sentence as well, as this is VERY important when figuring out who you're giving these tens of thousands of dollars to.
That said, lets get some specifics about your environment.
What sort of applications are you running? One of NetApp's great benefits is the integration they provide with things like VMware, Exchange, SQL, etc. You called out ESXi and Exchange, which may mean you could be looking at tools like SnapManager for VI and SnapManager for Exchange. These could be great tools to help keep things sane.
I have a customer for example who has one guy managing AD, Exchange, and storage for a 1200ish person organization. He uses SnapManager with Single Mailbox recovery and provides all his users something like up to the hour restores on any individual mailbox in the company. It takes him less than 20 minutes to do said restore.
What's your data looking like? Would things like de-duplication on live volumes be of a benefit to you (this is a free feature from NetApp)?
Do you have any disaster recovery needs you want to consider? Looking to tighten up backup/recovery SLAs? Remote replication?
Given that you're a one man shop, you probably want to spend some money on tools to make your life easier. Find out from each vendor what tools they have available to do just that and get those tools.
|# ¿ Apr 18, 2009 01:52|
It's one of those things I'm going to have to try out. I'm definitely not getting anything without being able to do some sort of demo on the system. I've been able to do a bit of playing around with a LeftHand setup by using their demo VSAs and I like most of the management tools there (although some of the emailing and alerting setup is a bit cumbersome to configure).
NetApp is more than happy to ship you a unit to kick the tires on for a month or two. Contact your VAR and get the hookup.
The ability to have all my data management be done in one location really is appealing. We're definitely going to go with iSCSI for our block level stuff since I see no point in investing in FC, nor will management sign off on the cost even if I did.
FC is generally overrated in a lot of use cases anyway. iSCSI is a perfectly viable option and of course, it's free.
You're going to want it anyway for Exchange integration with SnapManager (if you go that route).
My VAR was talking about putting my ESX datastores on NFS instead of iSCSI. I believe he mentioned something about it being easier to resize NFS volumes. We didn't go into a lot of details about it as this was just a "get to know you" kind of deal.
NFS is fantastic because yes, volume resizing, thin provisioning, and de-duplication all work right out of the box without any tomfoolery. It's also safe to create just one big NFS volume to house your VMs.
The drawback is that the NFS license may be more costly than you'd like. If you're okay with iSCSI and intelligently lay out your LUNs you should be mostly fine though for VMware.
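For the NFS route, the setup really is about this simple on a filer. A rough 7-mode sketch with made-up names, thin-provisioned from the start:

```shell
# Hypothetical sketch: one big thin-provisioned NFS datastore for ESX
vol create vol_vmware aggr0 1t
vol options vol_vmware guarantee none     # thin provision the volume
exportfs -p rw=esx01:esx02,root=esx01:esx02 /vol/vol_vmware
```

Each ESX host then mounts filer:/vol/vol_vmware as a datastore, and growing the volume later is one command.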
As I stated in my post, aside from the standard Windows AD services, we have a Perforce server which is used by about half the current staff. We are switching to using it for the whole team as people switch onto newer projects. Neither of these servers are virtualized, which is something I'd seriously like to correct. I had started Perforce as a virtualized solution, but ran into what I found out later to be non-virtualization issues (bad hard drive in the ESXi host).
Sounds like a low- to midrange filer may be appropriate. If your VAR goes for the 2000 series, I would recommend that you look into the 2050. It's more expandable than the 2020 and won't require a 100% forklift replacement when the time comes to outgrow it.
We do not have Exchange at this point but it's one of the things that's been discussed extensively and is on my roadmap for this year. Additional roadmap items will involve centralized build management, better bug tracking and development management software and probably some other stuff that I'll find out about 5 minutes before it needs to be implemented.
So when you put Exchange in, your management is pretty much going to turn it into one of the most important tools you have to keep track of. Compare the effort to keep this thing backed up and running between LeftHand and NetApp; I can pretty much tell you without a doubt that NetApp is going to win that fight. If the saved headaches are worth it, you can get budget for it.
One of the NetApp advantages here would be that I don't have to pay for management tools I'm not using at the time. I get the basic software and only buy what I need when I need it.
Yes, and NetApp has a leg up on other vendors in that nothing is really hidden. You tell the sales guy what you want and he'll tell you what products NetApp sells with the license costs. The systems are fairly straightforward.
I have one question about CIFS on NetApp: Am I able to make the filer look like more than one server to the users? Management would very much like to have a "server" for each project instead of \\server\project1, \\server\project2, etc.
You could assign multiple IP aliases to an interface and access each alias by its own DNS name, I believe. You could also use vfilers, which let you "virtualize" filer instances on the filer.
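As a rough sketch of the vfiler route (this needs the MultiStore license; the addresses, interface, and names here are all made up):

```shell
# Hypothetical sketch: one physical filer answering as a separate "server"
ifconfig e0a alias 10.0.0.11 netmask 255.255.255.0      # extra IP on the NIC
vfiler create project1 -i 10.0.0.11 /vol/vol_project1   # virtual filer on that IP
```

Point a DNS A record at each alias and users see \\project1\share, \\project2\share, and so on, each with its own CIFS identity.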
My data is a goddamned mess and that's the major thing that's driving all this. It's scattered across servers with no easy way to manage and maintain it. De-duplication and cloning would probably benefit me a great deal as a lot of my Linux servers share common base files, and I'm positive that there's all sorts of other things that could be de-duped on my Windows shares.
Then NetApp dedup may be for you. The best part is that it's free; I believe you just have to request the license. Later versions of ONTAP may include it now.
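Turning it on is about three commands per volume in 7-mode (the volume name here is a placeholder):

```shell
# Hypothetical sketch: enable ASIS dedup on an existing volume
sis on /vol/vol_projects        # enable de-duplication
sis start -s /vol/vol_projects  # scan existing data, not just new writes
df -s vol_projects              # show how much space you got back
```

The `-s` scan matters on a first run, since otherwise only newly written blocks get deduped.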
I have plenty of disaster recovery needs I need to work on. A lot of my backup work has to be initiated manually and on a busy day some of it can get missed. I'd need to create my SLAs first before I could tighten them. I'll be honest, it's a real mess right now. It ties into the main point that's driving me to get a SAN: I need to get the data into a manageable state before I can start ensuring that we get it all backed up reliably.
Work out your DR needs now, because it will help you figure out what budget you need for your storage, whatever solution you intend to buy.
Remote replication is one of the things I've considered and have begun to nudge management towards for backups. It's one of those things that will be looked at once I have the data centralized. I need to know what my approximate delta is. If it's small enough we could get by with tape, but one of our problems is that our rate of data generation is VERY spiky. As projects near completion, a lot more changes are made, so I'll have to plan around it. It's also going to get much bigger than our current delta now that I'm finally (after a lot of years of trying to get management to sign off) getting people to do all their work-iteration saving in CVS instead of whatever stupid system they normally use because that's how they do it at home or were taught at school.
Remote replication can get expensive, not just on the storage side but the infrastructure side. You need network bandwidth to get poo poo over to the DR site.
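A quick back-of-envelope for sizing that link; the delta and window below are pure assumptions, plug in your own numbers:

```shell
# Rough math: sustained bandwidth to ship a nightly delta in a fixed window.
delta_gb=50       # assumed nightly changed data
window_hours=8    # assumed replication window
needed_mbps=$(( delta_gb * 8 * 1024 / (window_hours * 3600) ))
echo "need roughly ${needed_mbps} Mbit/s sustained"
```

A modest overnight delta works out to surprisingly little sustained bandwidth; it's the spikes near project deadlines that kill you.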
This is a really critical point for us. Another thing I've managed to impress on management is that we need to start getting our setup more standardized because I am not an invincible "computar nerd machine guy" who is available 24x7 like they think I am. I go on vacations, and I could get hit by a bus or some poo poo. They need the ability for someone to be able to step in and fix something without reading through 100 pages of my documentation and then searching Google for hours. Having something critical be down for more than a couple of hours could be a real problem depending on how close to a deadline it happens.
NetApp support is pretty darn good about keeping your system alive when all you've got to help you is real estate agents and sales people.
That brings up a really good point: whatever solution I get is going to have to be pretty damned fault tolerant. Management needs a guarantee that a LOT could go wrong and we'll still be ok. LeftHand scores a lot of points there simply by virtue of every set of disks being a full controller system. NetApp would require a second head and SnapMirror to do that.
Well, NetApp fault tolerance does need a second head; but snapmirror is only required if you're replicating data to a whole other filer system. If you just want storage controller failover, you just need the second head and cluster them.
You'd use snapmirror to move the data offsite for example; or if you've got super high SLA requirements, you could use it to replicate to another filer in the same location but a different set of disks/heads.
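The snapmirror setup itself is pretty small. A 7-mode sketch with made-up filer and volume names:

```shell
# Hypothetical sketch: async volume replication to another filer
# On the destination filer: create and restrict the target volume
vol create vol_mirror aggr0 1t
vol restrict vol_mirror
# Seed the baseline transfer (run from the destination side)
snapmirror initialize -S srcfiler:vol_vmware dstfiler:vol_mirror
# Then schedule updates in /etc/snapmirror.conf, e.g. every 15 minutes:
# srcfiler:vol_vmware dstfiler:vol_mirror - 0,15,30,45 * * *
```

After the baseline, only changed blocks go over the wire on each update.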
Holy poo poo, that's a lot. Sorry for the long posts, but this is by far the best resource I've found for this that's not full of tards hurfdurfing all over the place and making GBS threads out of marketing points. I can take it to PMs/email if it's too much.
I think it's good for the thread since others have similar questions or are wrestling with similar issues.
|# ¿ Apr 20, 2009 20:05|
Yeah, unless you're managing the Symmetrix from that box there's no reason to present a gatekeeper to it. If anything he's exposing your array to a variety of bad things, if my understanding of the Symmetrix platform is correct.
That said, I've only worked with one of them for about 2 months. I'd kill for a symapi/Symmetrix simulator that was worth a poo poo.
|# ¿ Jul 25, 2009 01:43|
Well, you can manage the Symmetrix through the gatekeeper LUN, so the harm would potentially be that someone could use your server to make changes, etc.
|# ¿ Jul 25, 2009 02:43|
Tragic, in that Data Domain was a neat product. Here's hoping EMC doesn't just shelve it into obscurity. Little surprise they paid so much for it.
|# ¿ Jul 25, 2009 21:14|
It's EMC that has the track record of good integration with its acquisitions. Netapp are the ones who drive almost everything into a wall!
So long as it helps the CLARiiON and Symmetrix lines, it's going to work out. However, given that EMC seems to really like the idea of people buying shitloads of disk shelves, I doubt we'll see any inline data deduplication for production storage.
Hopefully EMC proves me wrong and makes Data Domain more than just a VTL offering bundled with Legato.
|# ¿ Jul 28, 2009 00:43|
While yes, you do; depending on your application it will end up paying for itself in disks over the course of 2-3 years. I'm talking strictly from a production storage standpoint. Of course, on the NetApp side of the house, de-duplication ends up being a zero-cost option, but I'd like to see other vendors with similar technology. Particularly for my very interested fibre channel customers who understand that EMC does FCP better than NetApp by a factor of like a billion or something and would rather jump in a bathtub filled with scorpions than buy a filer.
|# ¿ Jul 28, 2009 23:08|
So with Celerra, the EMC NAS, there is already file-based de-duplication (at no extra charge): a combination of single-instancing and compression.
I'd argue that depending on application, you might see zero performance impact from de-duplication. I've got a particular customer in mind who's virtualized several thousand webservers and keeps a couple hundred on a volume with ASIS. His overall storage footprint is <100GB for every 200 or so webservers. With a couple PAM modules he's actually performing better than if he wasn't using de-duplication.
Also keep in mind that while NetApp dedup is block-level, it works very well on file-level data. If you've got 400 copies of pain.jpg on your filer, you're only going to consume what one file does. The neat thing is if that's a frequently accessed block, then good odds it's going to be sitting in cache on the box.
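You can illustrate the block-level idea with a toy example on any box with a shell: carve a file into 4 KiB blocks, fingerprint each one, and count how many are actually unique. This is only conceptually similar to what ASIS does, not its real mechanics:

```shell
# Toy demo of block dedup: a file made of 8 identical 4 KiB blocks
dd if=/dev/zero of=img.bin bs=4096 count=8 2>/dev/null
total=$(( $(wc -c < img.bin) / 4096 ))
# Fingerprint every block; identical blocks hash identically
unique=$(for i in $(seq 0 $((total - 1))); do
  dd if=img.bin bs=4096 skip=$i count=1 2>/dev/null | sha256sum
done | sort -u | wc -l)
echo "$total blocks written, $unique unique block(s)"
```

Eight blocks on "disk", one unique fingerprint: dedup would store that data once and reference it eight times.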
This is the sort of stuff that's going to start driving more units into people's datacenters. A lot of people are looking to cut costs wherever they can, and if they can pay 50k in software to avoid adding a couple 30k shelves which consume space and power, then they're going to do it. This is the driving force behind every VMware engagement I've been on since the middle of last year: spend more on software to avoid hardware costs and save datacenter space/power.
I want to see EMC break ground in this area, particularly with the CX4 line, as I want an alternative in the event that NFS/iSCSI isn't going to be sufficient or people don't want to wait the duration it can sometimes take for a filer head to realize his partner poo poo himself. Since I'm not a reseller, I don't care what storage someone buys as long as they're happy with it and they aren't going broke trying to maintain it.
Keep in mind, I'm speaking in the context of online storage here, not nearline or backup devices. I think part of NetApp's plan was to try to leverage DD's de-dupe stuff on the fly and try to make it work with production storage.
However, if you deal primarily with DMX/V-MAX systems then your particular customers might not care about saving storage capacity whenever possible. My customers range from guys who think an AX4 is hot poo poo to someone who's got ~100 or so DMX4 systems so I have a much broader interest.
|# ¿ Jul 29, 2009 06:52|
I've found most people too scared to turn on de-duplication in online production on anything other than unimportant volumes, through simple fear of potentially affecting performance. That, and the fact that the data has to be rehydrated before being backed up, and backup durations are already growing too fast without another bottleneck.
I think "most people" should really be qualified. As I've pointed out, I can think of a number of pretty sizable customers that are actively using de-duplication on revenue generating production systems. Depending on the application it can pretty much "solve" growth issues.
De-duplication with virtualization is pretty much a home run in a LOT of cases. I would bet money that you're using products from companies that are doing just that right now.
Even still, people will find value in only buying say 1 or 2 disk shelves a year instead of 5 or 6 if de-duplication will just slow the growth.
|# ¿ Jul 30, 2009 04:07|