Enterprise Storage Megathread: Why is my NAS a SAN?

EnergizerFellow: Oct 11, 2005; More drunk than a barrel of monkeys

oblomov posted:

For memory throughput especially, the 75xx series is really really good.

I was going to say just that. You're looking at a lot more money (and probably IBM Power) if you want a DB box that's faster than the new Intel Xeon 7500-series (aka Beckton or Nehalem-EX). Much, much more memory bandwidth than a desktop i7, if that's seriously what you're thinking of benchmarking.

EnergizerFellow fucked around with this message at 06:38 on May 23, 2010

# ? May 23, 2010 06:34

Klenath: Aug 12, 2005; Shigata ga nai.

Who here has direct experience with the Dell EqualLogic product line?

My group is starting to get more file server hosting requests from around the campus on which I work. They always want tons of storage, and my CIO always wants to give it to them for free. The rest of us who live in the real world know storage ain't free, but it doesn't have to cost fiber channel level of dollars either - especially if it's just a user file server scenario.

iSCSI arrays are looking more and more attractive for these kinds of things, especially since many of the creatures have volume snapshot & array-level replication to help with backups.

I went to an EqualLogic semi-high-level presentation and was very interested in its capabilities. I know I can get Dell to come in and salivate all over the place, but I want to know the real deal from people who have actually worked with the things. You know, those kinds of questions which can only really be answered by real-world use and not by a glossy color brochure or a sales troll.

Questions like:

What is the overall performance like, in your experience?

How well does an EqualLogic "group" scale, for those beyond one shelf?

How good is the snapshot capability (speed / IOPS hit on the original LUN / snap & present snap LUN to same or other host automatically or near-automatically)?

How easy / reliable is the array replication?

Does EqualLogic require retarded iSCSI drivers for multipathing like Dell's MD3000i (we have one of these things already, and I'm not very fond of it - partially for this reason)?

Can I realistically use hardware iSCSI offload with EqualLogic (many modern Dell 1U & 2U servers have optional iSCSI offload for onboard NICs, and it would be nice if we could leverage it)?

# ? May 24, 2010 16:56

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

I'm running Nagios, and NSClient++ on our Windows servers. Most are QLogic HBAs with SANsurfer, some are Emulex with HBAnyware. How do I monitor Windows MPIO to make sure each server sees the proper number of paths to its configured storage? (It should be 4 paths per LU: 1 active and 1 passive on each of the 2 connected fabrics.)
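
For what it's worth, the check logic itself is the easy half. Here is a minimal sketch of a Nagios-style evaluation, assuming something else has already collected a path count per LU (the `path_counts` mapping and the `check_paths` name are made up for illustration; how you gather the counts on Windows MPIO is exactly the open question). The only fixed part is the standard plugin exit codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.

```python
from typing import Dict, Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_paths(path_counts: Dict[str, int], expected: int = 4) -> Tuple[int, str]:
    """Compare observed path counts per LU against the expected number.

    path_counts is a hypothetical input: LU name -> number of paths seen.
    Fewer paths than expected is CRITICAL, more than expected is WARNING.
    """
    if not path_counts:
        return UNKNOWN, "UNKNOWN - no LUs reported"

    bad = {lu: n for lu, n in path_counts.items() if n != expected}
    if not bad:
        return OK, f"OK - {len(path_counts)} LUs with {expected} paths each"

    detail = ", ".join(f"{lu}={n}" for lu, n in sorted(bad.items()))
    if any(n < expected for n in bad.values()):
        return CRITICAL, f"CRITICAL - expected {expected} paths: {detail}"
    return WARNING, f"WARNING - expected {expected} paths: {detail}"
```

A wrapper script would print the message and exit with the status code so NSClient++ can hand the result back to Nagios.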

Bonus points if you can answer this for ESX, Linux, AIX or Solaris.

# ? May 24, 2010 22:35

Intraveinous: Oct 2, 2001; Legion of Rainy-Day Buddhists

1000101 posted:

Clearly this is the case but I'm not sure how Oracle handles it. Is Oracle going to acknowledge every write? If so then yeah 70ms would make the DBA want to blow its brains out. Or does it just ship logs on some kind of interval? More of an Oracle question than a storage question but it would be helpful in deciding if you should use array based replication or application based.

Typically when we're doing offsite replication and the RTT is >10ms we tend to use async replication but it's often crash consistent. Exceptions are when you use tools like SnapManager to do snapmirror updates as part of your DR process. It's a larger RPO but you're going to be application consistent on the other side.

Knowing little about Oracle what I might do is something like this on an hourly schedule:

1. Place oracle in hot backup mode
2. Dump an oracle DB backup (if feasible, may not be depending on DB size)
3. Snapshot array
4. Take oracle out of hot backup mode
5. Replicate recent snapshot offsite.
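
The ordering above can be sketched roughly like this. All four hooks are hypothetical callables (nothing here is a real Oracle or array API), and the optional dump from step 2 is left out. The one design point worth showing is the `try/finally`, so the database never stays stuck in hot backup mode if the array snapshot fails.

```python
from typing import Callable


def hourly_replication_cycle(
    begin_hot_backup: Callable[[], None],
    snapshot_array: Callable[[], str],
    end_hot_backup: Callable[[], None],
    replicate_snapshot: Callable[[str], None],
) -> str:
    """Run one snapshot-and-replicate cycle using caller-supplied hooks."""
    begin_hot_backup()                 # step 1: put the database in hot backup mode
    try:
        snapshot_id = snapshot_array() # step 3: take the array snapshot
    finally:
        end_hot_backup()               # step 4: always leave hot backup mode
    replicate_snapshot(snapshot_id)    # step 5: ship the new snapshot offsite
    return snapshot_id
```

Replication happens after hot backup mode ends, so a slow WAN transfer does not extend the time the database spends in that mode.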

Step 2 may be completely redundant though. This is not unlike how something like NetApp Snapmirror works (kick off exchange VSS writers, kick off array snapshot, turn off VSS writers and tell the array to update snapmirror which sends the recent snap offsite.)

Bandwidth requirement is basically whatever it takes to replicate the difference between each snapshot. So if you're read heavy you could probably use less than 128kb or if you're write heavy it could get pretty insane. It is definitely something to keep an eye on.
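
As a back-of-the-envelope version of that sizing: the average link rate is just the changed data divided by the replication interval. This is a rough sketch with made-up names, and it ignores protocol overhead, compression and bursts.

```python
def required_link_mbps(changed_bytes: float, window_seconds: float) -> float:
    """Average link rate in megabits per second needed to ship one
    snapshot delta of changed_bytes within window_seconds."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return changed_bytes * 8 / window_seconds / 1_000_000
```

For example, 2 GB of changed data per hourly snapshot works out to `required_link_mbps(2e9, 3600)`, roughly 4.4 Mbit/s sustained.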

Apologies in advance, as I'm way behind in this thread. If this has already been answered months ago, just feel free to ignore me.

The way we replicate our Oracle Databases is using Oracle Dataguard. Basically, you have a temporary LUN set up on the DR/Receiving side that receives a log ship on a set interval. Your shipping interval can be as short or long as you like, and since we've got 100Mbit connection to DR, we go on a constant (real time) shipping schedule, and if something makes it get behind, such is life. Our SLA is for DR to be within 5 minutes of Production. Main DB writes out logs as changes are made and ships them out to the DR site, where they are applied. Whole thing is handled nicely in Oracle Grid Control.

IANADBA, so I'm not sure if this requires RAC in order to work this way, or not, but that is what we are running currently. 4x node Integrity for our production systems, and 2x node for DR, running Oracle Database Enterprise and RAC 10g.

# ? May 25, 2010 02:52

oblomov: Jun 20, 2002; Meh... #overrated

Klenath posted:

Who here has direct experience with the Dell EqualLogic product line?

I have 40+ of the 6000 series, a few 6010s, a few 6500s and a bunch of 5000 series arrays all over the place. I've been running Equallogic for about a year and a half now.

quote:

Questions like:

What is the overall performance like, in your experience?

How well does an EqualLogic "group" scale, for those beyond one shelf?

How good is the snapshot capability (speed / IOPS hit on the original LUN / snap & present snap LUN to same or other host automatically or near-automatically)?

How easy / reliable is the array replication?

Does EqualLogic require retarded iSCSI drivers for multipathing like Dell's MD3000i (we have one of these things already, and I'm not very fond of it - partially for this reason)?

Can I realistically use hardware iSCSI offload with EqualLogic (many modern Dell 1U & 2U servers have optional iSCSI offload for onboard NICs, and it would be nice if we could leverage it)?

1. Performance is pretty good. You have to properly size your network throughput, your applications, VMware (or HyperV, or whatever) virtual environments, etc... Keep in mind you either need a shitload of 1Gb ports or a couple of 10Gb ports per box, and make sure to have MPIO (active/active) on your clients (be careful with virtualization config). Performance is directly related to scaling (which depends on your network setup, think Nexus/6509/6513 if Cisco, maybe 4000 series but haven't tried that) so if you want more IO, prepare to throw down more boxes (and of course it's up to you if you want to RAID-10 or whatever). I believe with latest firmware you can have up to 12 boxes (or was it 8) in same pool. Group is just for management, really.

2. Snap hit is pretty light. The snapshot itself is very fast and I haven't really observed performance degradation during the process. That said, I don't take snapshots during really busy times.

3. Replication is very simple. However, don't look for compression or anything fancy like that. Look for Riverbed or similar devices to optimize WAN (if doing WAN replication). Setup is straightforward, haven't had it fail yet. I replicate fairly large volumes daily to tertiary storage. I don't replicate much across the WAN.

4. Yes, you want to use the "retard" drivers. However, they work very very well for Windows hosts. For Linux, you can do same (Windows too, but it's easier to use Equallogic host utils) but have to set up each LUN manually for multipathing. Unfortunately it's same thing for vSphere 4.0. Multipathing works well, but you have to do manual setup on the LUNs. Dell is coming out with a driver for 4.1 and beta is supposed to kick off next quarter.

5. Umm... not sure? This is on the server side, so depends on your hardware and OS. iSCSI is iSCSI after all. I don't bother with offload since CPU hit is minuscule, VMware does not do much offloading, and I don't have 10Gb models running in non-VM environments. I also hate Dell on-board NICs with a passion, since they are all Broadcom (still vividly remember Server 2003 SP2 fiasco there) and well, let's face it, Broadcom sucks. I normally use either Intel NICs (in 1/2U boxes) or integrated switches in blade enclosures.

# ? May 25, 2010 03:44

oblomov: Jun 20, 2002; Meh... #overrated

Intraveinous posted:

Apologies in advance, as I'm way behind in this thread. If this has already been answered months ago, just feel free to ignore me.

The way we replicate our Oracle Databases is using Oracle Dataguard. Basically, you have a temporary LUN set up on the DR/Receiving side that receives a log ship on a set interval. Your shipping interval can be as short or long as you like, and since we've got 100Mbit connection to DR, we go on a constant (real time) shipping schedule, and if something makes it get behind, such is life. Our SLA is for DR to be within 5 minutes of Production. Main DB writes out logs as changes are made and ships them out to the DR site, where they are applied. Whole thing is handled nicely in Oracle Grid Control.

IANADBA, so I'm not sure if this requires RAC in order to work this way, or not, but that is what we are running currently. 4x node Integrity for our production systems, and 2x node for DR, running Oracle Database Enterprise and RAC 10g.

Just wanted to pipe up and say that we do same thing for our Oracle setup on NetApp. We don't use SnapMirror and just use DataGuard on our production RAC cluster. That said, Netapp demoed SnapMirror and it looked like you could do async with that as well with the Oracle Snapshot Manager.

# ? May 25, 2010 03:48

three: Aug 9, 2007; i fantasize about ndamukong suh licking my doodoo hole

We had one of our Equallogic units' networking just randomly die Friday. Couldn't ping in or out. Controller didn't fail over... nothing, just dead. We had to console it and gracefully restart and the networking came back online. I called support yesterday and eventually got to the Level 2 guys and they determined it was a known "flapping" bug, and they want us to install a non-released firmware version...

They were not able to determine what causes it, or any workarounds... just install this un-released firmware upgrade. The guy's emails were plagued with typos, and it just seems ridiculous they'd have a known bug that is so serious. The 6000 series is not that new.

Meh, support is usually good and the devices aren't terrible. Kind of a disappointing experience here, though.

# ? May 25, 2010 13:56

EoRaptor: Sep 13, 2003; by Fluffdaddy

Klenath posted:

What is the overall performance like, in your experience?

How well does an EqualLogic "group" scale, for those beyond one shelf?

How good is the snapshot capability (speed / IOPS hit on the original LUN / snap & present snap LUN to same or other host automatically or near-automatically)?

How easy / reliable is the array replication?

Does EqualLogic require retarded iSCSI drivers for multipathing like Dell's MD3000i (we have one of these things already, and I'm not very fond of it - partially for this reason)?

Can I realistically use hardware iSCSI offload with EqualLogic (many modern Dell 1U & 2U servers have optional iSCSI offload for onboard NICs, and it would be nice if we could leverage it)?

Most of these have already been answered, but I'll add some things Dell/Equallogic isn't very public about:

Equallogic 'groups' are actually grids, not clouds. This might seem like a small difference, but if a member of an equallogic group goes offline, all storage that crosses that member will also go offline. By default, that's all storage in the group, so a failure of a single device can take down much more than you'd think. Dell puts forth that units are five 9's devices, but human error could still have large effect.

Snapshots are very quick and very low impact, but they also have huge overhead. The smallest unit of space used at the device level for tracking snapshots is 16 megs, so even tiny changes on disk can make for very large snapshots. Array replication uses 512kb chunks, so the unit tracks at a finer level, but does not use them for snaps.
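
To put a number on that overhead, here is a rough worst-case model. It assumes every small write lands in a different 16 MB tracking unit and that the whole unit is charged to the snapshot; that is a simplification built on the figure above, not documented array behaviour, and the function names are made up.

```python
PAGE_BYTES = 16 * 1024 * 1024  # 16 MB tracking unit, per the figure above


def worst_case_snapshot_bytes(num_writes: int, page_bytes: int = PAGE_BYTES) -> int:
    """Snapshot space consumed if each write dirties its own tracking unit."""
    return num_writes * page_bytes


def amplification(write_bytes: int, page_bytes: int = PAGE_BYTES) -> float:
    """Ratio of snapshot space charged to data actually changed, per write."""
    return page_bytes / write_bytes
```

Under those assumptions, 1000 scattered 4 KB writes (about 4 MB of real change) could pin about 15.6 GB of snapshot reserve, an amplification of 4096x. Real workloads with any locality will land well below that.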

MPIO requires the driver for peak performance, especially when a group has more than one member. Since the contents of a disk can be spread over several units, if the first unit to respond doesn't have the requested data, a redirect will be issued, causing a latency hit until the correct device responds. With the DSM driver, it keeps a map of where all the disk clusters are, so the driver will hit the right device every time.

Hardware iSCSI offload is agnostic to whatever the endpoints are, so feel free to use it. I don't think it's worth the 1% performance improvement to risk potentially buggy NIC drivers, though.

# ? May 25, 2010 17:26

rage-saq: Mar 21, 2001; Thats so ninja...

EoRaptor posted:

Most of these have already been answered, but I'll add some things Dell/Equallogic isn't very public about:

Equallogic 'groups' are actually grids, not clouds. This might seem like a small difference, but if a member of an equallogic group goes offline, all storage that crosses that member will also go offline. By default, that's all storage in the group, so a failure of a single device can take down much more than you'd think. Dell puts forth that units are five 9's devices, but human error could still have large effect.

Seriously? So as you build your EQL "grid" up you are introducing more single points of failure? Holy cow...

# ? May 25, 2010 17:33

Intraveinous: Oct 2, 2001; Legion of Rainy-Day Buddhists

OK, I'm finally caught back up. This is such a great thread in general, so thanks to everyone contributing so far.

My question is if anyone has any experience with the FusionIO line of enterprise PCI-Express SSDs, AKA HP StorageWorks I/O Accelerator Mezzanine cards for the C-Series blades. I believe IBM OEMs their standard form factor PCIe cards as well, but I don't know what they call them.

Basically, I have a fairly small (~30GB) index db sitting in front of a much larger Oracle RAC db. This index handles the majority of queries from a website that gets about 15 million hits a month, and only when a user drills down into a query does it fetch from the main RAC db.

The index is running right now on a fairly ancient (We're about to have a tenth birthday party for it) IBM RS/6000 box and a SSA attached 6 x 9GB disk (4+1, 1 hot spare) RAID 5 array that was set up long before I was around. It sits at 100-105% utilization in topas 24x7, pulling between 700 and 1000 IOPS of 99% random small reads.

AFAIK, nothing says I can't replace this box with non-IBM Power hardware, so I'm thinking about dumping it on a BL460/465c blade (CPU licensing costs will likely skew things in Intel's favor since I should still be able to get dual core 550x cpu) with one of the 80GB SSDs. FusionIO and HP have been claiming north of 100K IOPS, and 600-800MB/sec read rates from this kit.

I'm sure once I eliminate the disk I/O bottleneck, I'll find another, but this seems like the perfect use for the part. Considering that I was looking at 5-10x more money, wasted space (both disk and rack unit), plus a bunch of extra power to short stroke an array to get even 3-5K IOPS, I'm having a hard time finding a fault, even if I only get 25% of advertised performance.
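
Quick sanity math on those numbers, as a sketch (function names are mine, and a 30-day month is assumed):

```python
def average_rate_per_second(events_per_month: float, days: int = 30) -> float:
    """Average events per second over a month of the given length."""
    return events_per_month / (days * 86_400)


def headroom(advertised_iops: float, derate: float, observed_peak_iops: float) -> float:
    """How many times the observed peak fits into the derated device IOPS."""
    return advertised_iops * derate / observed_peak_iops
```

15 million hits a month averages out to under 6 hits per second, and even at 25% of the advertised 100K IOPS, `headroom(100_000, 0.25, 1000)` comes to 25x the current 1000 IOPS peak. Averages hide bursts, so treat the first figure as a floor, not a sizing target.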

My one big worry would be fault tolerance. The data is pretty static, generated at timed intervals by a script from the larger database, so I'm not worried about the data loss as much as the downtime if it fails. A half-height blade would (in theory) let me put two of them in (if I didn't need any other expansion at all) and do a software mirror, but am I being stupid? I'm not going to be able to hot-swap a mezzanine card no matter what I do.

I'd have another blade at our DR site that could be failed over to in that case, but if I can avoid that problem as much as possible, that would be ideal.

So anyway, please tell me I've found exactly the right thing for this job, or that I'm an idiot. Although please, if it's the latter, tell me why and suggest something else to look into.

Intraveinous fucked around with this message at 18:31 on May 25, 2010

# ? May 25, 2010 17:50

H110Hawk: Dec 28, 2006

Intraveinous posted:

AFAIK, nothing says I can't replace this box with non-IBM Power hardware, so I'm thinking about dumping it on a BL460/465c blade (CPU licensing costs will likely skew things in Intel's favor since I should still be able to get dual core 550x cpu) with one of the 80GB SSDs. FusionIO and HP have been claiming north of 100K IOPS, and 600-800MB/sec read rates from this kit.

Assuming you have some kind of HA way to failover intra-datacenter and inter-datacenter you could do just what you are suggesting. I'm averse to blades, but whatever makes you happy. I would grab 4 of the cheapest 1u Dell/HP/IBM/Whatever you can find with 5520 cpus in them, fill them with memory and a boot disk. When it gets there, velcro in an intel SSD to one of the empty bays. It doesn't need cooling, you could even leave the plastic spacer in there as a caddy.

Use two in your live datacenter and two in your DR. Have your generate-index script write to the live server's hot spare, fail over, write to the new hot spare, then write to the hot-spare-datacenter servers serially. Remember to detect write failures so you know when your SSD dies and call it a day.

I would suggest a full commodity hardware solution but I guess that wouldn't go over well. Instead of an off the shelf intel ssd you could use one of those PCI-E SSD's you were looking at as well.

# ? May 25, 2010 22:39

oblomov: Jun 20, 2002; Meh... #overrated

rage-saq posted:

Seriously? So as you build your EQL "grid" up you are introducing more single points of failure? Holy cow...

Well, keep in mind that each box has 2 fully operational controllers. Also, usually you don't stripe volumes across every single box (depending on the number of units, so if it's more than 4-5). Still, it is a point of failure that's not very clearly explained. Also, most frustrating thing ever sometimes is that there is no manual way to failover between controllers, which is kind of nuts. Still, warts and all, I like Equallogic for its price/performance/ease of use. Plus, all licenses are free, Exchange, SQL, replication, upcoming VMware and Oracle.

# ? May 26, 2010 02:15

EoRaptor: Sep 13, 2003; by Fluffdaddy

oblomov posted:

rage-saq posted:
Seriously? So as you build your EQL "grid" up you are introducing more single points of failure? Holy cow...

Well, keep in mind that each box has 2 fully operational controllers. Also, usually you don't stripe volumes across every single box (depending on the number of units, so if it's more than 4-5). Still, it is a point of failure that's not very clearly explained. Also, most frustrating thing ever sometimes is that there is no manual way to failover between controllers, which is kind of nuts. Still, warts and all, I like Equallogic for its price/performance/ease of use. Plus, all licenses are free, Exchange, SQL, replication, upcoming VMware and Oracle.

Yes, it's the biggest gotcha equallogic has. I suspect we will see a network raid feature appear sooner or later, but it may have certain requirements (ie: must use the equallogic driver, which handles data duplication) that aren't always possible*.

You can manually control how many units a disk will span, and you could do a 1 minute replication partner if you are worried about device issues. My biggest beef was that, no matter how much you build into a single unit, someone could trip over a power cord or pull the wrong cable or a ups could blow, etc (or someone could steal a unit out of a less than secure cage).

I'm only running a single device, and I'll probably never add a second, so I'm good with this drawback, but other people could be in for a surprise.

* Lefthand pays a huge write penalty for network raid, because the device must echo and write, wait for a confirmation, then commit it to disk and return that the write succeeded to the source device. You can fiddle with this a bit, but the risk of disk inconsistencies occurring is hugely scary. There isn't a perfect solution.

# ? May 26, 2010 03:46

Intraveinous: Oct 2, 2001; Legion of Rainy-Day Buddhists

H110Hawk posted:

Assuming you have some kind of HA way to failover intra-datacenter and inter-datacenter you could do just what you are suggesting. I'm averse to blades, but whatever makes you happy. I would grab 4 of the cheapest 1u Dell/HP/IBM/Whatever you can find with 5520 cpus in them, fill them with memory and a boot disk. When it gets there, velcro in an intel SSD to one of the empty bays. It doesn't need cooling, you could even leave the plastic spacer in there as a caddy.

Use two in your live datacenter and two in your DR. Have your generate-index script write to the live server's hot spare, fail over, write to the new hot spare, then write to the hot-spare-datacenter servers serially. Remember to detect write failures so you know when your SSD dies and call it a day.

I would suggest a full commodity hardware solution but I guess that wouldn't go over well. Instead of an off the shelf intel ssd you could use one of those PCI-E SSD's you were looking at as well.

If I needed to fail-over, it would be a manual process of changing the db-links on the Websphere servers that host the site to point at the hot-spare server at the backup datacenter. Total downtime would be in the neighborhood of 3-4 minutes, which is kind of ugly, but considering there is currently ZERO redundancy for the 10 year old IBM box, it's a step up. Funny how things have a way of getting too big for their britches a lot faster than people expect them to, but then still have problems getting funding to "do things right(tm)".

As for Blades vs anything else, I share your apprehension, but we have standardized on HP kit, and HP only offers the PCIe SSD as a mezzanine card. Price point on the 80GB is around $4400 list, so figure ~$3500 after discount. The 60GB hot plug SATA "Midline" SSD is around $1500 list. Either would be such an improvement, people would probably soil themselves, but I tend to think the PCIe attached would give me more leeway in case I don't get budget to replace this thing for another 8-10 years (yikes).

I'd love to roll my own, and even considered the possibility of loading the DB into RAM, but decided the cost difference wouldn't be worth it. Sadly, roll your own solutions tend to mean whoever rolled it is stuck supporting it forever by themselves, at least around here.

Thanks for the info.

# ? May 26, 2010 15:22

Nomex: Jul 17, 2002; Flame retarded.

Intraveinous posted:

OK, I'm finally caught back up. This is such a great thread in general, so thanks to everyone contributing so far.

My question is if anyone has any experience with the FusionIO line of enterprise PCI-Express SSDs, AKA HP StorageWorks I/O Accelerator Mezzanine cards for the C-Series blades. I believe IBM OEMs their standard form factor PCIe cards as well, but I don't know what they call them.

Basically, I have a fairly small (~30GB) index db sitting in front of a much larger Oracle RAC db. This index handles the majority of queries from a website that gets about 15 million hits a month, and only when a user drills down into a query does it fetch from the main RAC db.

The index is running right now on a fairly ancient (We're about to have a tenth birthday party for it) IBM RS/6000 box and a SSA attached 6 x 9GB disk (4+1, 1 hot spare) RAID 5 array that was set up long before I was around. It sits at 100-105% utilization in topas 24x7, pulling between 700 and 1000 IOPS of 99% random small reads.

AFAIK, nothing says I can't replace this box with non-IBM Power hardware, so I'm thinking about dumping it on a BL460/465c blade (CPU licensing costs will likely skew things in Intel's favor since I should still be able to get dual core 550x cpu) with one of the 80GB SSDs. FusionIO and HP have been claiming north of 100K IOPS, and 600-800MB/sec read rates from this kit.

I'm sure once I eliminate the disk I/O bottleneck, I'll find another, but this seems like the perfect use for the part. Considering that I was looking at 5-10x more money, wasted space (both disk and rack unit), plus a bunch of extra power to short stroke an array to get even 3-5K IOPS, I'm having a hard time finding a fault, even if I only get 25% of advertised performance.

My one big worry would be fault tolerance. The data is pretty static, generated at timed intervals by a script from the larger database, so I'm not worried about the data loss as much as the downtime if it fails. A half-height blade would (in theory) let me put two of them in (if I didn't need any other expansion at all) and do a software mirror, but am I being stupid? I'm not going to be able to hot-swap a mezzanine card no matter what I do.

I'd have another blade at our DR site that could be failed over to in that case, but if I can avoid that problem as much as possible, that would be ideal.

So anyway, please tell me I've found exactly the right thing for this job, or that I'm an idiot. Although please, if it's the latter, tell me why and suggest something else to look into.

If you're worried about fault tolerance, you might want to go with an sb40c storage blade and 6 of the MDL SSDs in RAID 10. That would give you about 60k random read IOPS and ~15k writes.

# ? May 26, 2010 17:27

Nukelear v.2: Jun 25, 2004; My optional title text

Intraveinous posted:

Basically, I have a fairly small (~30GB) index db sitting

After reading your scenario, I'm not sure why you would bother going to SSD at all. 30GB can fit comfortably in memory on any database host that you would bother to spec out these days. Use the money you would have spent on an SSD and dump it into more RAM.

# ? May 26, 2010 18:13

soj89: Dec 5, 2005; Kids in China are playing tag with knives, on playgrounds constructed of spinning razorblades and spike traps, because it will make them stronger.

I might be posting in the wrong thread here so forgive me in advance. I'm sure you guys will point me in the right direction though.

I'm currently setting up a new file server/backup system for a small video production company. They're currently using a bunch of USB 2.0 external drives (!!) for production and archival purposes and sneaker-netting them to the two production machines in the office.

The requirements aren't really intensive compared to the "enterprise" level stuff you guys are playing with. I'd like to set up a main file server running Server 2k8 Foundation (I need to build an intranet site that works across the office LAN) and have it host the archival footage and work files. The transfer rates don't have to be high - though they're working with 1080p footage, the plan is to have the workstations pull the footage over the network and do the actual manipulation on the workstation and put the completed products back onto the file server.

They have no backup/disaster recovery plan right now except for duplication of the footage across different external drives which are kept in the office and the original DVPro tapes.

I want to build 3 identical file servers and have 2 in the office and one doing off-site replication from the CEO's home using Crashplan. I'll also recommend they start keeping the original footage and the finished product tapes off-site.

The amount of storage is minimal by enterprise standards (6-10 TB) and there will only be a maximum of 2 people accessing the server on a heavy basis (the two production workstations). Cost is a big issue. They have an IT staffer on hand who can deal with any issues with the hardware should problems arise.

I'd appreciate recommendations for the server enclosure, raid controller (I'm thinking of RAID 6, 5 or 1+0 for this type of application), and anything else that I've forgotten.

# ? May 31, 2010 17:36

adorai: Nov 2, 2002; 10/27/04 Never forget; Grimey Drawer

soj89 posted:

I want to build 3 identical file servers and have 2 in the office and one doing off-site replication from the CEO's home using Crashplan.

I don't know what crashplan is, but windows server can replicate data all on its own, transferring only block level changes (and looking for already existing blocks it can copy before putting it on the wire) to keep any number of file shares in sync.

# ? May 31, 2010 19:08

ShizCakes: Jul 16, 2001; BANNED

adorai posted:

I don't know what crashplan is, but windows server can replicate data all on its own, transferring only block level changes (and looking for already existing blocks it can copy before putting it on the wire) to keep any number of file shares in sync.

Are you referring to DFS here?

# ? May 31, 2010 19:16

soj89: Dec 5, 2005; Kids in China are playing tag with knives, on playgrounds constructed of spinning razorblades and spike traps, because it will make them stronger.

adorai posted:

I don't know what crashplan is, but windows server can replicate data all on its own, transferring only block level changes (and looking for already existing blocks it can copy before putting it on the wire) to keep any number of file shares in sync.

Something like that. Crashplan is supposed to do an rsync type backup dealie over the LAN or through the internet to a remote server. I'm looking for something simple so that the CEO can operate it himself and not be overwhelmed with command lines and .conf files.

Bottom line: What's the best RAID type to put in place? What about the controller type? The more I've been reading, it seems like Raid 1+0 is preferred over Raid 5 in any case. Would an i7 quad with 8 gb be overkill for the main file server? Especially since it's not particularly i/o intensive.

# ? May 31, 2010 21:40

adorai: Nov 2, 2002; 10/27/04 Never forget; Grimey Drawer

soj89 posted:

Bottom line: What's the best RAID type to put in place? What about the controller type? The more I've been reading, it seems like Raid 1+0 is preferred over Raid 5 in any case. Would an i7 quad with 8 gb be overkill for the main file server? Especially since it's not particularly i/o intensive.

I was referring to DFS, in response to the previous post. As for what RAID level to use, you can get better performance from raid10, however, most large disk arrays will use raid5 or raid6. I would go with raid6 for any array with over 5 disks if IO was secondary to capacity, personally.
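
To make that capacity-versus-IO tradeoff concrete, here is a rough calculator. The capacity formulas are the standard ones; the write penalties (2 for RAID-10, 4 for RAID-5, 6 for RAID-6) are the usual rule-of-thumb back-end I/Os per small random write and ignore controller cache, so treat the IOPS output as a ballpark. Names and example inputs are mine.

```python
# Rule-of-thumb back-end I/Os per small random write (no cache effects).
WRITE_PENALTY = {"raid10": 2, "raid5": 4, "raid6": 6}


def usable_tb(level: str, disks: int, disk_tb: float) -> float:
    """Usable capacity for an array of identical disks."""
    if level == "raid10":
        if disks < 4 or disks % 2:
            raise ValueError("raid10 needs an even number of disks, at least 4")
        return disks / 2 * disk_tb
    if level == "raid5":
        if disks < 3:
            raise ValueError("raid5 needs at least 3 disks")
        return (disks - 1) * disk_tb
    if level == "raid6":
        if disks < 4:
            raise ValueError("raid6 needs at least 4 disks")
        return (disks - 2) * disk_tb
    raise ValueError(f"unknown RAID level: {level}")


def random_iops(level: str, disks: int, per_disk_iops: float, write_fraction: float) -> float:
    """Estimated host-visible random IOPS for a given read/write mix."""
    raw = disks * per_disk_iops
    cost_per_io = (1 - write_fraction) + write_fraction * WRITE_PENALTY[level]
    return raw / cost_per_io
```

With 8 x 2 TB disks, RAID-10 gives 8 TB usable, RAID-5 gives 14 TB and RAID-6 gives 12 TB. At an assumed 100 IOPS per disk and 30% writes, RAID-10 comes out around 615 IOPS against 320 for RAID-6, which is the capacity-for-IO trade in numbers.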

# ? Jun 1, 2010 00:03

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

adorai posted:

I don't know what crashplan is, but windows server can replicate data all on its own, transferring only block level changes (and looking for already existing blocks it can copy before putting it on the wire) to keep any number of file shares in sync.

This is more or less true of DFS, but be careful with your chosen nomenclature of "block level." While it's a differential transfer that only transmits changed blocks of the file, it's not block level in the sense that DRBD or SAN replication is block level.

soj89 posted:

Bottom line: What's the best RAID type to put in place? What about the controller type? The more I've been reading, it seems like Raid 1+0 is preferred over Raid 5 in any case. Would an i7 quad with 8 gb be overkill for the main file server? Especially since it's not particularly i/o intensive.

RAID type depends on your workload and your controller. Your controller depends on your requirements and your budget, and often what your vendor actually supports.

These things generally don't matter too much performance-wise for small bursts if you have a battery-backed controller, because the controller often has enough cache to soak up the whole write long before it gets put to disk. Wikipedia actually has a really nice writeup on the performance characteristics of different RAID levels:

http://en.wikipedia.org/wiki/Standard_RAID_levels

My tl;dr version:

RAID-0: Generally fastest performance, but you have to be careful sizing your stripe width for highly random workloads or your performance may actually suffer compared to single disk.
RAID-1: Read performance is the same as RAID-0 for random workloads, but can be 50% of RAID-0 for sequential workloads. Synchronous write performance suffers from having to commit each write twice before returning an OK status to the operating system.
RAID-5: Read performance is almost as good as RAID-0. Sequential writes suffer a modest performance penalty, while random writes suffer a substantial performance penalty. For workloads heavy on sequential I/O at large sizes, you want to match the I/O size to the stripe width so it can be serviced by all disks simultaneously. For workloads heavy on random I/O on small sizes, you want to match the I/O size to the segment size on the disk so you don't need to wait on several disks to service one small I/O. Whereas on RAID-0 you can have segment sizes much larger than the I/O size with little/no penalty, it's important on RAID-5 to match them as closely as possible to minimize the number of parity calculations.
RAID-6: Can be as fast as RAID-5 with a good enough controller, but it's generally pretty slow because most RAID-6 controllers seem to use RAID-5 ASICs with firmware enhancements. Typically, they only offload part of the processing to the controller. In optimal circumstances, they should perform identically to RAID-5, except that you lose an additional spindle to parity and should adjust your stripe width accordingly to compensate. For most people, the speed problems are an acceptable compromise, because RAID-5 has some inherent reliability problems when using high capacity disks.

An interesting property of striped RAID arrays is that if you want a normalized stripe width, you need to carefully pick your number of disks, generally to 4N for RAID-0, 4N+1 for RAID-5 and 4N+2 for RAID-6. Generally, performance isn't important enough for you to be anal-retentive to this level.
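If you do want to check the 4N / 4N+1 / 4N+2 rule for a given shelf, the arithmetic is just "subtract the parity disks and see what's left." A rough sketch, where the 64 KiB segment size and the function names are assumptions for illustration, not anything a particular controller exposes:

```python
PARITY_DISKS = {"raid0": 0, "raid5": 1, "raid6": 2}


def data_spindles(level, disks):
    """Number of disks per stripe that hold data rather than parity."""
    parity = PARITY_DISKS[level]
    if disks <= parity:
        raise ValueError("not enough disks for this RAID level")
    return disks - parity


def full_stripe_kib(level, disks, segment_kib):
    """Data carried by one full stripe, in KiB."""
    return data_spindles(level, disks) * segment_kib


def is_normalized(level, disks, group=4):
    """True when the data spindle count is a multiple of `group` (the 4N rule)."""
    return data_spindles(level, disks) % group == 0


if __name__ == "__main__":
    # Assumed 64 KiB segments; the disk counts are arbitrary examples.
    for level, disks in [("raid0", 8), ("raid5", 9), ("raid6", 10), ("raid5", 8)]:
        print(level, disks, full_stripe_kib(level, disks, 64), is_normalized(level, disks))
```

With 64 KiB segments, the 8-disk RAID-0, 9-disk RAID-5 and 10-disk RAID-6 all come out to a 512 KiB full stripe, while an 8-disk RAID-5 lands on an awkward 448 KiB.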

Vulture Culture fucked around with this message at 04:48 on Jun 1, 2010

# ? Jun 1, 2010 04:06

Vanilla: Feb 24, 2002; Hay guys what's going on in th

soj89 posted:

Bottom line: What's the best RAID type to put in place? What about the controller type? The more I've been reading, it seems like Raid 1+0 is preferred over Raid 5 in any case. Would an i7 quad with 8 gb be overkill for the main file server? Especially since it's not particularly i/o intensive.

If it's a file server (typical 75% read, 25% write) and low IO I would go for RAID 6.

RAID 6 hurts on writes due to the double parity calc but given you are expecting low IO I would take the HA benefits of RAID 6.

Read IO is the same for R1 and R6 if you're looking at the same number of drives.
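For a rough feel of what the double parity costs, here's the usual back-of-the-envelope calculation. The write penalties (2 back-end I/Os per host write for RAID 10, 4 for RAID 5, 6 for RAID 6) are the common rule-of-thumb values, and the 8 disks at 130 IOPS each are assumed numbers, not measurements. It ignores controller cache entirely, so treat it as a worst case.

```python
# Rule-of-thumb back-end I/Os generated by one small random host write.
WRITE_PENALTY = {"raid10": 2, "raid5": 4, "raid6": 6}


def host_iops(level, disks, per_disk_iops, read_fraction):
    """Estimate the host-visible random IOPS an array can sustain."""
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    raw_iops = disks * per_disk_iops
    write_fraction = 1.0 - read_fraction
    # Each host read costs one disk I/O; each host write costs `penalty` disk I/Os.
    cost_per_host_io = read_fraction + write_fraction * WRITE_PENALTY[level]
    return raw_iops / cost_per_host_io


if __name__ == "__main__":
    for level in ("raid10", "raid5", "raid6"):
        print(level, round(host_iops(level, disks=8, per_disk_iops=130, read_fraction=0.75)))
```

At 75% read that works out to roughly 832, 594 and 462 host IOPS respectively. Set `read_fraction` to 1.0 and all three levels come out the same, which is what this simple model says about pure random reads.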

# ? Jun 1, 2010 07:13

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

Vanilla posted:

Read IO is the same for R1 and R6 if you're looking at the same number of drives.

This piece is wrong, wrong, wrong for some I/O profiles, especially as your array encompasses more and more disks.

In a RAID-1 (or RAID-0+1, etc.) array, if you're doing a large sequential read, the heads on both disks in the mirror set are in the same place (and passing over the same blocks), so you get zero speed benefit out of the mirror set even though you're technically reading from both disks at once. Versus RAID-0, your throughput is actually cut in half. For large sequential reads, your performance will almost always be better on an array without duplicated data, because you get to actually utilize more spindles.

With RAID-5/6, you do lose a spindle or two on each stripe (though not always the same spindle like with RAID-4) because you aren't reading real data off of the disks containing the parity information for a given stripe. This implies that for random workloads with a lot of seeks, RAID-0+1 will give you better IOPS.

RAID-5/6 for read throughput, RAID-0+1 for read IOPS.
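Putting numbers on the streaming case, under the assumptions above: mirrors contribute nothing extra on a large sequential read, and parity costs one or two spindles' worth of throughput. The 100 MB/s per disk is an assumed round figure, and plenty of controllers will do better than this on mirrors, so this is a sketch of the argument rather than a prediction.

```python
def sequential_read_mb_s(level, disks, per_disk_mb_s):
    """Estimate large sequential read throughput from the spindles doing useful work."""
    if level == "raid0":
        useful = disks          # every spindle streams unique data
    elif level == "raid10":
        useful = disks / 2      # both halves of a mirror pass over the same blocks
    elif level == "raid5":
        useful = disks - 1      # one segment per stripe is parity
    elif level == "raid6":
        useful = disks - 2      # two segments per stripe are parity
    else:
        raise ValueError("unknown RAID level: " + level)
    return useful * per_disk_mb_s


if __name__ == "__main__":
    for level in ("raid0", "raid10", "raid5", "raid6"):
        print(level, sequential_read_mb_s(level, disks=8, per_disk_mb_s=100))
```

For 8 disks that gives 800, 400, 700 and 600 MB/s, which is the "RAID-5/6 for read throughput" half of the summary.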

Vulture Culture fucked around with this message at 07:45 on Jun 1, 2010

# ? Jun 1, 2010 07:41

Vanilla: Feb 24, 2002; Hay guys what's going on in th

Misogynist posted:

This piece is wrong, wrong, wrong for some I/O profiles, especially as your array encompasses more and more disks.

In a RAID-1 (or RAID-0+1, etc.) array, if you're doing a large sequential read, the heads on both disks in the mirror set are in the same place (and passing over the same blocks), so you get zero speed benefit out of the mirror set even though you're technically reading from both disks at once. Versus RAID-0, your throughput is actually cut in half. For large sequential reads, your performance will almost always be better on an array without duplicated data, because you get to actually utilize more spindles.

With RAID-5/6, you do lose a spindle or two on each stripe (though not always the same spindle like with RAID-4) because you aren't reading real data off of the disks containing the parity information for a given stripe. This implies that for random workloads with a lot of seeks, RAID-0+1 will give you better IOPS.

RAID-5/6 for read throughput, RAID-0+1 for read IOPS.

It's a file server, so the IO profile is likely going to be highly random read, cache miss. The caveat being that it may be some kind of specific file server (CAD, Images) dealing with huge file sizes and then a boost in sequential read would be beneficial as you note.

I do disagree that you lose a spindle or two with each stripe. The parity is distributed and does not consume whole drives - all drives participate in the random read operations without penalty, and worst case random read IO for a 10k spindle is about 130 IOPS. However, I may be totally wrong here because I'm not factoring in the data stripe itself, but I have never seen it as a factor in IOPS comparison calcs!
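For anyone wondering where a figure like 130 IOPS comes from, it falls out of seek time plus rotational latency. A quick sketch, where the 4.7 ms average seek is an assumed typical value for a 10k drive and not a number from any spec sheet in this thread:

```python
def random_iops(rpm, avg_seek_ms):
    """Estimate uncached random IOPS for one spindle from its mechanical latencies."""
    rotation_ms = 60_000 / rpm                    # time for one full revolution
    avg_rotational_latency_ms = rotation_ms / 2   # on average, wait half a turn
    service_ms = avg_seek_ms + avg_rotational_latency_ms
    return 1000 / service_ms


if __name__ == "__main__":
    # 10k RPM: 6 ms per revolution, so 3 ms average rotational latency.
    print(round(random_iops(rpm=10_000, avg_seek_ms=4.7)))
```

Transfer time is ignored because it's tiny next to the seek for small reads.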

# ? Jun 1, 2010 08:21

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

Vanilla posted:

It's a file server, so the IO profile is likely going to be highly random read, cache miss. The caveat being that it may be some kind of specific file server (CAD, Images) dealing with huge file sizes and then a boost in sequential read would be beneficial as you note.

Yeah, it completely depends on how everything is being used -- I definitely want to emphasize that this is for certain workloads, and 99% of the time it really doesn't make a difference at all. I'm an anal-retentive pedant.

Keep in mind, though, that cache hit/miss is also going to be heavily influenced by the behavior of cache prefetch on the array, which can be substantial on large files. Also consider that for small uncached reads, this implies that major portions of the MFT/inode table/whatever probably are going to be cached (though the OS will probably handle plenty of this before the requests hit the storage subsystem again).

Vanilla posted:

I do disagree that you lose a spindle or two with each stripe. The parity is distributed and does not consume whole drives - all drives participate in the random read operations without penalty, and worst case random read IO for a 10k spindle is about 130 IOPS. However, I may be totally wrong here because I'm not factoring in the data stripe itself, but I have never seen it as a factor in IOPS comparison calcs!

That's because you're comparing apples to oranges. You're not going to see it factored in for IOPS, because it doesn't make a difference for IOPS. It does make a difference for streaming throughput, but then, like you said, this is for a file server. Enough concurrent sequential I/Os in different places on disk eventually turn the array's I/O profile random no matter how big the files are.

Determining streaming throughput is very linear -- your factors are the number of spindles, the rotational speed of the spindles, and the areal density of the data on the platters. When you're reading from a spinning disk, and every Nth segment doesn't contain usable data, that knocks your streaming rate down to ((N-1)/N) of the maximum, because you're throwing away a good chunk of the data as garbage. For visual reference, just look at a diagram like this one:

http://www.accs.com/p_and_p/RAID/images/RAID5Segments.gif

Ignore the array as an array, and just focus on how the data is laid out on a single spindle. You have six segments there, and you're only reading data from five of them. It's going to be slower than if you're reading six pieces of data off the same amount of disk.
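If you'd rather see it than squint at a diagram, this sketch builds a rotating-parity layout and counts what a single spindle holds. The rotation order here is just one plausible choice (real controllers differ), and the function names are made up for illustration.

```python
def raid5_layout(disks, stripes):
    """Return one row per stripe of 'D' (data) / 'P' (parity) markers, parity rotating per stripe."""
    layout = []
    for stripe in range(stripes):
        parity_disk = (disks - 1 - stripe) % disks
        layout.append(["P" if disk == parity_disk else "D" for disk in range(disks)])
    return layout


def data_fraction_on_disk(layout, disk):
    """Fraction of segments on one spindle that hold usable data."""
    column = [row[disk] for row in layout]
    return column.count("D") / len(column)


if __name__ == "__main__":
    layout = raid5_layout(disks=6, stripes=6)
    for row in layout:
        print(" ".join(row))
    # Every spindle ends up with five data segments out of six.
    print([data_fraction_on_disk(layout, disk) for disk in range(6)])
```

Each column has exactly one parity segment in six, so a head streaming down that disk delivers (N-1)/N = 5/6 of its raw rate as real data.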

The reason that this doesn't matter in IOPS calculations is because your main bottleneck becomes how fast you can seek, not how fast you can stream data off the disk, and since the parity disk is idle at that particular moment, it can move the head to where it's needed next and participate in another read.

Anyway, I didn't mean to go off on too big a tangent, because like I said, this is sort of a boundary case that rarely actually matters in real life. The RAID-1+0 sequential read bit is the more important one, because a lot of people don't consider the overlapping head issue and it can mean losing up to half your read throughput.

Vulture Culture fucked around with this message at 09:32 on Jun 1, 2010

# ? Jun 1, 2010 09:18

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

Hey, any other IBM guys in here have issues with Midrange Storage Series SANs just hanging [incomplete] remote mirror replication until it's manually suspended and restarted? I had one LUN sitting at 10 hours remaining for about 9 days until I finally just kicked it and I'd like to know if it's a unique problem before I call IBM to open a case.

# ? Jun 8, 2010 19:18

brent78: Jun 23, 2004; I killed your cat, you druggie bitch.

About to pick up 40 TB of Compellent storage. I liked their solution the best out of Lefthand, Equallogic and Netapp. Anything I should be aware of before dropping this PO?

# ? Jun 10, 2010 04:44

ozmunkeh: Feb 28, 2008; hey guys what is happening in this thread

Also very interested to hear from people who have Compellent kit.
Something of a budget freeze here so the project (about 1/5th the size of yours) has been put on hold. I liked the Compellent offering but the best price we could get was 50% more than the comparable Lefthand we were also looking at. The only weird thing was the physical size of the units. As said, we'd be a small installation and the controllers alone take up 6U before you even get around to adding any of the 2U disk trays. That's exactly double the size of the Lefthand setup. Nothing of consequence but strange nonetheless.

I dream of the day I get the email "ozm, go ahead with the Compellent and throw a couple Juniper EX2200 switches in while you're at it". One day.....

# ? Jun 11, 2010 00:00

Nomex: Jul 17, 2002; Flame retarded.

brent78 posted:

About to pick up 40 TB of Compellent storage. I liked their solution the best out of Lefthand, Equallogic and Netapp. Anything I should be aware of before dropping this PO?

You can't control how it does its disk platter tiering. It'll move data around the platters and you can't tell it what data to move or when to move data. (Primary to secondary storage is controllable). It can cause some performance issues.

# ? Jun 11, 2010 00:27

EoRaptor: Sep 13, 2003; by Fluffdaddy

ozmunkeh posted:

... The only weird thing was the physical size of the units. As said, we'd be a small installation and the controllers alone take up 6U before you even get around to adding any of the 2U disk trays....

drat, I wish you guys had asked about this yesterday, I just spent all day at a trade show with some Compellent reps (sales and engineers) available and could have asked them.

For the 6U thing, each controller looks to be a pretty standard 3U chassis (x86 guts). You need two controllers for any redundancy, so 6U. Their website lists setups with only 1 controller, however.

Lefthand integrates storage and controller into one box, so initially consumes less space but has architecture restrictions that mean available space does not scale linearly as you add boxes.

# ? Jun 11, 2010 01:57

Nukelear v.2: Jun 25, 2004; My optional title text

This thread needs a bump.

Building up a smallish HA MSSQL cluster and my old cheap standby MD3000 is definitely looking long in the tooth. So I'm going back to my first love, the HP MSA and I must say the P2000 G3 MSA looks very tempting. Anyone use either the FC or SAS variants of this and have any opinions on it? I've also been reading that small form factor drives are the 'wave of the future' for enterprise storage, logically it seems to be better but I haven't really heard too much about it, so I'm also trying to decide if the SFF variant is the better choice.

# ? Jun 23, 2010 20:40

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

Oh man, am I ever regretting agreeing to work on IBM SAN gear. Should have taken the job at the EMC shop instead.

I think "storage manager client deletes a completely different array than the one I clicked" is sort of a bad thing.

# ? Jun 23, 2010 21:07

janitorx: May 3, 2002; I'm cuckoo for cocoa cocks!

Nukelear v.2 posted:

This thread needs a bump.

Building up a smallish HA MSSQL cluster and my old cheap standby MD3000 is definitely looking long in the tooth. So I'm going back to my first love, the HP MSA and I must say the P2000 G3 MSA looks very tempting. Anyone use either the FC or SAS variants of this and have any opinions on it? I've also been reading that small form factor drives are the 'wave of the future' for enterprise storage, logically it seems to be better but I haven't really heard too much about it, so I'm also trying to decide if the SFF variant is the better choice.

The MSA 2000 is pretty good for an entry level SAN. I've installed a number of the G2 FC models and like them a whole lot. The newer models' management interface is a drat sight better than the one it originally shipped with as well.

I've never used any of the SFF enclosures, just the full sized ones, but reliability should be the same for both. You also can't get the larger SATA drives in SFF, so it really just depends on your storage needs and budget.

I've seen a few controller failures, but nothing near as bad as the MSA 1500 was.

Also performance monitoring/regular monitoring wise it is a whole lot friendlier, you don't have to load anything on a host to get performance stats and the built in notification via SMTP/SNMP works very well.

# ? Jun 23, 2010 22:31

zapateria: Feb 16, 2003

Speaking of which, how do I monitor performance on an EVA4400 SAN? There are some WMIs but the provider keeps crashing so they're pretty useless. Then there's something called evaperf, a command line utility. Is that it?

# ? Jun 24, 2010 07:35

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

Nukelear v.2 posted:

This thread needs a bump.

Building up a smallish HA MSSQL cluster and my old cheap standby MD3000 is definitely looking long in the tooth. So I'm going back to my first love, the HP MSA and I must say the P2000 G3 MSA looks very tempting. Anyone use either the FC or SAS variants of this and have any opinions on it? I've also been reading that small form factor drives are the 'wave of the future' for enterprise storage, logically it seems to be better but I haven't really heard too much about it, so I'm also trying to decide if the SFF variant is the better choice.

No experience with the HP kit specifically, but there's no reason to fear 2.5" disks -- it's a better form factor for SSDs once those become more price-effective at the enterprise level, and they let you cram a lot more spindles (and a lot more I/O) into the same physical space. Of course, the disks aren't any cheaper than 3.5" ones, and are a bit smaller, so they're a pretty bad option for capacity.

# ? Jun 24, 2010 07:47

Nomex: Jul 17, 2002; Flame retarded.

Nukelear v.2 posted:

This thread needs a bump.

Building up a smallish HA MSSQL cluster and my old cheap standby MD3000 is definitely looking long in the tooth. So I'm going back to my first love, the HP MSA and I must say the P2000 G3 MSA looks very tempting. Anyone use either the FC or SAS variants of this and have any opinions on it? I've also been reading that small form factor drives are the 'wave of the future' for enterprise storage, logically it seems to be better but I haven't really heard too much about it, so I'm also trying to decide if the SFF variant is the better choice.

If you decide to go with the LFF SAS FC option, an EVA4400 starter kit will work out to be almost the same price as an MSA2000FC (possibly cheaper), but offers higher availability, better expandability and better performance.

# ? Jun 24, 2010 14:20

unknown: Nov 16, 2002; Ain't got no stinking title yet!

zapateria posted:

Speaking of which, how do I monitor performance on an EVA4400 SAN? There are some WMIs but the provider keeps crashing so they're pretty useless. Then there's something called evaperf, a command line utility. Is that it?

Yes - it'll give you access to a huge amount of raw stats. The problem is getting something useful out of those stats and into your monitoring environment. Google can be your friend at that point.

# ? Jun 25, 2010 15:19

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

I got to see what happens when an IBM SAN gets unplugged in the middle of production hours today, thanks to a bad controller and a SAN head design that really doesn't work well with narrow racks.

(Nothing, if you plug it right back in. It's battery-backed and completely skips the re-initialization process. Because of this incidental behavior, I still have a job.)

# ? Jun 29, 2010 05:40

StabbinHobo: Oct 18, 2002; by Jeffrey of YOSPOS

Misogynist posted:

I got to see what happens when an IBM SAN gets unplugged in the middle of production hours today, thanks to a bad controller and a SAN head design that really doesn't work well with narrow racks.

(Nothing, if you plug it right back in. It's battery-backed and completely skips the re-initialization process. Because of this incidental behavior, I still have a job.)

it turns out there are some good reasons this poo poo costs so much

# ? Jun 29, 2010 06:02
