wolrah
May 8, 2006
what?

Catch 22 posted:

GET some metrics on your IOPS and MB read/write. THIS is a must for any environment looking to buy ANY SAN.

What is the best way to gather this information on a Linux system? I'm usually pretty good at the Google, but all I'm finding is how to use Iometer to test maximums rather than how to monitor for a week or two and see how much I actually need. My few Windows systems seem to be the easy part; as far as I've found, Performance Monitor can gather all the information I'll need. The Linux boxes in question are running Debian and Ubuntu, FWIW.
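One way to get those numbers on Linux: sysstat's iostat/sar can log them over time, or something like the rough Python sketch below, which samples /proc/diskstats on an interval and records per-disk IOPS and MB/s. The device names and the 60-second interval here are placeholder assumptions to adjust, not a recommendation.

code:
#!/usr/bin/env python
# Sample /proc/diskstats on an interval and log per-device IOPS and MB/s,
# so you can watch real usage for a week or two instead of benchmarking maximums.
# Field layout (after the device name): [3] reads completed, [5] sectors read,
# [7] writes completed, [9] sectors written; sectors are 512 bytes.
import time

INTERVAL = 60                  # seconds between samples (assumption)
DEVICES = ("sda", "sdb")       # adjust to the disks you actually care about
SECTOR_BYTES = 512

def snapshot():
    stats = {}
    with open("/proc/diskstats") as f:
        for line in f:
            parts = line.split()
            if parts[2] in DEVICES:
                stats[parts[2]] = (int(parts[3]), int(parts[5]),
                                   int(parts[7]), int(parts[9]))
    return stats

prev = snapshot()
while True:
    time.sleep(INTERVAL)
    cur = snapshot()
    for dev in DEVICES:
        r_ios, r_sec, w_ios, w_sec = (c - p for c, p in zip(cur[dev], prev[dev]))
        print("%s %s r/s=%.1f w/s=%.1f read MB/s=%.2f write MB/s=%.2f" % (
            time.strftime("%Y-%m-%d %H:%M:%S"), dev,
            r_ios / float(INTERVAL), w_ios / float(INTERVAL),
            r_sec * SECTOR_BYTES / float(INTERVAL) / 1e6,
            w_sec * SECTOR_BYTES / float(INTERVAL) / 1e6))
    prev = cur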

Migrating all of our storage needs to a SAN will be the first step in virtualizing and consolidating to hopefully a pair of identical servers rather than the clusterfuck of small boxes we have right now. My testing of ESXi is showing it to be everything I had hoped for and more, so with a recent notable inflow of cash I'm hoping to be able to clean up my infrastructure.

rage-saq
Mar 21, 2001

Thats so ninja...

BonoMan posted:

Yeah we have one coming in next week. Like I said this is just me putting feelers out there to see what other folks use.

It depends on a lot of aspects. A few things to start thinking about now.

1: Recovery Point Objective: what is acceptable data loss? The closer to 0 you get, the more money you spend. Zero data loss is exorbitantly expensive; it takes a few hundred grand in equipment to achieve, and it's still not guaranteed.

2: Recovery Time Objective: what is acceptable downtime? Again, the closer to 0 you get, the more exponentially the costs go up. An RTO of 0 means server redundancy, storage redundancy (two SANs) and application redundancy (example: Microsoft Exchange Continuous Cluster Replication).

3: IOPS. The amount of IOPS you need is ultimately going to determine the expense of the array. If you've got something that needs about 2,000 sustained random IOPS on a database set of 300GB or more, you are talking about roughly 20 disks just for that one application.
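The kind of back-of-the-envelope math behind a figure like "20 disks", sketched below. The per-disk IOPS number, read/write mix, and RAID write penalties are illustrative assumptions, not vendor sizing guidance.

code:
# Rough spindle-count estimate: backend IOPS = reads + (write penalty x writes),
# divided by what a single 15K drive can sustain (~180 random IOPS, assumed).
def spindles(frontend_iops, read_fraction, raid_write_penalty, iops_per_disk=180):
    reads = frontend_iops * read_fraction
    writes = frontend_iops * (1 - read_fraction)
    backend_iops = reads + writes * raid_write_penalty
    return backend_iops / iops_per_disk

# 2,000 frontend IOPS at an assumed 70/30 read/write mix:
print(spindles(2000, 0.70, 2))   # RAID 10 (penalty 2): ~14-15 disks
print(spindles(2000, 0.70, 4))   # RAID 5  (penalty 4): ~21 disks, i.e. the "about 20" above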

BonoMan
Feb 20, 2002

Jade Ear Joe

Catch 22 posted:

:words:

rage-saq posted:

:words:

Awesome thanks!

rage-saq
Mar 21, 2001

Thats so ninja...

Catch 22 posted:

Call EMC; they will come out, do metrics for you, go over everything they think you need, and they will do you right. You can then use this info to compare against other products. Be frank with them and do not think of them as "demoing" a product to you.

You can even do this with Dell, and they will shove an EqualLogic down your throat, but they can give you info and help you figure out what you need.

This is all free. Just call some vendors.

EMC actually came out onsite and did some performance monitoring to determine IOPS usage patterns before you gave them any money?

I've not actually heard of them doing this, just coming up with random guesstimates for customers based off a little input from the customer. Those guesstimates were horribly wrong (short by about 40% in some cases), to the point that I ended up fixing the order at the last minute before they placed it (or sometimes after, by buying more disks).

Moral of the story: you can't cheat by guessing at IOPS patterns, you really need to know what your usage patterns look like.
Some applications (like Exchange) have decent guidelines, but they are just that, guidelines. I've seen the 'high' Microsoft Exchange estimates come in about 50% short of actual usage, and I've also seen people's mail systems come in 20% under the 'low' guideline.
SQL is impossible to guideline; you need to run a heavy-usage scenario where you record lots of logs to determine what to expect.

Maneki Neko
Oct 27, 2000

rage-saq posted:

EMC actually came out onsite and did some performance monitoring to determine IOPS usage patterns before you gave them any money?

I've not actually heard of them doing this, just coming up with random guesstimates for customers based off a little input from the customer. Those guesstimates were horribly wrong (short by about 40% in some cases), to the point that I ended up fixing the order at the last minute before they placed it (or sometimes after, by buying more disks).

Christ, I was asking general questions about their product line and they wanted to come out and do a complete site performance audit. It was a pain in the rear end to get them to talk to me WITHOUT them doing this first.

Wicaeed
Feb 8, 2005
Woot, found a sweet Windows-based iSCSI SAN solution over at
http://www.datacore.com/. They give you a 30-day trial option, so I downloaded that and am messing around with it on my virtual machines and my main PC. It's pretty interesting so far; I've only just whetted my appetite :)

Vanilla
Feb 24, 2002

Hay guys what's going on in th

Catch 22 posted:

But you could use the cloning features of an EMC to clone to another LUN and run backups from that, keeping your production LUNs running nicely while backups run if you're a 24-hour shop. EqualLogic would have your backups fight (for the right to par- never mind) for bandwidth and disk access.

Snapview allows for both Clones and Snaps. Same license.

Vanilla
Feb 24, 2002

Hay guys what's going on in th

Mierdaan posted:

How does the Snapview SKU work for the AX4? We didn't have it on our quote, but were assured we had the capability to do snapshots. Does Snapview get you some additional functionality we wouldn't have, or is our reseller just including it but not as a line item?

I *think* with the AX4 out of the box you can take one snap of a LUN and up to 16 snapshots per array. No clones.

With the Snapview license you get clone support and the limits above are greatly increased.

Vanilla
Feb 24, 2002

Hay guys what's going on in th

BonoMan posted:

I guess I can ask this part of the question as well...what's a good pipeline for backing up from a SAN or NAS?

They want server A to be current work and server B to be archived work, with all of that work backed up to tape for offsite storage and also to physical backups in house (DVDs or whatever). They would also like a synced offsite server somewhere for fast restoration only.....they really won't pay for jack. :( Or at least not the massive cost that would take.

In this situation people ensure the storage is replicated between both sites.

You then back up at one site and send the tapes to the other site, or to Iron Mountain. That way you're not backing up over a pipeline.

Vanilla
Feb 24, 2002

Hay guys what's going on in th

rage-saq posted:

EMC actually came out onsite and did some performance monitoring to determine IOPS usage patterns before you gave them any money?

I've not actually heard of them doing this, just coming up with random guesstimates for customers based off a little input from the customer. Those guesstimates were horribly wrong (short by about 40% in some cases), to the point that I ended up fixing the order at the last minute before they placed it (or sometimes after, by buying more disks).

Moral of the story: you can't cheat by guessing at IOPS patterns, you really need to know what your usage patterns look like.
Some applications (like Exchange) have decent guidelines, but they are just that, guidelines. I've seen the 'high' Microsoft Exchange estimates come in about 50% short of actual usage, and I've also seen people's mail systems come in 20% under the 'low' guideline.
SQL is impossible to guideline; you need to run a heavy-usage scenario where you record lots of logs to determine what to expect.

This is accurate. EMC will always step away (read: should always) from performance stuff at the pre-sales stage unless you pay for it or the person used to be in delivery. Only you know your environment best, and while EMC will tell you about all the features, a good guy should turn to you for spindle counts and not work off capacity estimates.

Vanilla fucked around with this message at 09:05 on Sep 30, 2008

Catch 22
Dec 1, 2003
Damn it, Damn it, Damn it!

rage-saq posted:

EMC actually came out onsite and did some performance monitoring to determine IOPS usage patterns before you gave them any money?

I've not actually heard of them doing this, just coming up with random guesstimates for customers based off a little input from the customer. Those guesstimates were horribly wrong (short by about 40% in some cases), to the point that I ended up fixing the order at the last minute before they placed it (or sometimes after, by buying more disks).

Moral of the story: you can't cheat by guessing at IOPS patterns, you really need to know what your usage patterns look like.
Some applications (like Exchange) have decent guidelines, but they are just that, guidelines. I've seen the 'high' Microsoft Exchange estimates come in about 50% short of actual usage, and I've also seen people's mail systems come in 20% under the 'low' guideline.
SQL is impossible to guideline; you need to run a heavy-usage scenario where you record lots of logs to determine what to expect.

They even made a pretty report with charts and such, making the perfmon data look nice. Yes, all for free. I just called them up to ask a quick question about Snapview, and they set everything up.

Mierdaan
Sep 14, 2004

Pillbug

Mierdaan posted:

rage-saq posted:

Additionally, the MSA2000 series is a block-level virtualized storage system, so you don't necessarily have to carve up disk groups based off I/O patterns like you do with traditional disk-stripe arrays.

Can you explain this a little more? I've always felt like I was missing something by carving up disk groups like I did before, so I'm glad to know I am. I just don't understand quite what :)

Reposting this question on a new page :) Is rage-saq just talking about the ability to automatically move more frequently-used data to faster disks? That makes sense if you have slower spindles in your device, but if you cram the whole thing full of 15k RPM SAS drives I'm not quite sure what this accomplishes. Again, totally missing something :(

BonoMan
Feb 20, 2002

Jade Ear Joe

Vanilla posted:

In this situation people ensure the storage is replicated between both sites.

You then back up at one site and send the tapes to the other site, or to Iron Mountain. That way you're not backing up over a pipeline.

I meant "pipeline" as in "workflow." Not backing up over the internet.

rage-saq
Mar 21, 2001

Thats so ninja...

Mierdaan posted:

Can you explain this a little more? I've always felt like I was missing something by carving up disk groups like I did before, so I'm glad to know I am. I just don't understand quite what :)

Reposting this question on a new page :) Is rage-saq just talking about the ability to automatically move more frequently-used data to faster disks? That makes sense if you have slower spindles in your device, but if you cram the whole thing full of 15k RPM SAS drives I'm not quite sure what this accomplishes. Again, totally missing something :(

Well, there are two basic RAID types used today.
The old way: disk RAID. A 32KB, 64KB, or 128KB stripe (sometimes per LUN, sometimes not) runs through the entire disk group. Sometimes you make multiple disk groups and group those together for performance. Different vendors have different strategies.
What are the downsides? The stripe must be maintained, so performance and reliability follow traditional models, and consolidating I/O patterns onto particular disk groups (random vs. sequential) is a big concern.

The newer way: block RAID. Data is virtualized down to blocks, which are then spread out over the disk group. The advantage is that the blocks aren't tied to particular disks, so they can be spread out to maintain redundancy and improve performance as needed. That means you can mix I/O patterns without seeing a performance penalty, because the blocks get spread out and optimized so more disks are driving the I/O.
It also means it doesn't matter so much that all the disks match in size and spindle speed, but to make your RAID meet sizing and redundancy requirements you won't be able to use the full extent of your capacity when you mix spindles.
Not all block storage systems will automatically migrate data across different spindle speeds the way Compellent and 3PAR do; that is still fairly new. I know HP is working on adding migration to the EVA eventually, but right now it's still a best practice to keep the same spindle speed per disk group in an EVA. Drobo uses virtualized storage to spread redundancy blocks everywhere, which is how they get their crazy recovery thing going.
EMC unfortunately is very pompous and is under the misguided opinion that block virtualization is a bad idea, which is why they don't have it. A lot of industry experts disagree.
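A toy sketch of the difference between the two layouts described above; purely illustrative, not any vendor's actual implementation.

code:
N_DISKS = 8

# Traditional disk-stripe RAID: a LUN block's home disk is fixed by arithmetic,
# so I/O patterns are welded to whatever disk group the stripe lives on.
def stripe_location(lun_block):
    return lun_block % N_DISKS

# Block-virtualized layout: an explicit per-block map, so the array can place
# blocks wherever balances the load, and move them later without the host noticing.
block_map = {}   # lun_block -> disk index

def virtual_location(lun_block):
    if lun_block not in block_map:
        # placement policy is the array's business: least-loaded disk, round robin, etc.
        load = [sum(1 for d in block_map.values() if d == i) for i in range(N_DISKS)]
        block_map[lun_block] = load.index(min(load))
    return block_map[lun_block]

def migrate(lun_block, new_disk):
    # after copying the data, "moving" a block is just rewriting one map entry
    block_map[lun_block] = new_disk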

Vanilla
Feb 24, 2002

Hay guys what's going on in th

rage-saq posted:


EMC unfortunately is very pompous and is under the misguided opinion that block virtualization is a bad idea, which is why they don't have it. A lot of industry experts disagree.

Gotta comment on this, they do have block virtualization :)

...had it for years now..

Invista

Using Invista you can bring in most arrays, migrate online between them, stripe a volume over three different arrays from three different manufacturers, and do all the usual stuff. What Invista does is allow you to pick which arrays you want to be virtualized rather than forcing you down a block-based virtualization route on a per-array basis.

What EMC doesn't do is blanket virtualization of a single array. If we look at the EVA it all sounds really great, but there are some downsides; I spend a lot of my time working with EVA users, old and new. Many admins come from a background of control, and having that taken away by an array that wants to decide where to put things is uncomfortable. When you have a lot on a box this is not good, especially without any form of dynamic tuning other than making new groups.

Other arrays allow the admin to dictate where to put LUNs, down to the exact disks. With the EVA my performance is affected by other apps using the same disk group; people need to give their applications predictable performance levels, and that isn't possible in this situation. The only way to guarantee performance levels is to put it all on its own disk group, which is expensive because you need the spares to go with the group. A lot of people are quite happy to share as long as they can pick, choose, limit, and have control if something does need moving.

The smallest disk group is 8 disks. So if I wanted just a small amount of dedicated space for my Exchange logs (like 4 disks' worth) I'd have to buy 8 minimum. It's hard enough explaining to procurement why I need to keep Exchange on dedicated spindles, let alone buy 8 spindles just for a few GB of logs! The alternative is to let the EVA do whatever and put it with other data, but that could drag down the performance of the whole array and is against MS best practice.

Then there is the fact that the EVA only supports restriping, not concatenation. Painful for applications that are worldwide and 24-hour; someone in some timezone is going to get crud performance for a few hours.

You seem to know HP quite well, rage-saq; let me know if any of my thoughts above are old or inaccurate. I always like to know. I deal with a lot of people wanting to swap their EVAs out, just as HP deals with a lot of people looking to swap CX arrays out; these just seem to be some of the common concerns.

What Clariion and DMX arrays do have is Virtual Provisioning. You can easily make pools and add capacity with the same ease you would in an EVA, BUT you keep the ability to cordon off drives and resources and to tune the array without penalty. You are essentially picking the part of the array you wish to virtualize. Grab 5TB of ATA, put it in a thin pool, and in three clicks you can add capacity and present LUNs. This isn't exactly block virtualization, but you could argue that all arrays have 'virtualized' since day one: you're presenting a 'logical' device where the array pieces together the LUN based on what you've asked for. The EVA and the like are really just taking that one step further and taking control of the entire layout.

It's mostly the accounts I work with who drive my opinion of arrays, because I hear it first hand. I have one account who is absolutely chuffed with 3PAR; they love it. They have something else for their high end, but they use 3PAR for the fast and dirty storage needs. Five minutes and they've provisioned storage. They hate the colour though...and the F5 refresh thing.....

Vanilla fucked around with this message at 17:45 on Sep 30, 2008

lilbean
Oct 2, 2003

Here's a storage question that's unrelated to SANs and NASs - I just upgraded my company's backup system from LTO2 to LTO4 (which is loving awesome I might add). But now I'm stuck with LTO2 tapes from offsite coming back every week and I'm not sure what to do with them. Blanking them is easy enough, but is there somewhere that recycles them or something? Anyone else have to deal with that?

Catch 22
Dec 1, 2003
Damn it, Damn it, Damn it!

lilbean posted:

Here's a storage question that's unrelated to SANs and NASs - I just upgraded my company's backup system from LTO2 to LTO4 (which is loving awesome I might add). But now I'm stuck with LTO2 tapes from offsite coming back every week and I'm not sure what to do with them. Blanking them is easy enough, but is there somewhere that recycles them or something? Anyone else have to deal with that?

Iron Mountain will shred them. Yes, SHRED them. They even shred hard drives. HARD DRIVES!

P.S. I love my LTO4 too, it's bitchen fast.

lilbean
Oct 2, 2003

Catch 22 posted:

Iron Mountain will shred them. Yes, SHRED them. They even shred hard drives. HARD DRIVES!
Yeah I know that, but it'd be a waste to shred a couple hundred tapes.

Catch 22
Dec 1, 2003
Damn it, Damn it, Damn it!

lilbean posted:

Yeah I know that, but it'd be a waste to shred a couple hundred tapes.

Shredding is cooler than being green, but...
http://www.dell.com/content/topics/segtopic.aspx/dell_recycling?c=us&cs=19&l=en&s=dhs

Looks like an odd site, but hey.
http://www.recycleyourmedia.com/

lilbean
Oct 2, 2003

Catch 22 posted:

Shredding is cooler than being green, but...
http://www.dell.com/content/topics/segtopic.aspx/dell_recycling?c=us&cs=19&l=en&s=dhs

Looks like an odd site, but hey.
http://www.recycleyourmedia.com/
Thanks, the second site looks perfect actually. And yeah, shredders rock - like so:
http://www.youtube.com/watch?v=9JL77ECcOoQ

BonoMan
Feb 20, 2002

Jade Ear Joe
OK, on a separate note, we are looking at a MacBook Pro for field use with our RED One cameras. We need some serious external storage of course and are looking at things like CalDigit systems and various other RAID arrays.

Any suggestions for portable but sturdy RAID arrays (preferably eSATA) for in-the-field video editing? At the moment we probably just need around 2 TB.

And holy poo poo that gets expensive.

Mierdaan
Sep 14, 2004

Pillbug
rage-saq, Vanilla, thanks a lot for the explanations. I think I see the light now ;)

Catch 22
Dec 1, 2003
Damn it, Damn it, Damn it!

BonoMan posted:

OK, on a separate note, we are looking at a MacBook Pro for field use with our RED One cameras. We need some serious external storage of course and are looking at things like CalDigit systems and various other RAID arrays.

Any suggestions for portable but sturdy RAID arrays (preferably eSATA) for in-the-field video editing? At the moment we probably just need around 2 TB.

And holy poo poo that gets expensive.

http://www.wdc.com/en/products/products.asp?DriveID=410
It would match your gay rear end Mac. (Sorry) I'm a PC (Hurr)
Maybe? Portable and sturdy are two words that don't really mix with what you are talking about.

Catch 22 fucked around with this message at 20:07 on Sep 30, 2008

Intraveinous
Oct 2, 2001

Legion of Rainy-Day Buddhists
OK, time for my fun. I've read all 5 pages of the thread and learned quite a bit, and I'll preemptively thank rage-saq for all the help he's given me in the past as well... I apologize in advance for this book, and fully realize that I'm going to need to bring in consultants. I'm just the kind of guy who likes to fully wrap his head around everything before I do that, so I can talk semi-intelligently to them and know if they're feeding me a line of BS or cutting corners. What it comes down to is that I have to manage all of this day to day, so anything I can do to make myself better at that, I try to do.

I work for state government; all of the hardware in our data centers is managed by two people, myself and a colleague. Both of us were kind of dropped into the "Data Center Admin" position from previous positions as Unix/Linux sysadmins, and now we're expected to be the jacks of all trades for everything from networking to SAN to Windows/Linux/AIX/VMware admin to backups... They decided to put us in this position after years of changing landscapes, with no one person who had any idea of what was going on managing everything. So we wind up with a lot of "I got what I needed for my project at the time" dropped in our lap. I'm trying to get everything into a centralized mode, with knowledge of what's coming down the pipe and planning ahead for it.

90% of what is set up now was already that way when we started being responsible for it. Currently we have the following SAN-related stuff in our primary DC (and you can assume that just about everything is replicated at our backup DC):

HP EVA4000 supporting a 4-node HP Itanium2 cluster running Oracle RAC on RHEL 4.5, mapping two 400GB raw LUNs for the database and a few other smaller raw and ext3 LUNs for RMAN, voting, etc.
- These 4 servers each have 2 dual port 2Gig FC HBAs
- The HBAs connect up to two HP 2Gbps FC Switches, one interface from each card to each switch
- We have four connections from the switches to the controllers (HSV200) for the EVA (one connection to each controller from each switch)
- EVA has two disk drawers, each with 14 73GB 15K FC drives (28 total)

The storage system is dedicated ONLY to the Oracle RAC system at this time, but I'd like to get to more truly centralized storage if I can. Currently we don't do any snapshotting/cloning/replication with this system, and instead use Oracle Data Guard to push redo logs to our backup DC. Both systems are technically live/production, however. We run public access from our backup DC and internal access at our main site. If I can do this with some type of delta-based replication in the future, that would be OK with me. We currently have 100Mbps to our backup site, with talk of upgrading it to redundant 1Gbps links early next year.

Next we have a Gateway-rebranded Xyratex 5412E (I think - I base this on the look and specs from Xyratex's website, as I can't get any information from Gateway/MPC on what it actually is and whether Xyratex firmware/perf tools/etc. will work on it.)
This system has two controllers in the main enclosure, each with 2 4Gb FC interfaces, and a second enclosure connected by a SAS link. The main enclosure has 12 300GB 15K SAS drives, the secondary has 6 300GB 15K SAS. The system is set up as a monolithic array per enclosure (Array 0 = the 12 300GB drives, Array 1 = the other 6, and vDisks are split from there).
This system currently supports the following through a pair of 12 port (upgradeable to 16 port) Brocade Silkworm 200E 4Gb FC switches:
- 3-node Lotus Domino/Notes cluster on RHEL 4.6, a 500GB ext3 LUN for each node, 1 single-port FC HBA per node
- 2 additional nodes for SMTP and Domino Admin tasks with a 100GB LUN each, 1 single-port FC HBA per node
- 2-node VMware ESX 3.5.1 cluster with a total of 6 LUNs: 3 250GB LUNs for VMs, a 30GB LUN for swap, a 100GB LUN that I use for staging P2V migrations, and a large 150GB LUN for a file server VM. There are currently 12 production VMs, with plans for more in the future. Systems are PE2950s and each has two single-port FC HBAs.
- 1 Windows Storage Server 2003 box which runs VCB and has a 200GB LUN for backup-to-disk of a couple of other standalone servers.

I also have a FC native dual drive LTO4 IBM tape library for tape backups.

I'd like to start consolidating the few dozen standalone servers I have onto shared storage as they come up for replacement, but I want to make sure I have a storage system that can support it. Right now there's only about 4TB of total data at each site, but a document management project for "less paper" is in contract negotiations right now and will probably need 8-12TB in the first year or two, so expandability is important, even though I know I'm still talking about small potatoes.

I like FC because it's what I've used before and I have some current infrastructure, most importantly the tape libraries, but I realize that I'm going to need some updates to the infrastructure to be able to add 20+ more hosts in the next couple of years.

I guess my questions are:
- Is it feasible to have a high-avail, fairly high volume Oracle DB on the same storage system as the rest of my systems, or should I keep it segregated as it is now?
- If keep it segregated, can my EVA4000 be upgraded to 4Gb FC when I replace the servers later this year?
- Can I have all of my storage systems on a single fabric to allow for over the SAN backup with Netbackup 6.5? (My plan is to keep the Xyratex, use the existing SAS drives as a Disk to Disk to Tape for important data, and add another drawer of SATA drives and make a disk backup staging area for everything else once I get a new central storage system)
- If I have mixed-vendor storage on the same fabric, is that going to cause problems with LUN mappings?
- Am I looking at something that's going to cost a half million dollars to do? (I'll need two of whatever I end up with - one for each site)

Thanks in advance for reading this far, and for any insight you can offer!

Catch 22
Dec 1, 2003
Damn it, Damn it, Damn it!

Intraveinous posted:

I guess my questions are:
- Is it feasible to have a high-avail, fairly high volume Oracle DB on the same storage system as the rest of my systems, or should I keep it segregated as it is now?
Yes (soon). Check the Virtualization megathread's last page and read about VMware FT.

Intraveinous
Oct 2, 2001

Legion of Rainy-Day Buddhists

Catch 22 posted:

Yes (soon). Check the Virtualization megathread's last page and read about VMware FT.

While VMware FT sounds great and cool, I have no intention whatsoever of virtualizing my Oracle RAC system. It's beastly, and the 3-year-old system it runs on (up for replacement later this year) is four nodes at each site, each with 4 Itanium2 processors and 32GB of RAM. I'm thinking I'll probably replace it with a similar number of IBM P/Series boxes with multiple Power 6 chips and 48-64GB of RAM per box. Itanium's not my cup of tea, though perhaps if I were on HP-UX instead of RHEL I'd be in better company. Sadly, I've been told that when it was purchased, Oracle RAC 10g hadn't yet been certified on the similarly priced/targeted P/Series systems (certification came 3 weeks after the PO was cut). All of our other mid-end boxes are P/Series on AIX.

But I digress. What I'm really asking is: can I have the LUNs for my Oracle boxen on the same storage system (still on a segregated array, I assume) as my less performance-intensive things like file servers, AD DC VMs, DHCP, support databases, etc., for easier management? (One disk system to buy and support rather than two.)

Thanks for the pointer back to that thread though, I've not looked at it since before VMworld, so there's been some interesting reading.

Catch 22
Dec 1, 2003
Damn it, Damn it, Damn it!

Intraveinous posted:

But I digress. What I'm really asking is: can I have the LUNs for my Oracle boxen on the same storage system (still on a segregated array, I assume) as my less performance-intensive things like file servers, AD DC VMs, DHCP, support databases, etc., for easier management? (One disk system to buy and support rather than two.)

Thanks for the pointer back to that thread though, I've not looked at it since before VMworld, so there's been some interesting reading.
I don't see why not. With a true LUN you can get dedicated performance as long as the overall load will not max out the particular SAN you go with.

BonoMan
Feb 20, 2002

Jade Ear Joe

Catch 22 posted:

http://www.wdc.com/en/products/products.asp?DriveID=410
It would match your gay rear end Mac. (Sorry) I'm a PC (Hurr)
Maybe? Portable and sturdy are two words that don't really mix with what you are talking about.

Thanks for the heads up. I guess I just meant something that is in a sturdy case.

And yeah, we're all PC/Avid here, but the REDs flow well with Macs (at least REDCine does...they have a rough Avid solution that just doesn't cut it at the moment though).

ewiley
Jul 9, 2003

More trash for the trash fire
Hey mid-range storage expert goons! I was wondering if I could get a fact-check from you folks. I've been reading about reports of SAN failures and, well, we just had one :(

Dell/EMC AX150i, dual SPs, populated with 750GB SATA drives, pretty simple setup (about 5 servers connected). Just allocated the last of my storage to extend our file servers a few days ago. Suddenly, during our normal backup (3am-ish), I get alerts that all of the servers attached to that particular SAN box have dropped their disks.

No pings from management (on either SP), no pings from any iSCSI port. Yikes. A reboot of the SAN and a re-flash of the FLARE on SPB (which had just kept rebooting itself) and we're back up and running.

So here's my question: Dell's EMC guy tells me that "it's a best practice not to put any LUNs on the OS drives' disk pool", which is 4 of my disks (about 1/3 of my total capacity). Apparently, since I allocated all the available space in the OS pool, the SAN OS ran out of page space when the IOs went up during our backups.

So am I seriously not supposed to store anything on the ~2TB of OS-pool disk just because the SAN OS needs it for a pagefile!? This seems insane to me and horribly bad design. Anyone else have this experience or any insight into this?

Catch 22
Dec 1, 2003
Damn it, Damn it, Damn it!

ewiley posted:

Hey mid-range storage expert goons! I was wondering if I could get a fact-check from you folks. I've been reading about reports of SAN failures and, well, we just had one :(

Dell/EMC AX150i, dual SPs, populated with 750GB SATA drives, pretty simple setup (about 5 servers connected). Just allocated the last of my storage to extend our file servers a few days ago. Suddenly, during our normal backup (3am-ish), I get alerts that all of the servers attached to that particular SAN box have dropped their disks.

No pings from management (on either SP), no pings from any iSCSI port. Yikes. A reboot of the SAN and a re-flash of the FLARE on SPB (which had just kept rebooting itself) and we're back up and running.

So here's my question: Dell's EMC guy tells me that "it's a best practice not to put any LUNs on the OS drives' disk pool", which is 4 of my disks (about 1/3 of my total capacity). Apparently, since I allocated all the available space in the OS pool, the SAN OS ran out of page space when the IOs went up during our backups.

So am I seriously not supposed to store anything on the ~2TB of OS-pool disk just because the SAN OS needs it for a pagefile!? This seems insane to me and horribly bad design. Anyone else have this experience or any insight into this?
You can store data on the OS LUN, but it's best not to.

That's why you carve up a SAN. The 4 dedicated OS drives you buy should be small and fast: 36GB 15K. It sounds like your storage needs pushed you to an AX4 but you cheaped out with a non-expandable AX150. (That's OK, but something to think about.)

Also, you said 2TB? You have your 4-disk OS array without a hot spare? It should be RAID 5 with a HS. That leaves you with a ~3TB LUN (depending on your RAID level) on your other drives, running with 2 HS of their own. SANs don't fail a lot, you just have to build them out right.

To clear that up:
12 drives total
4 x 750 RAID 5 (3 disks) with 1 HS
8 x 750 with 2 HS = 6 disks; RAID 10 gives 2.2TB, RAID 6 or 50 gives ~3TB.
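A quick sketch sanity-checking those usable-capacity numbers, assuming 750GB raw per drive, ignoring formatting/vault overhead, and treating RAID 50 here as two 3-disk RAID 5 legs.

code:
# Usable capacity for the layouts above, 750 GB raw per drive assumed.
DRIVE_GB = 750

def usable_gb(n_disks, raid):
    if raid == "raid10":
        return (n_disks // 2) * DRIVE_GB      # half the disks are mirror copies
    if raid == "raid5":
        return (n_disks - 1) * DRIVE_GB       # one disk's worth of parity
    if raid in ("raid6", "raid50"):
        return (n_disks - 2) * DRIVE_GB       # two parity disks (or one per RAID 5 leg)
    raise ValueError(raid)

print(usable_gb(3, "raid5"))    # OS pool, 3 data disks (+1 hot spare): 1500 GB
print(usable_gb(6, "raid10"))   # data pool: 2250 GB, i.e. the ~2.2TB above
print(usable_gb(6, "raid6"))    # 3000 GB, the ~3TB above
print(usable_gb(6, "raid50"))   # 3000 GB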

EMC has more usable space than any other vendor as a product line as a whole (in the grand scheme; I think the AX150 shoots itself in the foot, though). It's not bad design, it's how they made the product. Every SAN vendor's product has drawbacks and strengths; you need to find the right SAN that gives you what you need and whose drawbacks you can live with. There is no "silver bullet" SAN yet, because the needs of each place are different.
Many people look at me like I'm crazy when I say I like RAID 10 (RAID 1+0), but I like my databases to have that little bit of "oomph" when they read/write from my LUNs. Expensive, yes. Do I lose usable drive space? Yes, but it's what fits my environment.

Catch 22 fucked around with this message at 15:37 on Oct 3, 2008

ewiley
Jul 9, 2003

More trash for the trash fire

Catch 22 posted:

You can store data on the OS LUN, but it's best not to.

That's why you carve up a SAN. The 4 dedicated OS drives you buy should be small and fast: 36GB 15K. It sounds like your storage needs pushed you to an AX4 but you cheaped out with a non-expandable AX150. (That's OK, but something to think about.)

Yes, the SAN was supposed to be for one set of servers, but it got expanded to include others that probably would have been better served by something like shared SCSI for some of the uses (2-node clusters). I guess I was expecting the SAN to have some kind of internal disk for the SPs; that was bad research on my part.

Me: "sure we have all this unused disk space!" :v:

Live and learn, fortunately we can work around this, it just sucks having all that unused space.

Catch 22 posted:


Also, you said 2TB? You have your 4-disk OS array without a hot spare? It should be RAID 5 with a HS.

Yes, we have a hot spare, but it's not assigned to a particular pool (it doesn't give me the option; it seems the 150 just has a shared hot spare for all pools).

Thanks again, Catch 22! I appreciate the response.

Catch 22
Dec 1, 2003
Damn it, Damn it, Damn it!

ewiley posted:

Live and learn, fortunately we can work around this, it just sucks having all that unused space.

So true. I really wish someone could come along with a super flexible SAN design that would address everyone's needs, even if it was à la carte, and not corner you in some way or another.

H110Hawk
Dec 28, 2006

Catch 22 posted:

You can store data on the OS LUN, but it's best not to.

That's why you carve up a SAN. The 4 dedicated OS drives you buy should be small and fast: 36GB 15K.

Coming from the world of NetApp, this seems ridiculous to me. On a single tray, why would I give up 4 disks to the OS? What is it even paging, and why doesn't it simply pre-allocate what it needs to manage your system? Doesn't a SAN OS know everything about itself when you initialize it?

That stinks of a horrible design flaw. Am I missing something obvious here besides "buy more spindles?"

Catch 22
Dec 1, 2003
Damn it, Damn it, Damn it!

H110Hawk posted:

"buy more spindles?"
Winner Winner Chicken Dinner, I would say, but he can't with the AX150.

NetApp would create its own issues once the LUN filled up too (more so with WAFL), making the answer the same if he wanted to keep performance high. This is my point: every SAN has a drawback, and most of the time the answer is buy more space, expand to another shelf/DAE, etc.

This all depends on what your issue is, of course.

Catch 22 fucked around with this message at 18:30 on Oct 3, 2008

paperchaseguy
Feb 21, 2002

THEY'RE GONNA SAY NO
The Clariion has a certain amount of space reserved for the OS on the first five disks. Any page file would be contained in that space and is fixed in size. Since these are slow disks in a slow Clariion, the large increase in I/O on those disks may have slowed the OS down to the point where it couldn't cope. You should put your lowest-performance applications on the first five disks.

Catch 22
Dec 1, 2003
Damn it, Damn it, Damn it!

paperchaseguy posted:

The Clariion has a certain amount of space reserved for the OS on the first five disks.
5 or 4 disks? You're talking about the CX line.

The AX line can be 3 or 4 disks, depending on how many SPs you have.

Catch 22 fucked around with this message at 20:53 on Oct 3, 2008

H110Hawk
Dec 28, 2006

Catch 22 posted:

Winner Winner Chicken Dinner, I would say, but he can't with the AX150.

My main point is the AX150 seems like a pretty poorly designed solution if I have to burn 25% of my available spindles for what should be some paltry OS space consumption. What the heck are they storing on those disks?

Catch 22
Dec 1, 2003
Damn it, Damn it, Damn it!

H110Hawk posted:

My main point is the AX150 seems like a pretty poorly designed solution if I have to burn 25% of my available spindles for what should be some paltry OS space consumption. What the heck are they storing on those disks?
That paltry OS is rather large depending on the SAN series you pick. The CX line referenced in the above post requires 62GB.
It may not be that the AX150 is poorly designed, but rather that it was not the right SAN for the job. It's the cheapest, bottom-of-the-line SAN from EMC (Dell only, I think), so it's no surprise that the design is not up to par with others. I can configure one for less than $10K. Do you think it would be the SAN to end all SANs?

CIOs and storage admins (sysadmins alike) all need to realize that this is about more than just the design of the first shelf. When you look at a SAN you need to look at expansion (as well as many other things), and if EMC's use of the first few drives is unacceptable to you, look at another SAN. But if you plan to build beyond that first shelf, EMC will come out on top with the most overall USABLE storage per drive of any SAN provider as of now.

It's not a design "flaw", it's a trade-off.


To add: I'm not saying EMC is the best SAN out there; they just have been doing it longer and better than most in an overall sense. They DO have drawbacks as well; a case in point someone already brought up is that they don't do block-level virtualization. Another is that the licenses don't transfer (neither do NetApp's, though).

Catch 22 fucked around with this message at 21:17 on Oct 3, 2008

H110Hawk
Dec 28, 2006

Catch 22 posted:

That paltry OS is rather large depending on the SAN series you pick. The CX line referenced in the above post requires 62GB.

That seems excessive. :)

To diverge from the thrilling debate about some Dell bottom-of-the-barrel disk enclosure, why am I seeing a shitton of ECC (and similar) errors on my filers in the past week?

In the past 7 days I've had transient ECC errors, a watchdog reset, and two filers with NMIs being shot off the PCI bus. We also had our Cisco 6509 register:

code:
Sep 30 21:51:42 PDT: %SYSTEM_CONTROLLER-SP-3-ERROR: Error condition detected: TM_NPP_PARITY_ERROR
Sep 30 21:51:42 PDT: %SYSTEM_CONTROLLER-SP-3-EXCESSIVE_RESET: System Controller is getting reset so frequently
I would chalk it up to power, but this is at three different datacenter locations, with three different power feeds, one of which is many miles away. We've also had countless webservers and stuff just arbitrarily falling over. Has anyone else been having a "when it rains it pours" week with really random errors?

At this point I'm blaming bogons and the LHC.


Catch 22
Dec 1, 2003
Damn it, Damn it, Damn it!

H110Hawk posted:

At this point I'm blaming bogons and the LHC.
Your NetApps are out to get you if you don't feed them more disks and power.
