crazyfish
Sep 19, 2002

EnergizerFellow posted:

16TB (minus 4 KB?) is a hard limit under the Microsoft iSCSI initiator, regardless of x86 or x64.

So I'd be able to sidestep the issue with a separate HBA? I think we have a Qlogic HBA laying around somewhere that I could try.


EnergizerFellow
Oct 11, 2005

More drunk than a barrel of monkeys

crazyfish posted:

So I'd be able to sidestep the issue with a separate HBA? I think we have a Qlogic HBA laying around somewhere that I could try.
In theory, that should do it. I assume you're creating the NTFS filesystem by hand from the CLI with 8 KB or 16 KB clusters?

As for some limits information straight from Microsoft:
http://support.microsoft.com/kb/140365
http://www.microsoft.com/whdc/device/storage/GPT_FAQ.mspx

Any technical reason for a filesystem this large? Dealing with an unclean mount will be a bear.
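To put numbers on the cluster-size point, here's a quick sketch of the NTFS math (it assumes the commonly cited ~2^32-cluster ceiling per volume; treat it as ballpark, not gospel):
code:
# Quick sanity check of the cluster-size math. Assumes the commonly cited
# ~2^32-cluster ceiling on an NTFS volume; exact limits vary a bit by
# Windows version, so treat these as ballpark figures.
MAX_CLUSTERS = 2**32 - 1

for cluster_kb in (4, 8, 16, 32, 64):
    max_tb = MAX_CLUSTERS * cluster_kb * 1024 / 2**40
    print(f"{cluster_kb:>2} KB clusters -> max NTFS volume ~{max_tb:,.0f} TB")

# 4 KB clusters top out just under 16 TB, which is why a bigger-than-16TB
# volume needs larger clusters (and GPT rather than MBR partitioning).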

crazyfish
Sep 19, 2002

EnergizerFellow posted:

In theory, that should do it. I assume you're creating the NTFS filesystem by hand from the CLI with 8 KB or 16 KB clusters?

As for some limits information straight from Microsoft:
http://support.microsoft.com/kb/140365
http://www.microsoft.com/whdc/device/storage/GPT_FAQ.mspx

Any technical reason for a filesystem this large? Dealing with an unclean mount will be a bear.

Doing it by hand from the CLI isn't necessary because the GUI has options for all cluster sizes.

In any case, we had already found a solution by creating a spanning volume of several ~16TB LUNs; my question was primarily out of curiosity, to see if we could make the management portion a bit easier.

The reason for the big filesystem is integration with an application that will only take a single path for its data and refuses to work with a network drive, so any kind of NAS-type solution won't work.

paperchaseguy
Feb 21, 2002

THEY'RE GONNA SAY NO

crazyfish posted:

Doing it by hand from the CLI isn't necessary because the GUI has options for all cluster sizes.

In any case, we had already found a solution by creating a spanning volume of several ~16TB LUNs; my question was primarily out of curiosity, to see if we could make the management portion a bit easier.

The reason for the big filesystem is integration with an application that will only take a single path for its data and refuses to work with a network drive, so any kind of NAS-type solution won't work.

Let me save you from wanting to kill yourself when this thing takes 10 hours to chkdsk: use mount points if at all possible and keep your partitions to 1TB.

shablamoid
Feb 28, 2006
Shuh-blam-oid
Does anyone have any experience with Open-E's DSS v6?

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

shablamoid posted:

Does anyone have any experience with Open-E's DSS v6?
Seems like a waste of money. If you are going to roll a software solution on commodity hardware, there are plenty of other solutions out there, specifically OpenSolaris.

Erwin
Feb 17, 2006

Anybody familiar with Scale Computing? They caught our eye because they were originally in the financial market, which is what we do (even though they failed at it). Their system starts with 3 1U units, either 1TB, 2TB, or 4TB per unit, and then you can expand 1U at a time, mixing and matching per-U capacities.

rage-saq
Mar 21, 2001

Thats so ninja...

Erwin posted:

Anybody familiar with Scale Computing? They caught our eye because they were originally in the financial market, which is what we do (even though they failed at it). Their system starts with 3 1U units, either 1TB, 2TB, or 4TB per unit, and then you can expand 1U at a time, mixing and matching per-U capacities.

Skip it. IMO the only clustered (as opposed to traditional aggregate arrays like HP EVA, EMC CX, netapp etc) systems worth getting into are LeftHand or some kind of Solaris ZFS system.

Halo_4am
Sep 25, 2003

Code Zombie

Misogynist posted:

Isilon is really big in this space. They mostly deal with data warehousing and near-line storage for multimedia companies and high-performance computing. They're very competitive on price for orders of this size, but I can't speak yet for the reliability of their product.

I haven't gotten to play with ours yet; it's sitting in boxes in the corner of the datacenter.

Been running 2 clusters for about 4 years now. In that time we've seen a few single drive failures, a node failure, and a back-end infiniband switch failure. In all cases the cluster stayed up and running and performing. Their support used to be just absolutely top shelf, to where they would call YOU if the cluster had an issue. These days they've calmed down on support, but when I call it still goes direct to Seattle where I'm connected after only a few minutes of hold time (if any) and I can expect a resolution to simple problems within a few days at most.

Most of the problems I've had with the cluster have been due to the single share style they use. Nothing they're doing wrong... it's just we've had an app here and there choke upon seeing a single 60TB chunk of storage. Over time that problem has resolved itself through app updates and in-house development. It is something to consider, though: if you're in the market for 600-800TB of storage, Isilon will handle it and do it well... but the front-end applications you intend to use may have a fit.

Erwin posted:

Anybody familiar with Scale Computing? They caught our eye because they were originally in the financial market, which is what we do (even though they failed at it). Their system starts with 3 1U units, either 1TB, 2TB, or 4TB per unit, and then you can expand 1U at a time, mixing and matching per-U capacities.

This is what Isilon does too. Not saying to skip Scale Computing, but I've never heard of or worked with them. If you look that way it couldn't hurt to engage Isilon sales as well.

Halo_4am fucked around with this message at 20:53 on Nov 11, 2009

Erwin
Feb 17, 2006

Erwin posted:

Anybody familiar with Scale Computing? They caught our eye because they were originally in the financial market, which is what we do (even though they failed at it). Their system starts with 3 1U units, either 1TB, 2TB, or 4TB per unit, and then you can expand 1U at a time, mixing and matching per-U capacities.

Perhaps it would be more useful for me to explain my needs :)

We're a small company (20 users) and we currently have about 2.5TB of data spread across an EMC AX-100 and local storage. We'd like to replace the AX-100 with something that starts at around 6TB and is greatly and cheaply expandable (there's a very good chance that our storage needs will jump to 50TB in a year or two). We don't like the EMC because EMC was quick to end-of-life it, and we're bitter that we can't just put 1TB drives in it and call it a day. Scale Computing/Isilon was attractive because it's so expandable and doesn't require too much management, but Scale Computing is very small and that's scary, and Isilon was too expensive (the sweet spot to start at was 18TB, and that was 40 grand from Isilon).

I'll try to get an idea of cost from LeftHand, but is there anything else that may work for us better?

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

rage-saq posted:

Skip it. IMO the only clustered (as opposed to traditional aggregate arrays like HP EVA, EMC CX, netapp etc) systems worth getting into are LeftHand or some kind of Solaris ZFS system.
Edit: Beaten to the punch on Isilon

bmoyles
Feb 15, 2002

United Neckbeard Foundation of America
Fun Shoe
At my previous gig, we had 2 Isilon clusters. Great product. Their replication product over a WAN wasn't great, but for regular day-to-day use, they were fast, reliable, super easy to scale, and well-supported (aside from some hiccups with the aforementioned synciq stuff). I will likely give them another go if I find an application for it.

For a base cluster of 3000i's with redundant Infiniband, I think you'd be looking at 100-130k if I recall correctly. Pricing may have changed, and that doesn't necessarily reflect what you'd get price-wise.

LeftHand and EqualLogic will do you well, too, but note, those are block-based iSCSI products, and Isilon is file-based NAS storage. Different applications.

three
Aug 9, 2007

i fantasize about ndamukong suh licking my doodoo hole
We have a couple of Equallogic PS6000E's. The main device is holding our Oracle databases. We want to replicate the main SAN essentially across the country, to the other PS6000E for disaster recovery purposes.

There's no integration with the Oracle DB to prevent corruption in the DB yet, is there?

I'm still in the researching phase of this, but I was just wondering if anyone had any tips, and/or things to avoid when trying to do it? :shobon:

Got KarmA?
Sep 11, 2001

Life is a game. Have fun losing!
Oracle will handle the replication for you. Don't do it at the SAN level. I'm not sure any kind of storage-level replication technology will net you anything better than crash-equivalent data.

http://www.oracle.com/technology/products/dataint/index.html

Maneki Neko
Oct 27, 2000

Got KarmA? posted:

I'm not sure any kind of storage-level replication technology will net you anything better than crash-equivalent data.

NetApp should have this covered.

1000101
May 14, 2003

BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY FRUITCAKE!

Got KarmA? posted:

Oracle will handle the replication for you. Don't do it at the SAN level. I'm not sure any kind of storage-level replication technology will net you anything better than crash-equivalent data.

http://www.oracle.com/technology/products/dataint/index.html

EMC RecoverPoint and potentially NetApp SnapMirror can both give you application-consistent replication.

Depending on distance between endpoints, RPO, and latency you may need to live with crash-consistent replication for business continuity/disaster recovery.

How does Oracle's built-in replication handle it? Does it wait for the other side to acknowledge a write before committing it to logs at the local site? How would, say, 70ms of latency impact the performance of the production database?

rage-saq
Mar 21, 2001

Thats so ninja...

1000101 posted:

EMC RecoverPoint and potentially NetApp SnapMirror can both give you application-consistent replication.

Depending on distance between endpoints, RPO, and latency you may need to live with crash-consistent replication for business continuity/disaster recovery.

How does Oracle's built-in replication handle it? Does it wait for the other side to acknowledge a write before committing it to logs at the local site? How would, say, 70ms of latency impact the performance of the production database?

I'm pretty sure synchronous replication (where it ACKs a mirrored write before continuing on the source side) and 70ms of latency would destroy nearly any production database performance. At a certain point you can't overcome latency.
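To put a rough number on that, here's a crude upper bound on strictly serialized synchronous writes; it ignores parallel sessions and pipelining, and the 1 ms local write time is an assumption:
code:
# Crude upper bound on strictly serialized synchronous writes over a WAN.
# Assumes each commit must wait a full round trip (plus a little local write
# time) before the next one starts -- no pipelining, single session.
def max_serial_commits_per_sec(rtt_ms: float, local_write_ms: float = 1.0) -> float:
    return 1000.0 / (rtt_ms + local_write_ms)

for rtt in (1, 10, 70):
    print(f"RTT {rtt:>2} ms -> ~{max_serial_commits_per_sec(rtt):.0f} synchronous commits/sec per stream")

# At 70 ms you're capped around 14 commits/sec per serialized stream, versus
# several hundred against a local array -- hence the destroyed performance.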

three
Aug 9, 2007

i fantasize about ndamukong suh licking my doodoo hole

rage-saq posted:

I'm pretty sure synchronous replication (where it ACKs a mirrored write before continuing on the source side) and 70ms of latency would destroy nearly any production database performance. At a certain point you can't overcome latency.

Hrms. What would you suggest as the best route to copy the database off-site to another SAN? We actually just want a snapshot every hour or so, not live replication. I'm very new to Oracle, but have good MySQL experience, and to a lesser extent MSSQL experience. Could we just do something similar to a dump and import, like we would with MySQL? Although that wouldn't be optimal since we'd have to lock the entire DB to dump it, no?

They bought these EqualLogic units before I started here because the Dell rep convinced them syncing Oracle would be as simple as replicating using the EqualLogic interface. :sigh:

H110Hawk
Dec 28, 2006

three posted:

They bought these EqualLogic units before I started here because the Dell rep convinced them syncing Oracle would be as simple as replicating using the EqualLogic interface. :sigh:

Call up that dell rep who made promises and get him to tell you how. Withhold payment if that is still possible. If you just need hourly crash-equivalent data because your application is fully transaction aware then it shouldn't be that bad. Play hardball with him. Have him send you a unit which works if this one doesn't, and send back the old one. "You promised me X gigabytes of storage with the ability to make a consistent Oracle snapshot for $N."

In theory you can make a backup based on a transaction view without locking your entire database for writes, but I have never used Oracle.
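For reference, the usual way to get that kind of transaction-consistent dump out of Oracle is a Data Pump export using one of its flashback options. A rough sketch follows; the credentials, DIRECTORY object, and dump file name are placeholders, and the FLASHBACK_TIME form should be checked against your Oracle version:
code:
# Rough sketch of a point-in-time-consistent logical export with Oracle Data
# Pump, which reads via flashback instead of locking the database for writes.
# Everything below (credentials, DIRECTORY object, dump file name) is a
# placeholder; verify the FLASHBACK_TIME syntax against your Oracle release.
import subprocess

cmd = [
    "expdp", "system/password@ORCL",     # placeholder credentials / TNS alias
    "FULL=Y",
    "DIRECTORY=dp_backups",              # an existing DIRECTORY object on the server
    "DUMPFILE=nightly_%U.dmp",
    "FLASHBACK_TIME=SYSTIMESTAMP",       # export consistent as of this moment
]
subprocess.run(cmd, check=True)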

three
Aug 9, 2007

i fantasize about ndamukong suh licking my doodoo hole

H110Hawk posted:

Call up that dell rep who made promises and get him to tell you how. Withhold payment if that is still possible. If you just need hourly crash-equivalent data because your application is fully transaction aware then it shouldn't be that bad. Play hardball with him. Have him send you a unit which works if this one doesn't, and send back the old one. "You promised me X gigabytes of storage with the ability to make a consistent Oracle snapshot for $N."

In theory you can make a backup based on a transaction view without locking your entire database for writes, but I have never used Oracle.

While this is what I would do if I were in charge, I doubt the regional manager is going to send these back. We pretty much have to find a way to make this work. :(

I don't think crash-equivalent data is what they would want, but it's a good point if all else fails. Like you, I have 0 experience with Oracle. Time to try to make some miracles happen.

1000101
May 14, 2003

BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY FRUITCAKE!

rage-saq posted:

I'm pretty sure synchronous replication (where it ACKs a mirrored write before continuing on the source side) and 70ms of latency would destroy nearly any production database performance. At a certain point you can't overcome latency.

Clearly this is the case, but I'm not sure how Oracle handles it. Is Oracle going to acknowledge every write? If so, then yeah, 70ms would make the DBA want to blow their brains out. Or does it just ship logs on some kind of interval? More of an Oracle question than a storage question, but it would be helpful in deciding whether you should use array-based replication or application-based.

Typically when we're doing offsite replication and the RTT is >10ms we tend to use async replication, but it's often crash-consistent. Exceptions are when you use tools like SnapManager to do SnapMirror updates as part of your DR process. It's a larger RPO, but you're going to be application-consistent on the other side.



quote:

Hrms. What would you suggest as the best route to copy the database off-site to another SAN? We actually just want a snapshot every hour or so, not live replication. I'm very new to Oracle, but have good MySQL experience, and to a lesser extent MSSQL experience. Could we just do something similar to a dump and import, like we would with MySQL? Although that wouldn't be optimal since we'd have to lock the entire DB to dump it, no?

Knowing little about Oracle, what I might do is something like this on an hourly schedule (rough sketch of the flow at the end of this post):

1. Place oracle in hot backup mode
2. Dump an oracle DB backup (if feasible, may not be depending on DB size)
3. Snapshot array
4. Take oracle out of hot backup mode
5. Replicate recent snapshot offsite.

Step 2 may be completely redundant, though. This is not unlike how something like NetApp SnapMirror works (kick off the Exchange VSS writers, take an array snapshot, turn off the VSS writers, and tell the array to update SnapMirror, which sends the recent snap offsite).

Bandwidth requirement is basically whatever it takes to replicate the difference between each snapshot. So if you're read-heavy you could probably get by with less than 128kb of bandwidth, while if you're write-heavy it could get pretty insane. It is definitely something to keep an eye on.
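Roughly what that hourly flow might look like scripted -- sqlplus and BEGIN/END BACKUP are standard Oracle, but "array-cli" and the volume/DR names are stand-ins for whatever snapshot and replication commands your particular array actually provides:
code:
# Sketch of the hourly hot-backup + snapshot flow described above.
# 'array-cli', 'oracle-vol', and 'dr-site' are placeholders, not a real CLI.
import subprocess

def run_sql(statement: str) -> None:
    # Feed a single statement to sqlplus as SYSDBA.
    subprocess.run(["sqlplus", "-S", "/ as sysdba"],
                   input=f"{statement}\nexit;\n", text=True, check=True)

def hourly_backup() -> None:
    run_sql("ALTER DATABASE BEGIN BACKUP;")          # 1. hot backup mode on
    try:
        # 2. (optional) dump a logical backup here if the DB is small enough
        subprocess.run(["array-cli", "snapshot", "create", "oracle-vol"],
                       check=True)                    # 3. array snapshot
    finally:
        run_sql("ALTER DATABASE END BACKUP;")         # 4. hot backup mode off
    subprocess.run(["array-cli", "replicate", "oracle-vol", "--to", "dr-site"],
                   check=True)                        # 5. ship the snapshot offsite

if __name__ == "__main__":
    hourly_backup()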

rage-saq
Mar 21, 2001

Thats so ninja...

three posted:

Hrms. What would you suggest as the best route to copy the database off-site to another SAN? We actually just want a snapshot every hour or so, not live replication. I'm very new to Oracle, but have good MySQL experience, and to a lesser extent MSSQL experience. Could we just do something similar to a dump and import, like we would with MySQL? Although that wouldn't be optimal since we'd have to lock the entire DB to dump it, no?

They bought these EqualLogic units before I started here because the Dell rep convinced them syncing Oracle would be as simple as replicating using the EqualLogic interface. :sigh:

This kind of advanced high availability is not as simple as many sales people would like it to sound. Some options like application specific mirroring (I'm not an Oracle pro, but I know MS-SQL 2005/2008 have both sync and async database mirroring with failover) would allow you some technical avenues towards improving the situation.

If the people who bought it weren't smart enough to know what they were asking for, or didn't know enough to ask the right questions and not have the wool pulled over their eyes, then there's not much you can do from a business/customer-service standpoint. However, if the RFP/requirements sheet/whatever was specific enough that it called for these specific features, and you didn't get them due to whatever kind of fuckup, then you should take some hints from comments people have already made and make Dell fix this.
As people have already mentioned, find out WHO made the promises, get the emails if possible, and hold them to it. If you are a vendor/consultant/etc. and you make claims about XYZ with vanilla ice cream on the side, and it doesn't do any of that, you can be sure as hell the customer can LEGALLY get out of paying the agreed-upon price. That's how this industry works.

If you have already paid, it would be REALLY REALLY DUMB for them not to do EVERYTHING they possibly can to make it better. These kinds of empty promises and bad service follow-up are every competitor's and lawyer's dream situation and every product/service/sales manager's worst nightmare.

I'm a consultant and you'd better bet this is one of the highest things on my list of "Things Not To Do". I do everything I can to make sure I understand the technical requirements, and to make sure the customer understands the differences and why it's important these things are defined and not glossed over. If I promise solution X and can only deliver Y because I hosed up for whatever reason, you'd better bet my boss is going to chew my rear end out because he has to pay to fix it, and I may have to be working on a new resume sooner than I had wanted.

I will say I'm not surprised; Dell's internal EqualLogic sales reps are notoriously clueless (getting someone barely qualified for sales, and barely qualified for basic end-user technical support, to do advanced enterprise design is their SOP) and I've had many customers buy the heaping load of bullshit they get fed by somebody who clearly doesn't understand what they're talking about, only to get disappointed later.
Great specific examples of this are when they get sold a single SATA shelf and are promised 50k IOPS because that's the maximum value on the spec sheet, while the sales rep says it's going to meet the performance requirements for a write-heavy SQL database for this and god knows what other reason. If you don't press for real technical knowledge, they don't throw someone qualified at you.

1000101 posted:

Clearly this is the case, but I'm not sure how Oracle handles it. Is Oracle going to acknowledge every write? If so, then yeah, 70ms would make the DBA want to blow their brains out. Or does it just ship logs on some kind of interval? More of an Oracle question than a storage question, but it would be helpful in deciding whether you should use array-based replication or application-based.

I suppose that's the real question: *WHERE* are you doing your replication, and what does your application support? If Oracle doesn't care how it got the database, so long as it was crash-consistent with the log files, you could get away with some pretty dirty array-based replication techniques that do a synchronous replication amounting to *COPY BLOCK X, WAIT FOR CONFIRMATION ON BLOCK X BEING WRITTEN BEFORE SENDING BLOCK Y*, as opposed to more advanced things like array agents for Oracle or Oracle itself doing the replication.

Different techniques allow for different options, but at a fundamental level you can't have *true* synchronous replication that doesn't hold up processing on the source end until the target matches the source. Otherwise it's not no-data-lost crash-consistent, which defeats the point of synchronous replication; and that kind of setup is exactly the scenario where something like a 70ms delay would really kill you. It's just the nature of the beast.

H110Hawk
Dec 28, 2006

three posted:

While this is what I would do if I was in charge, I doubt the regional manager is going to send these back. We pretty much have to find a way to make this work. :(

rage-saq posted:

As people have already mentioned, find out WHO made the promises, get the emails if possible, and hold them to it. If you are a vendor/consultant/etc. and you make claims about XYZ with vanilla ice cream on the side, and it doesn't do any of that, you can be sure as hell the customer can LEGALLY get out of paying the agreed-upon price. That's how this industry works.

Your manager didn't get to be your manager by sitting in the back seat when problems happened. Only you know your company's political culture, but now is the time to step up and show your competence. Present the technical solutions this machine can give you. Back it all up with documentation. Demand to speak with the Dell rep about it if your boss finds them lacking. Speak to that rep's boss the moment he sounds stupid. Half the time it only takes a little bit of saber rattling to get the job done. They can forward-ship you the correct unit for free if this one doesn't meet the requirements. Pray that everything was put together in a technical document which was sent to the rep.

If you're simply stuck with the unit, no contact with the rep, and a pussy of a boss who won't fix it, document everything. Keep copies at home. If they come to you with "Why doesn't this work!" show them where you documented during setup how it wouldn't work, notified your boss, attempted to make it right, and implemented the best available solution.

The next step, if the technical requirements weren't properly stated to the rep, is for your department to have a technical representative at all sales meetings. This includes the lunch/dinner/perk junkets the sales rep will drag your boss along on. Again, only you know how sharp you stay at certain levels of intoxication; don't go past them. If you can't stop yourself after a beer or two, then don't drink at all. If your boss is stupid and won't let you be present, then at least have them send along a proposal signed off by you.

Gun for a promotion. Figure it out. Solve problems. Don't be a whiny bitch. Don't be a pushover. There, I said it, now you go do it.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer
On our NetApp 3140, when running the command: priv set diag; stats show lun; priv set admin

I am seeing a large percentage of partial writes. There does not seem to be a corresponding number of misaligned writes, as you can see in the below output:

lun:/vol/iscsivol0/lun0-W-OMCoT2A9Iw:write_align_histo.0:72%
lun:/vol/iscsivol0/lun0-W-OMCoT2A9Iw:write_align_histo.1:0%
lun:/vol/iscsivol0/lun0-W-OMCoT2A9Iw:write_align_histo.2:0%
lun:/vol/iscsivol0/lun0-W-OMCoT2A9Iw:write_align_histo.3:0%
lun:/vol/iscsivol0/lun0-W-OMCoT2A9Iw:write_align_histo.4:0%
lun:/vol/iscsivol0/lun0-W-OMCoT2A9Iw:write_align_histo.5:0%
lun:/vol/iscsivol0/lun0-W-OMCoT2A9Iw:write_align_histo.6:0%
lun:/vol/iscsivol0/lun0-W-OMCoT2A9Iw:write_align_histo.7:0%
lun:/vol/iscsivol0/lun0-W-OMCoT2A9Iw:read_partial_blocks:0%
lun:/vol/iscsivol0/lun0-W-OMCoT2A9Iw:write_partial_blocks:27%

Should I be worried about this? All of my vmdks are aligned (I went through every single one to double-check), plus there are no writes to .1-.7, so the evidence also shows no alignment issue. I submitted a case to have an engineer verify there is no issue, but I was wondering if anyone else has seen partial writes like this on a VMware iSCSI LUN. The partial writes typically hover around 30%, but can vary. They were at 75% at one point today.
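For anyone else squinting at these counters, a small sketch of how to read them (purely illustrative parsing, not a NetApp utility):
code:
# Toy parser for the 'stats show lun' output above. Rule of thumb: non-zero
# percentages in write_align_histo.1 through .7 mean writes are landing off
# the 4k boundaries; everything in bucket .0 plus a high write_partial_blocks
# is the pattern described here (aligned writes, but partial-block ones).
def parse_lun_stats(text: str) -> dict:
    stats = {}
    for line in text.splitlines():
        parts = line.strip().split(":")
        # expected shape: lun:<path>:<counter>:<value>%
        if len(parts) == 4 and parts[0] == "lun" and parts[3].endswith("%"):
            stats[parts[2]] = int(parts[3].rstrip("%"))
    return stats

sample = """\
lun:/vol/iscsivol0/lun0-W-OMCoT2A9Iw:write_align_histo.0:72%
lun:/vol/iscsivol0/lun0-W-OMCoT2A9Iw:write_align_histo.1:0%
lun:/vol/iscsivol0/lun0-W-OMCoT2A9Iw:write_partial_blocks:27%
"""
stats = parse_lun_stats(sample)
misaligned = sum(v for k, v in stats.items()
                 if k.startswith("write_align_histo.") and not k.endswith(".0"))
print("misaligned write %:", misaligned)
print("partial write %:   ", stats.get("write_partial_blocks", 0))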

three
Aug 9, 2007

i fantasize about ndamukong suh licking my doodoo hole

H110Hawk posted:

:words:

After discussion, I think it was more that my manager thought it would work... and it likely won't; I don't believe the Dell rep ever said specifically it would work with Oracle.

He's not upset about it, and it isn't that big of a deal... we'll just use it for something else if we can't get it to work for this specific project. I'm going to look into the hot backup suggestion 1000101 provided; seems like a good idea.

Thanks for the suggestions, though. ;)

three fucked around with this message at 04:18 on Dec 24, 2009

1000101
May 14, 2003

BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY FRUITCAKE!

adorai posted:

On our NetApp 3140, when running the command: priv set diag; stats show lun; priv set admin

I am seeing a large percentage of partial writes. There does not seem to be a corresponding number of misaligned writes, as you can see in the below output:

lun:/vol/iscsivol0/lun0-W-OMCoT2A9Iw:write_align_histo.0:72%
lun:/vol/iscsivol0/lun0-W-OMCoT2A9Iw:write_align_histo.1:0%
lun:/vol/iscsivol0/lun0-W-OMCoT2A9Iw:write_align_histo.2:0%
lun:/vol/iscsivol0/lun0-W-OMCoT2A9Iw:write_align_histo.3:0%
lun:/vol/iscsivol0/lun0-W-OMCoT2A9Iw:write_align_histo.4:0%
lun:/vol/iscsivol0/lun0-W-OMCoT2A9Iw:write_align_histo.5:0%
lun:/vol/iscsivol0/lun0-W-OMCoT2A9Iw:write_align_histo.6:0%
lun:/vol/iscsivol0/lun0-W-OMCoT2A9Iw:write_align_histo.7:0%
lun:/vol/iscsivol0/lun0-W-OMCoT2A9Iw:read_partial_blocks:0%
lun:/vol/iscsivol0/lun0-W-OMCoT2A9Iw:write_partial_blocks:27%

Should I be worried about this? All of my vmdks are aligned (I went through every single one to double-check), plus there are no writes to .1-.7, so the evidence also shows no alignment issue. I submitted a case to have an engineer verify there is no issue, but I was wondering if anyone else has seen partial writes like this on a VMware iSCSI LUN. The partial writes typically hover around 30%, but can vary. They were at 75% at one point today.

This is odd....

Are you using flexclone or ASIS or anything like that? When you allocated the LUN, which LUN type did you set?

H110Hawk
Dec 28, 2006

three posted:

After discussion, I think it was more that my manager thought it would work... and it likely won't; I don't believe the Dell rep ever said specifically it would work with Oracle.

He's not upset about it, and it isn't that big of a deal...

Cool. Those sorts of things have a tendency to trickle down in corporate culture with bad results for the ones at the bottom of the hill.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

1000101 posted:

This is odd....

Are you using flexclone or ASIS or anything like that? When you allocated the LUN, which LUN type did you set?
ASIS yes, FlexClone no, ESX LUN type.

namaste friends
Sep 18, 2004

by Smythe

adorai posted:

On our NetApp 3140, when running the command: priv set diag; stats show lun; priv set admin

I am seeing a large percentage of partial writes. There does not seem to be a corresponding number of misaligned writes, as you can see in the below output:

lun:/vol/iscsivol0/lun0-W-OMCoT2A9Iw:write_align_histo.0:72%
lun:/vol/iscsivol0/lun0-W-OMCoT2A9Iw:write_align_histo.1:0%
lun:/vol/iscsivol0/lun0-W-OMCoT2A9Iw:write_align_histo.2:0%
lun:/vol/iscsivol0/lun0-W-OMCoT2A9Iw:write_align_histo.3:0%
lun:/vol/iscsivol0/lun0-W-OMCoT2A9Iw:write_align_histo.4:0%
lun:/vol/iscsivol0/lun0-W-OMCoT2A9Iw:write_align_histo.5:0%
lun:/vol/iscsivol0/lun0-W-OMCoT2A9Iw:write_align_histo.6:0%
lun:/vol/iscsivol0/lun0-W-OMCoT2A9Iw:write_align_histo.7:0%
lun:/vol/iscsivol0/lun0-W-OMCoT2A9Iw:read_partial_blocks:0%
lun:/vol/iscsivol0/lun0-W-OMCoT2A9Iw:write_partial_blocks:27%

Should I be worried about this? All of my vmdks are aligned (I went through every single one to double-check), plus there are no writes to .1-.7, so the evidence also shows no alignment issue. I submitted a case to have an engineer verify there is no issue, but I was wondering if anyone else has seen partial writes like this on a VMware iSCSI LUN. The partial writes typically hover around 30%, but can vary. They were at 75% at one point today.

Are you using Operations Manager? If so, can you create a graph for iSCSI latency? That should tell us how well your iSCSI SAN is performing. When you created your iSCSI LUNs, did you use snapdrive or did you create them manually? If you created them manually did you remember to align the VMFS formatting? Also, do you have the ESX host utilities kit installed?

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

Cultural Imperial posted:

Are you using Operations Manager? If so, can you create a graph for iSCSI latency? That should tell us how well your iSCSI SAN is performing. When you created your iSCSI LUNs, did you use snapdrive or did you create them manually? If you created them manually did you remember to align the VMFS formatting? Also, do you have the ESX host utilities kit installed?
NetApp support got back to me, and they said it's nothing to worry about.

To answer your questions, I could create the graph, but am pretty lazy. As far as LUN creation goes, it was all done with snapdrive or by selecting the proper LUN type when creating it. And I do have the host utilities installed.

zapateria
Feb 16, 2003
Our company has two office locations and we're planning to use the second as a disaster recovery location.

Our primary location has the following gear:

4 HP BL460c G1 blades running ESX3.5
6 HP BL460c G1 blades running Win2003 with Oracle/MSSQL
1 HP EVA4400 with about 17TB storage, 15TB in use

We're probably looking at 1 or 2 days of acceptable downtime before we have things up and running at the secondary location so for the physical servers we'll just order new hardware and restore backups in case of a disaster.

First step is to set up one or two ESXi hosts with a storage system and transfer backups of our VMs from the primary location. We have a gigabit WAN link between the locations.

What kind of storage system would be suitable as a cheap and offline kind of solution at the secondary location to take over maybe 25 VMs with stuff like domain controllers, print servers, etc.?

1000101
May 14, 2003

BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY FRUITCAKE!
Lots of local storage? If you've got no specific need to replicate, you're okay running at diminished capacity for the short term, and you can't get budget for another HP EVA, it may not be worth building another SAN off-site.

Does that EVA support data replication? If it does and you buy something that it's not compatible with today, you might be shooting yourself in the foot a year from now.

Optionally, if that's not a big deal, then iSCSI with LeftHand might be appropriate.

Nomex
Jul 17, 2002

Flame retarded.

zapateria posted:

Our company has two office locations and we're planning to use the second as a disaster recovery location.

Our primary location has the following gear:

4 HP BL460c G1 blades running ESX3.5
6 HP BL460c G1 blades running Win2003 with Oracle/MSSQL
1 HP EVA4400 with about 17TB storage, 15TB in use

We're probably looking at 1 or 2 days of acceptable downtime before we have things up and running at the secondary location so for the physical servers we'll just order new hardware and restore backups in case of a disaster.

First step is to set up one or two ESXi hosts with a storage system and transfer backups of our VMs from the primary location. We have a gigabit WAN link between the locations.

What kind of storage system would be suitable as a cheap and offline kind of solution at the secondary location to take over maybe 25 VMs with stuff like domain controllers, print servers, etc.?

Ideally, I would say get a second EVA 4400 with enough FATA disks to cover your storage needs, then get 2 Brocade 7500 SAN extension switches. You can then pick up a Continuous Access EVA license and enable EVA asynchronous replication between your primary and DR sites. This will have 0 downtime costs. You can plug all the required equipment in and configure it all while live. This won't be the cheapest option unfortunately, but it will be the best.

Also, you probably won't have to size the EVA to be as large as your primary storage, as a lot of your disk is probably carved into RAID 10. You can set all the DR stuff to RAID 5 and sacrifice performance for space.
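To put rough numbers behind the RAID 10 vs RAID 5 trade-off, here's some toy math; it ignores hot spares, formatting overhead, and the EVA's vraid/disk-group behavior, and the 24 x 1TB shelf is made up purely for illustration:
code:
# Toy usable-capacity comparison behind "set the DR copy to RAID 5 and
# sacrifice performance for space". Ignores spares, vendor overhead, and
# EVA vraid details; the shelf size and drive size are illustrative only.
def usable_tb(disks: int, disk_tb: float, raid: str) -> float:
    if raid == "raid10":
        return (disks // 2) * disk_tb          # half the disks are mirrors
    if raid == "raid5":
        return (disks - 1) * disk_tb           # one disk's worth of parity
    raise ValueError(f"unknown raid level: {raid}")

disks, disk_tb = 24, 1.0                        # hypothetical 24 x 1 TB drives
print("RAID 10 usable:", usable_tb(disks, disk_tb, "raid10"), "TB")
print("RAID 5  usable:", usable_tb(disks, disk_tb, "raid5"), "TB")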

Nomex fucked around with this message at 00:37 on Jan 10, 2010

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

Nomex posted:

sacrifice performance for space.
Our secondary site has enough storage to hold all of the data, and just enough performance for our critical apps to keep running, so we have all sata disk on a single controller at our DR site. We are comfortable with letting our non critical apps be down so long as the data is intact.

Nomex
Jul 17, 2002

Flame retarded.

adorai posted:

Our secondary site has enough storage to hold all of the data, and just enough performance for our critical apps to keep running, so we have all sata disk on a single controller at our DR site. We are comfortable with letting our non critical apps be down so long as the data is intact.

In that case, you can get an EVA starter kit for pretty cheap. Call your HP rep and have him quote you on model #AJ700B. That's the part number for the 4TB (10 x 400GB 10k FC) model. If that's not a good fit there are a few more options here. The starter kits tend to be a lot cheaper than just buying an EVA.

I forgot to mention, if you do decide to go this route, DO NOT under any circumstances let anyone talk you into using HP MPX110 IP distance gateways. They're complete poo poo.

Nomex fucked around with this message at 06:35 on Jan 10, 2010

Section 9
Mar 24, 2003

Hair Elf
This probably straddles the line between this thread and the virtualization thread, but I guess it's more on topic for storage.

I'm in the process of doing some storage migrations and was hoping that I might be able to get some helpful advice (or people screaming about what I am doing horribly wrong). I would still say I'm a bit new to SAN/NAS stuff; I get the basic ideas and how to set things up, but I know almost nothing about optimization for performance.

I'm moving a bunch of VMware VMs from our existing SAN (CX3-40) to a SunFire x4540. In both cases (the EMC and the Sun) I have two incoming 4Gb Fibre Channel HBA ports, so they should have the same bandwidth in that respect.

On the Sun I have set up one big zpool with 4 x 11-disk raidz2 groups and 3 hot spares. Out of that I have carved a couple of 1TB LUNs on which I have put the VMFS datastores. So I guess it has 44 spindles, whereas each of the LUNs on our EMC is in a raid group of about 7 drives. Both are using 7200 rpm drives. So my understanding is that the Sun should have the speed benefit here because it has more spindles assigned to each LUN.

I started migrating some of the VMs to the new Sun datastores and noticed that they seemed to be running slower, so I figured there must be some issue with how I have the new storage configured. I hadn't had much chance to do testing or research the best practices because we had been in a big time crunch to get this done quickly, but the pressure has backed off and now I have the opportunity to go back and do things properly.

I'm still in the midst of doing some testing (and probably doing it wrong), but so far I seem to be getting about 1/10th the IOPS from the Sun datastores that I'm getting from the EMC datastores.
For testing I have a Windows VM set up with IOMeter. I added a 100GB VMDK to it from the EMC side and ran a default test for about 8 hours and got about 65 IOPS average. Then I replaced the 100GB VMDK with the same thing on one of the Sun datastores and got about 8 IOPS average.

My preference is to try to configure my storage to maximize usable space first, with performance second. I'm not too terribly concerned about having as much protection from massive disk failure, as this is all test/dev stuff that is backed up frequently. I don't need to push the envelope on IOPS, but I'd like a VM to get roughly the equivalent of a single SATA drive.

A couple of things that I've considered:
1) Smaller datastore LUNs. I've heard conflicting stories over the years on whether the size of the datastore matters. I'm going to do some testing with smaller datastores (all of the datastores on the EMC are about 400GB average instead of 1TB.)
2) Another storage box. We've got another x4540 on the way, so possibly splitting the datastores between the two will help spread out the load a bit? Also, when it arrives I can always give it a radically different ZFS configuration if my 4x11 raidz2 idea is monumentally stupid.
3) Misalignment of the LUNs. I've read, here and elsewhere, about aligning the LUNs and VMFS/VMDK partitions but I'm a bit confused about when it matters. I know we never did any alignment on the EMC.

Does anyone have any advice/comments/disgust about any of this? Anything else I should be investigating or trying out, articles to read, etc? I'm busily rooting through all the documentation I can find on the subject hoping to get a much better understanding of all of this so that I can get things done correctly now that I finally have a chance.

complex
Sep 16, 2003

For the x4540 you should use raidz2 groups of 6 disks for optimal performance. This is because the x4540 has 6 SATA controllers. See http://blogs.sun.com/timthomas/entry/recipe_for_a_zfs_raid and/or http://www.solarisinternals.com/wiki/index.php/ZFS_Configuration_Guide#How_to_Set_Up_ZFS_on_an_x4500_System for details.

Comparing the x4540 and the CX3 head-to-head based purely on spindle count is not a good comparison. The CX3 has a bunch of cache that can really help performance.
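For a rough feel of why the vdev layout matters, here's a back-of-the-envelope comparison of the two layouts being discussed. It uses the common rule of thumb that a raidz/raidz2 vdev delivers roughly one member disk's worth of random IOPS; the 1TB drive size and 80-IOPS figure are assumptions, and cache/slog devices change the picture a lot:
code:
# Ballpark comparison of the 4 x 11-disk layout vs groups of 6. Assumes the
# usual rule of thumb that a raidz2 vdev does roughly one disk's worth of
# random IOPS; real results depend on ARC/L2ARC, slog, recordsize, workload.
DISK_IOPS = 80        # assumed per 7.2k-rpm SATA spindle
DISK_TB = 1.0         # assumed drive size

def raidz2_layout(vdevs: int, disks_per_vdev: int) -> dict:
    data_disks = vdevs * (disks_per_vdev - 2)   # raidz2: two parity disks per vdev
    return {
        "disks used": vdevs * disks_per_vdev,
        "usable TB (approx)": data_disks * DISK_TB,
        "random IOPS (approx)": vdevs * DISK_IOPS,
    }

print("4 x 11-disk raidz2:", raidz2_layout(4, 11))
print("7 x  6-disk raidz2:", raidz2_layout(7, 6))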

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer
You need Logzillas and Readzillas to get any kind of performance out of the Sun gear.

yzgi posted:

3) Misalignment of the LUNs. I've read, here and elsewhere, about aligning the LUNs and VMFS/VMDK partitions but I'm a bit confused about when it matters. I know we never did any alignment on the EMC.

You should align both the VMFS volume and the VMDKs. Otherwise, a single 4k write could require 2 reads and 3 writes, instead of just one write. By aligning all of your data you will likely see a 10% to 50% performance improvement.

edit: about the alignment. Here is your unaligned data:
code:
VMDK                    -------111111112222222233333333
VMFS             -------11111111222222223333333344444444
SAN       -------1111111122222222333333334444444455555555 
Each set of numbers is a 4k block, and each ------- is the final 3.5k of your 63-sector (31.5k) offset. Notice how, to write the block of 2s at the VMDK level, you have to write to both the 2s and the 3s of the VMFS level, which will require you to write to the 2s, 3s, and 4s at the SAN level. More importantly, a partial write requires you to read the existing data first. The problem is most pronounced at the single-block level; with larger datasets the impact is less dramatic, but it still exists. Here is what it would look like with an extra 512 bytes (aligned at the 32k boundary):
code:
VMDK                      ------- 111111112222222233333333
VMFS              ------- 11111111222222223333333344444444
SAN       ------- 1111111122222222333333334444444455555555 
A write of the 2s at the VMDK level requires you to write the 3s of the VMFS level, which only requires a write of the 4s at the SAN level, and does not require ANY reads of the SAN.

adorai fucked around with this message at 00:23 on Jan 14, 2010

1000101
May 14, 2003

BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY FRUITCAKE!

adorai posted:

you need logzillas and readzillas to get any kind of performance out of the sun gear.


You should align both the VMFS volume and the VMDKs. Otherwise, a single 4k write could require 2 reads and 3 writes, instead of just one write. By aligning all of your data you will likely see a 10% to 50% performance improvement.

edit: about the alignment. Here is your unaligned data:
code:
VMDK                    -------111111112222222233333333
VMFS             -------11111111222222223333333344444444
SAN       -------1111111122222222333333334444444455555555 
Each set of numbers is a 4k block, and each ------- is the final 3.5k of your 63-sector (31.5k) offset. Notice how, to write the block of 2s at the VMDK level, you have to write to both the 2s and the 3s of the VMFS level, which will require you to write to the 2s, 3s, and 4s at the SAN level. More importantly, a partial write requires you to read the existing data first. The problem is most pronounced at the single-block level; with larger datasets the impact is less dramatic, but it still exists. Here is what it would look like with an extra 512 bytes (aligned at the 32k boundary):
code:
VMDK                      ------- 111111112222222233333333
VMFS              ------- 11111111222222223333333344444444
SAN       ------- 1111111122222222333333334444444455555555 
A write of the 2s at the VMDK level requires you to write the 3s of the VMFS level, which only requires a write of the 4s at the SAN level, and does not require ANY reads of the SAN.

I'm stealing your 1's 2's and 3's diagram to help explain this concept to customers. It communicates this very well.


Section 9
Mar 24, 2003

Hair Elf

adorai posted:

You should align both the VMFS volume and the VMDKs. Otherwise, a single 4k write could require 2 reads and 3 writes, instead of just one write. By aligning all of your data you will likely see a 10% to 50% performance improvement.
Thanks for this and the other advice! This helped a lot with getting my head around some things I didn't quite understand.

I rebuilt the x4540 from the ground up with a better zpool configuration and using aligned volumes and now everything is running at about the same speed as our old SAN (which actually probably could be a lot faster if it had been configured properly in the first place as well.) I spent the weekend migrating about 12TB of VMs and data volumes over and it all went a lot faster than it had been going with the old configuration!
