YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

evil_bunnY posted:

Why was it disallowed? I got quoted one just a month ago, and the NetApp website still lists a pair of 512GB cards as a supported config, with the little gotcha of not being able to run 8.1.

You can hit some pretty serious issues on any version of DOT 8 using PAM, though the issue affects 8.1 systems more often than 8.0.x systems. However, using any DOT 8 version on a 3210 with PAM has the potential to cause serious issues for which the only fix is disabling the PAM cards.

The NetApp site likely doesn't mention that 8.0.x isn't supported because systems were shipped running that config before the bug was discovered, as marketingman mentioned above.

If you worked with a partner who has tried to pitch that config since November then they were not following proper guidance.

Intraveinous
Oct 2, 2001

Legion of Rainy-Day Buddhists
Wow, I'm really glad I didn't end up with the 3210 and PAM that I was quoted last year about this time. I hadn't heard about this issue until now, but it makes my decision feel even better.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

Alctel posted:

Hey

So we just moved to ESXi 5 and I am taking this opportunity to redo all our data stores and LUNs, because they are horrible and messy.

Anyone know any good resources for array/LUN sizing with VMware, or any suggestions?

We have an IBM DS3524 with 8 SAS drives (2.8TB usable) and 8 SATA (4.5TB usable), with around 20 VMs on two hosts.



On another note, I was looking at the new Storage DRS stuff and I think I poo poo myself with excitement.
Array sizing will depend on your IOPS and budgetary requirements. LUN sizing within an array isn't relevant for volumes with lightly-used VMs, for the most part, at least from a performance perspective. The only major consideration with LUN sizing is that you have a separate HBA I/O queue for each LUN, so if you're regularly saturating your queues (and your backend isn't oversubscribed) then you need to break your LUNs into smaller LUNs to leverage multiple queues. Also note that your storage doesn't support VAAI, which means you're relying on SCSI-2 reservations for locking -- you can only have one cluster metadata operation (powering on/off a VM, growing a thin-provisioned disk) at a time on a LUN. If you're doing a lot of thin provisioning, consider sticking no more than a dozen VMs on a LUN.
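If you want to sanity-check the queue depth and VAAI support yourself, something like this from the ESXi 5 shell should do it (treat it as a sketch; exact output varies by build):

esxcli storage core device vaai status get (per-device ATS/clone/zero support)
esxcli storage core device list | grep -i queue (per-device max queue depth)
esxtop (press 'u' and watch the QUED column to see whether you're actually queuing)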

Beyond that, you're in regular storage territory. Don't mix sequential and random workloads on the same array. Don't mix concurrently-running sequential workloads on the same array (enough concurrent I/O to the same spindles essentially interleaves your sequential workloads until they turn random). Make sure your cache is tuned correctly for your workload. Size and stripe your arrays correctly. Use the right RAID level. Align your partitions.
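For the alignment piece, a quick way to eyeball a Linux guest (device name is just an example; 512-byte sectors assumed):

fdisk -lu /dev/sda (partition start sectors should be divisible by 8, i.e. 4KiB-aligned)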

Muslim Wookie
Jul 6, 2005

NippleFloss posted:

You can hit some pretty serious issues on any version of DOT 8 using PAM, though the issue affects 8.1 systems more often than 8.0.x systems. However, using any DOT 8 version on a 3210 with PAM has the potential to cause serious issues for which the only fix is disabling the PAM cards.

The NetApp site likely doesn't mention that 8.0.x isn't supported because systems were shipped running that config before the bug was discovered, as marketingman mentioned above.

If you worked with a partner who has tried to pitch that config since November then they were not following proper guidance.

It's pretty irritating having customers with 3210s and PAM out there, which is why I expect NetApp will fix this.

I'm not sure how people are still getting quoted on this, as the options have literally been removed from quoting tools, but that may have been more recent than "a month ago". Though I don't think so.

FYI - this is obviously anecdotal, but I have quite a few heavily loaded to maxed-out 3210s with PAM in the field, for more than 6 months at least, and have yet to have an issue.

Also, guys, if you've had a 3210 with PAM quoted, you should ask about the next model up with PAM. NetApp, at least in my neck of the woods, have been doing *very* good pricing (i.e. the same as the 3210) because of the issue.

optikalus
Apr 17, 2008
Shot in the dark, but we just deployed a NetApp 2240-4 (ontap 8.1) to our office, and while it works perfectly for all Windows clients, our OSX Lion clients are extremely slow. The volumes are set up as NTFS and we've got it connected to our AD via CIFS.

Default OSX clients take several minutes to display folders on the share, and copying takes an eternity. It is pretty much unusable for our power-users. These problems don't exist on the Windows side.

We've tried all the various tricks people suggest for OSX and SMB (Apple ditched Samba for 10.7 and wrote their own since they didn't like the license apparently).

The trick is that installing DAVE on OSX makes it work perfectly. The problem is that DAVE is going to run about $100/workstation/year. It may just be what we have to do. I guess we could mount them to a cheap Windows server and re-export the shares as well.

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

optikalus posted:

Shot in the dark, but we just deployed a NetApp 2240-4 (ontap 8.1) to our office, and while it works perfectly for all Windows clients, our OSX Lion clients are extremely slow. The volumes are set up as NTFS and we've got it connected to our AD via CIFS.

Default OSX clients take several minutes to display folders on the share, and copying takes an eternity. It is pretty much unusable for our power-users. These problems don't exist on the Windows side.

We've tried all the various tricks people suggest for OSX and SMB (Apple ditched Samba for 10.7 and wrote their own since they didn't like the license apparently).

The trick is that installing DAVE on OSX makes it work perfectly. The problem is that DAVE is going to run about $100/workstation/year. It may just be what we have to do. I guess we could mount them to a cheap Windows server and re-export the shares as well.

Setting "options cifs.show_snapshot off" sometimes helps with directory browsing on OSX clients, but shouldn't make any difference for transfer speeds. What kind of throughput and latencies are you seeing on transfers? A lot of times if the issue is tough to pin down getting a packet trace is the best way to find the culprit. You can do this from the filer with the pktt command.

If NFS is licensed and you have a suitable directory service to use for mapping you could also consider doing multi-protocol and letting your OSX clients connect via NFS.
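For reference, the rough sequence for a trace is something like this (interface name is whatever your CIFS traffic actually rides on):

pktt start e0a (no -i filter this time, so you actually catch the SMB traffic)
pktt dump e0a
pktt stop e0a

Then copy the resulting .trc file off the filer and open it in Wireshark.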

optikalus
Apr 17, 2008

NippleFloss posted:

Setting "options cifs.show_snapshot off" sometimes helps with directory browsing on OSX clients, but shouldn't make any difference for transfer speeds. What kind of throughput and latencies are you seeing on transfers? A lot of times if the issue is tough to pin down getting a packet trace is the best way to find the culprit. You can do this from the filer with the pktt command.

If NFS is licensed and you have a suitable directory service to use for mapping you could also consider doing multi-protocol and letting your OSX clients connect via NFS.

cifs.show_snapshot is already set to off. I just ran a simple test to attempt to copy a 5GB ISO from one of the shares on my iMac running Lion and it got about 13MB in 38 minutes (~6k/s). I ran pktt start e0a -i <myip> during this and ran pktt dump after a few minutes. It captured 6 packets - all ICMP from my workstation to the filer. Fired up Parallels running Windows 7 with a bridged ethernet interface and copied the same file to completion at about 40MB/s.

NFS is also licensed, but we don't have any NIS/YP on the OS X hosts. I wouldn't be worried about permissions on the main shares (everything should be wide open on those), just the home shares, and we could always add a new user on the filer for OS X home share NFS exports.

Nomex
Jul 17, 2002

Flame retarded.

I wrote this post as a response to Alctel, until I realized his array wasn't an N series, but I think someone might find it helpful, so I'm leaving it. Here are some recommendations for configuring VMware on NetApp appliances:

First of all, use NFS. LUN snapshots on NetApp products are poo poo. You have to mount the snapshot as a LUN to pull stuff out of it. With NFS you can just copy and paste files back. vSphere also doesn't report the size of deduplicated volumes on fiber channel properly. It only shows how much un-deduplicated space is used in the data store. With NFS it's just a network drive, so it reports whatever the array tells it. There are a bunch more reasons, but this post is getting long already.

For sizing, you'll want to thin provision a VMware volume obviously, but you'll want to put only the OS partitions in there. Take all your servers and figure out how big their OS volumes are, then add maybe 10 or 20%. It depends on how much data you have. Keep in mind you'll need to save ~20% of the array space for snapshots. You'll want to keep all your data drives as either network shares or raw device mappings. The reason you want to do this is because with only OS volumes in the VMware volume, you'll get an awesome dedupe ratio. For the data, you don't want to slow down access by slipping VMFS between the server and the storage. Using RDMs also makes sure all your array features will work properly.
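For what it's worth, the filer side of that is only a few commands on 7-mode - the volume name, aggregate and size here are made up, so adjust to taste:

vol create vm_os -s none aggr0 500g (no space guarantee = thin provisioned)
snap reserve vm_os 20
sis on /vol/vm_os
sis start -s /vol/vm_os (dedupe whatever is already in the volume)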

Nomex fucked around with this message at 00:14 on Mar 16, 2012

madsushi
Apr 19, 2009

Baller.
#essereFerrari

Nomex posted:

You'll want to keep all your data drives as either network shares or raw device mappings. The reason you want to do this is because with only OS volumes in the VMware volume, you'll get an awesome dedupe ratio. For the data, you don't want to slow down access by slipping VMFS between the server and the storage. Using RDMs also makes sure all your array features will work properly.

I disagree with this.

Inflating your dedupe ratio by stacking only OS drives into one volume is bad for your overall dedupe amount. You get the BEST dedupe results (total number of GBs saved) by stacking as MUCH data into a single volume as possible. The ideal design would be a single, huge volume with all of your data in it with dedupe on.

Also, re: slowing down by slipping VMFS in the middle, this is wrong, because there is no VMFS on an NFS share. You're better off using iSCSI with SnapDrive to your NetApp LUNs, rather than doing RDM.

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

optikalus posted:

cifs.show_snapshot is already set to off. I just ran a simple test to attempt to copy a 5GB ISO from one of the shares on my iMac running Lion and it got about 13MB in 38 minutes (~6k/s). I ran pktt start e0a -i <myip> during this and ran pktt dump after a few minutes. It captured 6 packets - all ICMP from my workstation to the filer. Fired up Parallels running Windows 7 with a bridged ethernet interface and copied the same file to completion at about 40MB/s.

NFS is also licensed, but we don't have any NIS/YP on the OS X hosts. I wouldn't be worried about permissions on the main shares (everything should be wide open on those), just the home shares, and we could always add a new user on the filer for OS X home share NFS exports.

You don't need NIS or YP to do user-mapping. You can use an LDAP backend (AD works if you have the unix attributes enabled and populated) or the local /etc/passwd file on the NetApp to manage your unix users. As long as you can associate a particular unix UID with a domain username through LDAP or the usermap file, your permission mapping will work fine.
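The low-tech version of that mapping looks something like this - the names are made up, and the wcc line is just there to sanity-check the result:

/etc/passwd on the filer: jsmith:*:1001:30::/home/jsmith:
/etc/usermap.cfg: CORP\jsmith == jsmith
wcc -s CORP\jsmith (shows the unix credentials the filer resolves for that Windows account)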

Regarding the packet dump, it sounds like you've got some missing data. I'd rerun pktt without limiting it to a specific host, then copy the dump file off to your PC and peruse it with Wireshark. Wireshark's filters are much more usable and it does a great job of color-coding things like retransmits or malformed packets.

If all else fails, open a support case. If nothing else they should be able to tell you what the issue is with Apple's SMB stack so you can request a patch.

madsushi posted:

Inflating your dedupe ratio by stacking only OS drives into one volume is bad for your overall dedupe amount. You get the BEST dedupe results (total number of GBs saved) by stacking as MUCH data into a single volume as possible. The ideal design would be a single, huge volume with all of your data in it with dedupe on.

I agree, with some caveats. SIS-enabled volumes have size limits that vary based on platform and ONTAP version. If you can fit all of your data under that limit then great, but if not, it makes sense to try and consolidate the most similar data onto dedicated datastores that are smaller than that limit so that you get the best ratios within each volume.

The other issue is that deduping very large volumes on a busy filer can take a LONG time. If you want to run your dedupe scans outside of business hours, which is a good practice, then you need to make sure you can actually dedupe your whole volume within your window. That may require creating smaller volumes where, again, you would want to consolidate like data.
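Scheduling that off-hours is just a sis config tweak - syntax from memory, so double-check it against your ONTAP version:

sis config -s sun-sat@23 /vol/vm_os (kick the scan off at 23:00 every night)
sis status /vol/vm_os (shows state and progress of the scans)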

I do agree that there aren't very many compelling reasons to use RDMs that don't involve a very specific application requirement such as clustering. If you need to split OS and app data you can put your OS drives on one datastore and app data on another, only dedupe the OS data, and accomplish the same thing with more flexibility and ease of use. And if you absolutely require a block protocol, iSCSI within the guest (assuming iSCSI is licensed) is the way most NetApp people seem to go.
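A rough sketch of the guest-iSCSI route on the filer side, assuming the volume and qtree already exist - sizes, names and the IQN are placeholders, and SnapDrive will automate most of this anyway:

igroup create -i -t windows sql01 iqn.1991-05.com.microsoft:sql01.corp.local
lun create -s 200g -t windows_2008 /vol/sqldata/q/sqldata.lun
lun map /vol/sqldata/q/sqldata.lun sql01

Then point the Microsoft iSCSI initiator in the guest at the filer and format the disk as usual.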

Nomex
Jul 17, 2002

Flame retarded.

madsushi posted:

Inflating your dedupe ratio by stacking only OS drives into one volume is bad for your overall dedupe amount. You get the BEST dedupe results (total number of GBs saved) by stacking as MUCH data into a single volume as possible. The ideal design would be a single, huge volume with all of your data in it with dedupe on.



You get the best results with dedupe by stacking a lot of similar data together. If you have a ton of random data, your dedupe ratio will be crap. If you stack 100 VMs, you're going to get a huge savings, as they all have nearly identical files.

quote:

Also, re: slowing down by slipping VMFS in the middle, this is wrong, because there is no VMFS on an NFS share. You're better off using iSCSI with SnapDrive to your NetApp LUNs, rather than doing RDM.

This is how you mount an RDM. You're right that NFS shares aren't affected, that's why I mentioned using either network shares or RDMs. Whether you use FCP or iSCSI to mount a data store, you have to format it with VMFS. If you connect directly to the LUN (as you mentioned), you're using an RDM.

Nomex fucked around with this message at 07:50 on Mar 16, 2012

Stoo
Dec 29, 2004
SHART MY JORTS

420 shart jorts everyday
Anyone run HP's 3PAR systems? I'm looking at them for our new datacentre, primarily as VMFS datastores for a large ESX cluster, and am curious whether anyone has any real-world experience with them to share?

Muslim Wookie
Jul 6, 2005

Nomex posted:

You get the best results with dedupe by stacking a lot of similar data together. If you have a ton of random data, your dedupe ratio will be crap. If you stack 100 VMs, you're going to get a huge savings, as they all have nearly identical files.

Maybe I have some fundamental misunderstanding here, but if one volume says 50% saved and another says 90% saved, it doesn't matter; the percentage is meaningless. What matters is total space saved - the filer is doing block-level dedupe, so by putting data in with the OS VMDKs you aren't making the filer dedupe any worse. Sure, it has a worse percentage, but who cares?

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

Nomex posted:

You get the best results with dedupe by stacking a lot of similar data together. If you have a ton of random data, your dedupe ratio will be crap. If you stack 100 VMs, you're going to get a huge savings, as they all have nearly identical files.


This is how you mount an RDM. You're right that NFS shares aren't affected, that's why I mentioned using either network shares or RDMs. Whether you use FCP or iSCSI to mount a data store, you have to format it with VMFS. If you connect directly to the LUN (as you mentioned), you're using an RDM.

Why do you care about your dedupe ratio? If you have a volume with 100GB of data that you dedupe down to 10GB that's a great ratio and all, but if you have a volume with 1TB of data that dedupes down to 500GB you're saving a hell of a lot more space even if the ratio isn't nearly as good. Barring mitigating factors, dedupe works better the more data you include in the volume, whether that data is similar or not. All of your similar data in the volume will still dedupe just as well, but a lot of that dissimilar data will also see some savings, given that the dedupe is block-level and even incredibly dissimilar data sets can share common blocks.

Regarding RDMs, I'm wondering why you're using them at all. You've just talked about the wonders of NFS from a manageability perspective, so what's stopping you from using VMDKs on NFS volumes instead of RDMs for data drives? You made it seem as if that wasn't an option when it is by far the most common thing in VMware deployments on NFS. If block level access from a guest is required and a VMDK won't cut it (MSCS or some Snapmanager products) then you can always mount an iSCSI LUN from within the guest OS and avoid RDMs entirely.

RDMs are pretty kludgy and basically only exist because VMware needed to provide some way for clustered services to function in a virtualized environment. No reason to put data on them that doesn't specifically require it.

Muslim Wookie
Jul 6, 2005
:hfive: NippleFloss, I thought I was going a bit crazy there, thinking I must have missed some basic concept.

Also, I've noticed a lot of people talking about using RDMs and LUNs as opposed to VMDKs on NFS because of "response time", latency, etc. Usually this is because a) they've seen some PowerPoint slides showing those things are a bees dick faster, and b) everyone seems to think their company is some sort of computing powerhouse.

Just been seeing a huge lack of perspective when it comes to storage lately.

Rhymenoserous
May 23, 2008

madsushi posted:

Inflating your dedupe ratio by stacking only OS drives into one volume is bad for your overall dedupe amount. You get the BEST dedupe results (total number of GBs saved) by stacking as MUCH data into a single volume as possible. The ideal design would be a single, huge volume with all of your data in it with dedupe on.

EFB

Several times...

This is also very dependent on the device. There I contributed.

Nomex
Jul 17, 2002

Flame retarded.

NippleFloss posted:

Why do you care about your dedupe ratio? If you have a volume with 100GB of data that you dedupe down to 10GB that's a great ratio and all, but if you have a volume with 1TB of data that dedupes down to 500GB you're saving a hell of a lot more space even if the ratio isn't nearly as good. Barring mitigating factors, dedupe works better the more data you include in the volume, whether that data is similar or not. All of your similar data in the volume will still dedupe just as well, but a lot of that dissimilar data will also see some savings, given that the dedupe is block-level and even incredibly dissimilar data sets can share common blocks.

Regarding RDMs, I'm wondering why you're using them at all. You've just talked about the wonders of NFS from a manageability perspective, so what's stopping you from using VMDKs on NFS volumes instead of RDMs for data drives? You made it seem as if that wasn't an option when it is by far the most common thing in VMware deployments on NFS. If block level access from a guest is required and a VMDK won't cut it (MSCS or some Snapmanager products) then you can always mount an iSCSI LUN from within the guest OS and avoid RDMs entirely.

RDMs are pretty kludgy and basically only exist because VMware needed to provide some way for clustered services to function in a virtualized environment. No reason to put data on them that doesn't specifically require it.

You're right. I made a mistake. We're currently running FC datastores with RDMs, but we're switching to NFS. With FC it makes sense, not so much with NFS.

madsushi
Apr 19, 2009

Baller.
#essereFerrari
Whenever I'm talking to a client about NetApp, I like to say there are only three types of data: CIFS/SMB/files, VMware/virtualization, and databases.

In the perfect NetApp design, you have a big volume containing all of your organization's aggregated CIFS/SMB/file shares, and dedupe is saving you tons of space and your snapshots are seamlessly integrated into Windows' "Previous Versions" tab. NetApp automatic scheduled snapshots handle backups.

You store all of your VMware data, save for a small, thin-provisioned vSwap volume, in one big volume. Dedupe saves you space on the OS drives and on any applications that are commonly installed (AV, etc). You mount this volume via NFS, and so any VMs you create simply get tossed into that NFS volume. You use the NetApp Virtual Storage Console (VSC) to set best-practices settings on your VMware hosts, to provision/connect the storage, and to take snapshots on the NetApp regularly (and obviously for recovery).
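If you'd rather script the mount than click through VSC, it's a one-liner per ESXi 5 host (the hostname and volume names here are invented):

esxcli storage nfs add --host=netapp01 --share=/vol/vm_nfs --volume-name=vm_nfs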

Finally, you put all of your databases into individual LUNs in individual qtrees in individual volumes: one for the database files, one for the logs, and one for SnapInfo information (depending on your config). This goes for Exchange, SQL, Oracle, etc. - any of the database platforms. You connect these LUNs directly to the guest OS via SnapDrive/iSCSI, and you manage the configuration and backups via SnapManager. Dedupe doesn't necessarily need to be turned on for these volumes, and you definitely want ONLY the databases / log files in these volumes. These are your high-usage volumes; you want them to perform.

Database example:
/vol/ExchangeDB/qtree/exchangedb.lun
/vol/ExchangeLogs/qtree/exchangelogs.lun
/vol/ExchangeSnapInfo/qtree/exchangesnapinfo.lun


Now, you have a very consistent best-practices NetApp setup with very easy backup/recovery options.

Catch: as Nomex correctly mentioned above, if you're running your filer on fiber channel, you can't pass LUNs to your guest OS without using RDM. Luckily, most of the NetApp installations I work on are all network-based (iSCSI/NFS).

madsushi fucked around with this message at 16:32 on Mar 16, 2012

Muslim Wookie
Jul 6, 2005
ZombieRegan, sorry it didn't work out for you at the time, though you got it sorted. I can tell you that I definitely would never have deployed in the fashion it was deployed at your operation. I won't detail the changes, because for the most part, madsushi is right on the money!

Depending on the situation, I sometimes do a staged deployment where I go for one big bucket, measure and test, and break things down gradually, or vice versa.

madsushi posted:

Catch: as Nomex correctly mentioned above, if you're running your filer on fiber channel, you can't pass LUNs to your guest OS without using RDM. Luckily, most of the NetApp installations I work on are all network-based (iSCSI/NFS).

This is not true. You can absolutely mount a LUN over FC using, for example, the iSCSI initiator in Windows.

I literally just did it right now. And now I've unmounted it, and mounted it via ethernet. Now I've taken a snapshot of the LUN and mounted that snapshot via FC.


Hyuk spazzed out and forgot we were talking about RDMs meaning VMware. I'll let my shame stand.

Dilbert As FUCK
Sep 8, 2007

by Cowcaster
Pillbug
A lot of our clients are wanting cloud solutions for things now, so I am trying to piece together a "cloud" storage solution. I was thinking

1) Lease out a NAS device to clients
2) NAS device holds local backups for the client for faster redeploys
Here is where I am a bit torn:

A) Rsync/Bacula to our "private cloud" during the night, compress further with 7z => to our cloud server
Pros
more redundancy
getting data to the client if the NAS fails and backups are needed is faster - our clients could get a full backup by us driving over to them with an external drive holding the past two weeks of backups
Cons
More steps

B) NAS device runs Bacula => compress on site => upload to cloud
Pros
We don't clog any of our WAN; the backup depends on the client's connection
We don't have to use any storage
Cons
Slowish

Kinda torn between these two. I'd prefer to offer the customer A, but B is simpler.

Plan A would be 55c/GB + $50/mo + ~$150 for NAS setup/deploy

Plan B would be $10/mo per PC (no charge per GB) + $50/mo + ~$150 for NAS setup/deploy

Thoughts? Which would you all choose given the option?

Dilbert As FUCK fucked around with this message at 20:07 on Mar 16, 2012

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

Corvettefisher posted:

A lot of our clients are wanting cloud solutions for things now, so I am trying to piece together a "cloud" storage solution. I was thinking

1) Lease out a NAS device to clients
2) NAS device holds local backups for the client for faster redeploys
Here is where I am a bit torn:

A) Rsync/Bacula to our "private cloud" during the night, compress further with 7z => to our cloud server
Pros
more redundancy
getting data to the client if the NAS fails and backups are needed is faster - our clients could get a full backup by us driving over to them with an external drive holding the past two weeks of backups
Cons
More steps

B) NAS device runs Bacula => compress on site => upload to cloud
Pros
We don't clog any of our WAN; the backup depends on the client's connection
We don't have to use any storage
Cons
Slowish

Kinda torn between these two. I'd prefer to offer the customer A, but B is simpler.

Plan A would be 55c/GB + $50/mo + ~$150 for NAS setup/deploy

Plan B would be $10/mo per PC (no charge per GB) + $50/mo + ~$150 for NAS setup/deploy

Thoughts? Which would you all choose given the option?
This is not something you want to roll and support yourself. The poo poo you will land yourself in when your backups fail will end your company. There are dozens of cloud backup vendors out there who do rebranding partnerships all the time. Talk to them instead.

Dilbert As FUCK
Sep 8, 2007

by Cowcaster
Pillbug

Misogynist posted:

This is not something you want to roll and support yourself. The poo poo you will land yourself in when your backups fail will end your company. There are dozens of cloud backup vendors out there who do rebranding partnerships all the time. Talk to them instead.

Normally I would say yeah to that, but another company is offering something similar to our clients and we were asked to make a counter-offer - not much you can do. I figured out the easiest way to do it; I feel like an idiot for not seeing it sooner.

local NAS(local backups) => our Servers(offsite) => CrashplanPRO(cloud)
1. Local keeps 2 weeks
2. Ours 1-2 Months
3. CrashPlanPro all backups of $client

Dilbert As FUCK fucked around with this message at 19:52 on Mar 16, 2012

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer
do you people who use NFS for your VMware datastores over iSCSI have an equivalent to round robin to aggregate bandwidth more effectively?

evil_bunnY
Apr 2, 2003

You mean with etherchannel VIFs?

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

evil_bunnY posted:

You mean with etherchannel VIFs?
If you are using a single chassis, maybe. Even then, I don't think VMware uses etherchannel in a way that will allow a single NFS session to use more than a single link. I could be wrong; if I am, please tell me.

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

adorai posted:

If you are using a single chassis, maybe. Even then, I don't think VMware uses etherchannel in a way that will allow a single NFS session to use more than a single link. I could be wrong; if I am, please tell me.

You are correct that a single ESX host will always use the same path when communicating with a specific datastore. However there are very few situations where that limitation actually presents a serious problem since most people don't buy SAN storage so they can connect up only one ESX host and run only one datastore.

As long as you have multiple hosts and multiple datastores the src-dest-ip load balance mechanism usually does a good enough job of balancing load. And on 10G it's even less of an issue since you will never max out a 10g link with a single host.

madsushi
Apr 19, 2009

Baller.
#essereFerrari

adorai posted:

do you people who use NFS for your VMware datastores over iSCSI have an equivalent to round robin to aggregate bandwidth more effectively?

:smug: 10 gig :rice:

In all seriousness, I set up my NetApp to have 2 IP addresses and then add the datastores to vSphere 5 with a DNS name that points to both IP addresses. The vSphere NFS client will use both IPs, which results in multiple hashes and therefore multiple links being utilized. Before vSphere 5, I would either deal with 1-gig being my storage limit, or I'd split my data into two datastores and map each one with a different IP address.
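The pre-vSphere-5 split looks roughly like this (addresses and volume names invented), one datastore per filer IP so each one hashes onto a different uplink:

esxcli storage nfs add --host=10.0.0.11 --share=/vol/vm_a --volume-name=vm_a
esxcli storage nfs add --host=10.0.0.12 --share=/vol/vm_b --volume-name=vm_b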

Nomex
Jul 17, 2002

Flame retarded.

adorai posted:

do you people who use NFS for your VMware datastores over iSCSI have an equivalent to round robin to aggregate bandwidth more effectively?

You don't use NFS over iSCSI. NFS is mounted like a network drive. As long as you have LACP enabled on both the vif on the filer and on the switch it'll balance the links pretty well.
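For reference, both ends of that look something like the following on ONTAP 8 7-mode and a Catalyst switch - port and ifgrp names are examples, and older ONTAP versions call ifgrp "vif":

On the filer:
ifgrp create lacp ifgrp0 -b ip e0a e0b
ifconfig ifgrp0 10.0.0.11 netmask 255.255.255.0 up

On the switch:
interface range GigabitEthernet1/0/1 - 2
 channel-group 10 mode active
port-channel load-balance src-dst-ip (global config, not under the interface)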

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

Nomex posted:

You don't use NFS over iSCSI. NFS is mounted like a network drive. As long as you have LACP enabled on both the vif on the filer and on the switch it'll balance the links pretty well.
Do note that there is nothing like MPIO for NFS and the "round robin" being described is closer to round-robin DNS than round-robin MPIO.

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

Misogynist posted:

Do note that there is nothing like MPIO for NFS and the "round robin" being described is closer to round-robin DNS than round-robin MPIO.

NetApp actually does support a round robin load balancing mechanism on interface groups, though I don't know why. It doesn't provide any performance improvement, as reordering out-of-order packets is a much bigger bottleneck than link utilization at 1G and above.

evil_bunnY
Apr 2, 2003

Nomex posted:

You don't use NFS over iSCSI. NFS is mounted like a network drive. As long as you have LACP enabled on both the vif on the filer and on the switch it'll balance the links pretty well.
I think he meant NFS as an alternative to iSCSI, rather than on top of it.

madsushi posted:

:smug: 10 gig :rice:
I had to fight the urge to make that joke
:smug::hf::smug:

Misogynist posted:

Do note that there is nothing like MPIO for NFS and the "round robin" being described is closer to round-robin DNS than round-robin MPIO.
Also RR DNS datastores are an administrative PITA because ESXi really, really wants to believe different NFS IP addresses are different datastores.

evil_bunnY fucked around with this message at 11:00 on Mar 17, 2012

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

NippleFloss posted:

NetApp actually does support a round robin load balancing mechanism on interface groups, though I don't know why. It doesn't provide any performance improvement, as reordering out-of-order packets is a much bigger bottleneck than link utilization at 1G and above.
I assure you that this does not work the way you think it does. At the switch level, link aggregation uses a hashing algorithm (often selectable) to determine which physical link in the bundle carries a given frame. Typically this is done by hashing on both source and destination IP, meaning that traffic from a unique source IP to a unique destination IP will never use more than one link. You can do things like export multiple shares on different IP addresses that hash to different links, of course, but you're still bound by this fundamental limitation for every unique connection between host and storage.

We're starting to see some vendors produce fabric-based link aggregation methods that go above and beyond what standard port channeling can do, but we're not seeing wide adoption of that hardware yet.

Vulture Culture fucked around with this message at 15:02 on Mar 17, 2012

Muslim Wookie
Jul 6, 2005

Misogynist posted:

I assure you that this does not work the way you think it does. At the switch level, link aggregation uses a hashing algorithm (often selectable) to determine which physical link in the bundle carries a given frame. Typically this is done by hashing on both source and destination IP, meaning that traffic from a unique source IP to a unique destination IP will never use more than one link. You can do things like export multiple shares on different IP addresses that hash to different links, of course, but you're still bound by this fundamental limitation for every unique connection between host and storage.

We're starting to see some vendors produce fabric-based link aggregation methods that go above and beyond what standard port channeling can do, but we're not seeing wide adoption of that hardware yet.

Not sure what you're thinking about because I see the same behavior as NippleFloss. On the filer I can tell it to round robin the links and it will send packets down each of the links. Sure, on the switch it may end up going through the same link, or not, or something in between, but that's not the issue here. What I, and it seems NippleFloss as well, end up seeing is packets arriving out of order.

In fact, I would go so far as to say you've misunderstood what was written, which is something I'm very loath to say to you Misogynist because there is a high chance I'm wrong ;)

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

marketingman posted:

Not sure what you're thinking about because I see the same behavior as NippleFloss. On the filer I can tell it to round robin the links and it will send packets down each of the links. Sure, on the switch it may end up going through the same link, or not, or something in between, but that's not the issue here. What I, and it seems NippleFloss as well, end up seeing is packets arriving out of order.

In fact, I would go so far as to say you've misunderstood what was written, which is something I'm very loath to say to you Misogynist because there is a high chance I'm wrong ;)
You know, I actually removed a parenthesized section in my original version of that reply that said something like "at least in a server-to-server context with switches involved." Now I'm kind of upset that I did. :)

You're right, any host can load-balance its outbound packets however it wants, but ultimately the switch is going to be subject to all the limitations of Ethernet as a switching medium. You could probably make this work somehow if all of your hosts completely disabled Ethernet spoof-detection heuristics.

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

Misogynist posted:

You know, I actually removed a parenthesized section in my original version of that reply that said something like "at least in a server-to-server context with switches involved." Now I'm kind of upset that I did. :)

You're right, any host can load-balance its outbound packets however it wants, but ultimately the switch is going to be subject to all the limitations of Ethernet as a switching medium. You could probably make this work somehow if all of your hosts completely disabled Ethernet spoof-detection heuristics.

The issue isn't what the switches choose to do with the packets once they arrive, which is no different than how they behave with any other load balancing mechanism, it's that minute latency differences between switches cause packets that are part of the same TCP session to arrive out of order when they take different physical paths through the network. Round Robin load balancing actually violates the 802.3ad standard because it does not enforce temporal ordering. Which makes it weird that NetApp includes it as an option, but they sure do.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

NippleFloss posted:

The issue isn't what the switches choose to do with the packets once they arrive, which is no different than how they behave with any other load balancing mechanism, it's that minute latency differences between switches cause packets that are part of the same TCP session to arrive out of order when they take different physical paths through the network. Round Robin load balancing actually violates the 802.3ad standard because it does not enforce temporal ordering. Which makes it weird that NetApp includes it as an option, but they sure do.
Thanks for the clarification (even though you're basically restating what you said earlier because I'm dense), now I see the polarity mismatch between where we were both coming from. :)

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

Misogynist posted:

Thanks for the clarification (even though you're basically restating what you said earlier because I'm dense), now I see the polarity mismatch between where we were both coming from. :)

No worries. From a switch-to-switch perspective you are correct: I don't believe any switch vendors actually support an RR load balancing mechanism, and properly speaking they could not do so without violating portions of the 802.3ad standard.

I mostly threw that information out there as a random footnote, definitely not as a recommendation that anyone actually pursue using it, or consider it a good idea. SRC-DEST-IP-PORT is my preferred load-balancing mechanism for devices that support it.
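On ONTAP versions that support it, that's just the balance mode on the interface group (name and ports are examples; as far as I know you have to destroy and recreate the ifgrp to change it):

ifgrp create lacp ifgrp0 -b port e0a e0b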

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer
so anyway, the point is that iSCSI can utilize MPIO round robin via multiple initiators and sessions, NFS can't, right?

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

adorai posted:

so anyway, the point is that iSCSI can utilize MPIO round robin via multiple initiators and sessions, NFS can't, right?

Correct. NFS has no protocol level load balancing. It can only piggyback on network layer load balancing, at least in current versions of the protocol.
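For completeness, the iSCSI side of that on ESXi 5 is just a path selection policy change - the naa ID below is a placeholder, pull the real one from the device list:

esxcli storage nmp device list
esxcli storage nmp device set --device=naa.60a98000486e2f34 --psp=VMW_PSP_RR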

evil_bunnY
Apr 2, 2003

So how do people go about fixing that on 1GbE storage networks?
