three
Aug 9, 2007

i fantasize about ndamukong suh licking my doodoo hole

Misogynist posted:

I had absolutely no difficulty reaching wire speed with both NFS and iSCSI in our test environment (we eventually deployed on FC anyway at the behest of our director) and neither did VMware. Your consultant is feeding you a crock of poo poo. I did find that we had a 15% drop in iSCSI performance when routing through the core, whereas we had near 100% of wire speed with the initiators and targets on the same VLAN. We might have been able to tune this out by tweaking TCP settings to accommodate the extra latency, but we were only using it in dev. We were not using jumbo frames. This probably made the performance hit much worse than it might have been if our network was properly segmented out for routing storage traffic.

In terms of 10 gigabit: I imagine this is the area where you will find performance relative to wire speed is very, very heavily vendor-dependent (and obviously heavily dependent on your backend spindles and layout as well). I'm under the impression that iSCSI at 10 gigabit is still rather CPU-intensive and this is one circumstance where you might want to consider one of the new Broadcom iSCSI 10GbE HBAs. Then again, if you need 10 gigabits of iSCSI throughput on a single box, you probably have somewhere in the vicinity of 32 cores anyway.

The real sell of 10 gigabit for VMware environments at this point is reduced cabling -- use two 10 gigabit uplinks as your network connections to each VMware box and use the 1000V's QoS features to sort out and prioritize management, VMotion/FT, and VM traffic. It's still pretty rare to see a system that will suck down all that iSCSI bandwidth. Remember that most disk throughput is measured in IOPS, not gigabits.

He basically showed us perfmon graphs which were peaking at around 45MBps during our heavy DB load times, and coinciding with this were disk queues in the 10+ range. His summary was that iSCSI is inherently inefficient and therefore maxes out at 45MBps per NIC. This kind of seemed like bullshit. That'd only be around 35% utilization of the pipe, if I'm correct in that calculation?
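
Back-of-envelope math on that, using round numbers rather than anything from his report:

code:
1 Gbps                       = 1,000,000,000 bits/s / 8   = 125 MB/s raw
Ethernet + IP + TCP + iSCSI  = roughly 5-10% overhead on standard 1500-byte frames
Practical ceiling            = ~110-118 MB/s per link
His 45MBps figure            = 45 / 125                   = ~36% of the raw pipe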

This is on an Oracle database, for what it's worth.

Our SAN is connected to a private switch connected to the 2 DB servers, so it doesn't go through the core.

He's going to do a formal write-up of his findings, I believe. Thanks for the VMware link.

How do you think I should handle this with the consultant?

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

three posted:

How do you think I should handle this with the consultant?
What was he there to consult on? Just the speed issue? If so, I would handle it by not paying the bill until he did his job. In testing we have been able to saturate a single gigabit link (around 95MBps, close enough) with iSCSI, so he's full of poo poo.

Are you able to add a second link, just to see if that helps?

three
Aug 9, 2007

i fantasize about ndamukong suh licking my doodoo hole

adorai posted:

What was he there to consult on? Just the speed issue? If so, I would handle it by not paying the bill until he did his job. In testing we have been able to saturate a single gigabit link (around 95MBps, close enough) with iSCSI, so he's full of poo poo.

Are you able to add a second link, just to see if that helps?

He was there to identify any slowness in our Oracle database. His conclusion was that iSCSI was not capable of handling the network traffic, as iSCSI "maxes out at around 45MBps". His solution was: upgrade to 10GbE iSCSI or move to FC.

Our SAN is an Equallogic SAN with 4 ports all connected, and they each hit around 45MBps (4x45MBps). He said this was the cause of disk queueing showing up in perfmon.

I am going to bring this up with management and specifically show them the VMware PDF that Misogynist posted. If he's pointing to something as the bottleneck when it isn't, then he certainly didn't do his job.

Edit: Also found this PDF from NetApp comparing the storage protocols with Oracle RAC.

three fucked around with this message at 03:29 on Oct 12, 2010

H110Hawk
Dec 28, 2006

three posted:

He was there to identify any slowness in our Oracle database. His conclusion was that iSCSI was not capable of handling the network traffic, as iSCSI "maxes out at around 45MBps". His solution was: upgrade to 10GbE iSCSI or move to FC.

Our SAN is an Equallogic SAN with 4 ports all connected, and they each hit around 45MBps (4x45MBps). He said this was the cause of disk queueing showing up in perfmon.

First I would make sure your setup is indeed capable of performing faster. Double check that your disks are bored, your server isn't backed up elsewhere, and that the network itself is capable of filling the pipe. Throwing 10GbE at a problem without verifying the root cause is a great way to get 45MBps (bits? bytes?) of throughput on a 10Gbps pipe.

Have your network guy sign off on the network after actually checking the ports. Make sure your NICs are the right brand (Intel, Broadcom if you must), etc.

MrMoo
Sep 14, 2000

adorai posted:

What was he there to consult on? Just the speed issue? If so, I would handle it by not paying the bill until he did his job. In testing we have been able to saturate a single gigabit link (around 95MBps, close enough) with iSCSI, so he's full of poo poo.

You should be able to reach 120MBps; gigabit isn't exactly demanding on hardware anymore. If you are concerned about the interrupt overhead from the NIC, use Intel server NICs, and for high-performance SAN connectivity look at a ToE (TCP offload engine) card; it is definitely a requirement at 10GigE speeds.

The Open-iSCSI/Linux-iSCSI project is a useful source of performance information too, though it's all hidden in the mailing list now.

MrMoo fucked around with this message at 03:47 on Oct 12, 2010

three
Aug 9, 2007

i fantasize about ndamukong suh licking my doodoo hole

H110Hawk posted:

First I would make sure your setup is indeed capable of performing faster. Double check that your disks are bored, your server isn't backed up elsewhere, and that the network itself is capable of filling the pipe. Throwing 10GbE at a problem without verifying the root cause is a great way to get 45MBps (bits? bytes?) of throughput on a 10Gbps pipe.

Have your network guy sign off on the network after actually checking the ports. Make sure your NICs are the right brand (Intel, Broadcom if you must), etc.

The SAN<->DB connection is strictly SAN<->Private, Isolated Cisco Switch<->DB. The NICs are Broadcom and Intel. Broadcom NIC is dedicated to the iSCSI traffic. Intel NIC handles regular traffic.

Perfmon shows 45 megabytes per second on each NIC with iSCSI traffic. Consultant is claiming that is the max iSCSI can do on a 1Gbps NIC due to protocol overhead.

Equallogic's SANHQ is reporting the IOPS workload is only 30%.

MrMoo
Sep 14, 2000

Synology boxes are managing 80MB/s over iSCSI on gigabit,

http://www.synology.com/enu/products/4-5bay_perf.php

The ReadyNAS 3200 on gigabit manages 90-110MB/s,

http://www.readynas.com/?page_id=2468

The ReadyNAS 4200 lists numbers for 10GigE at 200-300MB/s.

MrMoo fucked around with this message at 04:28 on Oct 12, 2010

HorusTheAvenger
Nov 7, 2005

three posted:

The SAN<->DB connection is strictly SAN<->Private, Isolated Cisco Switch<->DB. The NICs are Broadcom and Intel. Broadcom NIC is dedicated to the iSCSI traffic. Intel NIC handles regular traffic.

Perfmon shows 45 megabytes per second on each NIC with iSCSI traffic. Consultant is claiming that is the max iSCSI can do on a 1Gbps NIC due to protocol overhead.

Equallogic's SANHQ is reporting the IOPS workload is only 30%.

I would start looking at the network cards and network configuration. I once watched a consultant install cheap $20 gigabit NICs with zero offload capabilities in a server in order to speed up access to an Exchange server (by giving iSCSI dedicated NICs). It dropped iSCSI throughput down to around that speed and pegged the CPU at around 15% while the server was otherwise idle. It's quite possible your consultant has never used real server-grade NICs before. They are expensive for a reason.

Other thoughts are to check your NICs and switches to ensure they are flow-control capable and that it isn't disabled. Turn on jumbo frames for the entire path between server and storage if possible.
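
If you do go the jumbo frame route, a quick sanity check that the whole path really passes 9000-byte frames (addresses are placeholders) is a don't-fragment ping at the jumbo payload size:

code:
rem Windows: 8972 = 9000 minus 28 bytes of IP/ICMP headers, -f = don't fragment
ping -f -l 8972 192.168.100.10

# Linux equivalent
ping -M do -s 8972 192.168.100.10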

Nebulis01
Dec 30, 2003
Technical Support Ninny
I could see the 45MB/s limit with a large queue depth, depending on the read/write size and array setup perhaps? We get poo poo performance with a QD of 32 in CrystalDiskMark, but since our SAN is a glorified file store anyway, I'm not really worried about it.

EDIT:
I was curious, so I ran ATTO 2.46 with 4KB blocks and a queue depth of 10.
1GB length - Read 42MB/sec, Write 1.1MB/sec.
2GB length - Read 52MB/sec, Write 5.2MB/sec

As a point of reference

These are from the following configuration:

Dell EqualLogic PS4000X SAN in RAID6 (400GB 10K RPM SAS, 16 spindles). The controller on the SAN has a 2GB cache and 2 GigE iSCSI connections to a dedicated GigE switch. Jumbo Frames and Flow Control are enabled. The server has 4 GigE Broadcom iSCSI NICs with Jumbo Frames and offloading enabled, and the Dell multipath driver balances between them.

code:
Below is with MPIO v3.4.2, Firmware 5.0.2

-----------------------------------------------------------------------
CrystalDiskMark 3.0 x64 (C) 2007-2010 hiyohiyo
                           Crystal Dew World : [url]http://crystalmark.info/[/url]
-----------------------------------------------------------------------
* MB/s = 1,000,000 byte/s [SATA/300 = 300,000,000 byte/s]

           Sequential Read :   170.694 MB/s
          Sequential Write :   100.999 MB/s
         Random Read 512KB :    56.960 MB/s
        Random Write 512KB :    76.950 MB/s
    Random Read 4KB (QD=1) :     1.086 MB/s [   265.0 IOPS]
   Random Write 4KB (QD=1) :     7.727 MB/s [  1886.4 IOPS]
   Random Read 4KB (QD=32) :    24.852 MB/s [  6067.5 IOPS]
  Random Write 4KB (QD=32) :    24.209 MB/s [  5910.3 IOPS]

  Test : 1000 MB [I: 0.0% (0.1/499.9 GB)] (x9)
  Date : 2010/09/29 14:36:01
    OS : Windows Server 2008 R2 Datacenter Edition (Full installation) [6.1 Build 7600] (x64)


Below is with MPIO v3.4.1, Firmware 4.3.7

-----------------------------------------------------------------------
CrystalDiskMark 3.0 x64 (C) 2007-2010 hiyohiyo
                           Crystal Dew World : [url]http://crystalmark.info/[/url]
-----------------------------------------------------------------------
* MB/s = 1,000,000 byte/s [SATA/300 = 300,000,000 byte/s]

           Sequential Read :   180.898 MB/s
          Sequential Write :   101.380 MB/s
         Random Read 512KB :    60.362 MB/s
        Random Write 512KB :    89.607 MB/s
    Random Read 4KB (QD=1) :     1.514 MB/s [   369.6 IOPS]
   Random Write 4KB (QD=1) :     6.854 MB/s [  1673.3 IOPS]
   Random Read 4KB (QD=32) :    29.304 MB/s [  7154.3 IOPS]
  Random Write 4KB (QD=32) :    17.736 MB/s [  4330.1 IOPS]

  Test : 1000 MB [I: 0.7% (3.3/499.9 GB)] (x9)
  Date : 2010/09/23 16:53:14
    OS : Windows Server 2008 R2 Datacenter Edition (Full installation) [6.1 Build 7600] (x64)

Nebulis01 fucked around with this message at 15:59 on Oct 12, 2010

three
Aug 9, 2007

i fantasize about ndamukong suh licking my doodoo hole

Nebulis01 posted:

I could see the 45MB/s limit with a large queue depth, depending on the read/write size and array setup perhaps? We get poo poo performance with a QD of 32 in CrystalDiskMark, but since our SAN is a glorified file store anyway, I'm not really worried about it.

EDIT:
I was curious, so I ran ATTO 2.46 with 4KB blocks and a queue depth of 10.
1GB length - Read 42MB/sec, Write 1.1MB/sec.
2GB length - Read 52MB/sec, Write 5.2MB/sec

I'm not really sure what all of this means. v:shobon:v This level of storage discussion is a bit over my head. Here are the results from SAN HQ:

IOPS/Latency: [screenshot]

Overall Throughput: [screenshot]

Port 1 Network Throughput: [screenshot]

From that, I'm not seeing any clear bottlenecks? IOPS appear to only be getting around 50% utilization, and for the most part throughput isn't very high. It's only been running for the past day or so, but users have been hitting it pretty heavily.

H110Hawk
Dec 28, 2006

three posted:

Broadcom NIC

Make sure your NIC doesn't suck. Find some way to make that thing push a gigabit of traffic to another box; you might need to schedule a maintenance window for whatever is hammering your SAN. (iperf? What are the kids using these days?) Have your network lackey `show int` on both sides of the connection and all intermediate ones.
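
iperf is still fine for this; a minimal run between the DB server and any other box on that switch (address is a placeholder) looks something like:

code:
rem on the far-end box
iperf -s

rem on the DB server: 30-second test, 4 parallel streams
iperf -c 192.168.100.20 -t 30 -P 4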

Throw your NIC away and put an Intel in there; it's not worth loving around.

Nebulis01
Dec 30, 2003
Technical Support Ninny
The only thing I can think of (not a SAN/storage guy myself) would be that you're running into issues from using just one link with such small reads/writes. Is there any way you can utilize the Dell MPIO driver and add another link or two to that box?

Also, I've got our setup running on Broadcoms and they seem to run just fine. What firmware revision are you running? There have been some pretty decent fixes in the last few firmware/MPIO driver releases.

Nebulis01 fucked around with this message at 17:30 on Oct 12, 2010

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.
Our setup, mentioned above, also ran on the integrated Broadcom NetXtreme II NICs that come in most IBM servers. We hit wire speeds handily.

H110Hawk posted:

Throw your NIC away and put an Intel in there; it's not worth loving around.
This is very likely a complete waste of money if the driver isn't shown to be sucking down a ton of CPU. I've never seen a gigabit network just plain cap out at half its specified speed.

Nebulis01 posted:

The only thing I can think of (not a SAN/storage guy myself) would be that you're running into issues from using just one link with such small reads/writes. Is there any way you can utilize the Dell MPIO driver and add another link or two to that box?
This is a very valid point -- I'm talking about wire speed and your numbers aren't coming from the wire. Can you get utilization numbers from your server's iSCSI switchport rather than the storage processor?

Vulture Culture fucked around with this message at 17:47 on Oct 12, 2010

three
Aug 9, 2007

i fantasize about ndamukong suh licking my doodoo hole

Nebulis01 posted:

The only thing I can think of (not a SAN/storage guy myself) would be that you're running into issues from using just one link with such small reads/writes. Is there any way you can utilize the Dell MPIO driver and add another link or two to that box?

Also, I've got our setup running on Broadcoms and they seem to run just fine. What firmware revision are you running? There have been some pretty decent fixes in the last few firmware/MPIO driver releases.
It's actually using all 4 links, but eth0 is definitely the most utilized. Sorry, I may have posted misleading images. During the highest spike, eth0 was doing 75.3MB/sec while the other 3 each did ~25MB/sec. It is using the Equallogic MPIO drivers with round-robin load balancing. I wanted to reconfirm which NIC each connection was using: 3 of the iSCSI connections are using the Broadcom and 1 is using the Intel NIC. Does the disparity hint that the load balancing is off?

I wonder if it's a coincidence that 1 interface on the SAN is doing 3x as much bandwidth as the other 3, while on the server 1 iSCSI interface is an Intel NIC and the other 3 are Broadcom. This certainly seems to support your point about the NICs. I've been trying to think of a way to verify this 100%, but I can't figure out how to link X interface on the SAN to Y interface on the server.
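
One way I might be able to line them up from the Windows side (assuming the Microsoft initiator CLI is available) is to dump the sessions and match the initiator portal on each connection against the SAN's per-port graphs:

code:
rem lists each iSCSI session and its connections, including the local
rem initiator portal (NIC IP) and the target portal (SAN port IP)
iscsicli SessionList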

Considering one link did 75MB/sec, I'm going to say that negates any myth that iSCSI can't do more than 45MB/sec, unless I'm completely off-base.

It doesn't look like throughput is really plateauing in this graph the way it would if it were bottlenecking, am I wrong here?

[throughput graph screenshot]

Thanks for everyone's help thus far. You guys really know your stuff.


Misogynist posted:

This is a very valid point -- I'm talking about wire speed and your numbers aren't coming from the wire. Can you get utilization numbers from your server's iSCSI switchport rather than the storage processor?

Getting much done from the networking side can be difficult. Is there any way to log what the Networking tab in Windows Task Manager shows (the % Utilization view) with Perfmon or another tool?

three fucked around with this message at 18:35 on Oct 12, 2010

Nebulis01
Dec 30, 2003
Technical Support Ninny
I would say that the MPIO driver is not balancing those NICs the same way; it looks like it's weighting eth0 (Intel?) pretty heavily. Is there a reason you're using round robin instead of Least Queue Depth?

In Perfmon (2008 R2) you can log iSCSI by connection, in both bytes sent/received and requests.
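
For the command-line route, something like this captures what the Networking tab shows (counter path, interval, and output file are just an example):

code:
rem sample every NIC's throughput every 5 seconds for an hour, into a CSV
typeperf "\Network Interface(*)\Bytes Total/sec" -si 5 -sc 720 -o nics.csv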

H110Hawk
Dec 28, 2006

Misogynist posted:

Our setup, mentioned above, also ran on the integrated Broadcom NetXtreme II NICs that come in most IBM servers. We hit wire speeds handily.

This is very likely a complete waste of money if the driver isn't shown to be sucking down a ton of CPU. I've never seen a gigabit network just plain cap out at half its specified speed.

There is a special place in hell for the bnx2 linux driver and the related Broadcom NetXtreme II NICs.

HorusTheAvenger
Nov 7, 2005

three posted:

It's actually using all 4 links, but eth0 is definitely the most utilized. Sorry, I may have posted misleading images. During the highest spike, eth0 was doing 75.3MB/sec while the other 3 each did ~25MB/sec. It is using the Equallogic MPIO drivers with round-robin load balancing. I wanted to reconfirm which NIC each connection was using: 3 of the iSCSI connections are using the Broadcom and 1 is using the Intel NIC. Does the disparity hint that the load balancing is off?

I wonder if it's a coincidence that 1 interface on the SAN is doing 3x as much bandwidth as the other 3, while on the server 1 iSCSI interface is an Intel NIC and the other 3 are Broadcom. This certainly seems to support your point about the NICs. I've been trying to think of a way to verify this 100%, but I can't figure out how to link X interface on the SAN to Y interface on the server.

Considering one link did 75MB/sec, I'm going to say that negates any myth that iSCSI can't do more than 45MB/sec, unless I'm completely off-base.

Whoa, hold on. On most operating systems, if you have multiple NICs configured with different IPs on the same subnet without some form of link aggregation, you end up with ARP flux.

http://www.google.ca/search?q=arp+flux

If you're getting this, which traffic goes out which interface can be pretty random and unpredictable.
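
On a Linux initiator the usual mitigation is the strict ARP sysctls below; this is a generic example rather than anything Equallogic-specific, and Windows/MPIO stacks manage their own port binding:

code:
# /etc/sysctl.conf -- answer ARP only on the interface that owns the IP,
# and source ARP requests from that interface's own address
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2

# then load it with: sysctl -p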

Mausi
Apr 11, 2006

HorusTheAvenger posted:

Whoa, hold on. On most operating systems, if you have multiple NICs configured with different IPs on the same subnet without some form of link aggregation, you end up with ARP flux.

http://www.google.ca/search?q=arp+flux

If you're getting this, which traffic goes out which interface can be pretty random and unpredictable.

If this is the Equallogic array he's talking about, they're designed to work like this. He mentioned the Equallogic MPIO driver, so that's probably the case.
If it's the random Windows box on the other end of the iSCSI link, then it would be more of a concern.

three
Aug 9, 2007

i fantasize about ndamukong suh licking my doodoo hole

Mausi posted:

If this is the Equallogic array he's talking about, they're designed to work like this. He mentioned the Equallogic MPIO driver, so that's probably the case.
If it's the random Windows box on the other end of the iSCSI link, then it would be more of a concern.

It is indeed an Equallogic PS6000XV.

I made a formal report with the results from SANHQ, and snippets of PDFs and other things you guys posted and I found online. We're going to take it to the consultant and have him explain his findings further.

Fortunately, I found out this was just a trial period for the consulting group in question to allow them to prove themselves/get their foot in the door, and the amount they were paid was minimal. If their findings are junk, we just won't use them anymore. Again, thanks for everyone's input and help.

Saying iSCSI in general maxes out at 45MBps due to protocol overhead seems to border on outright lying, though.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

H110Hawk posted:

There is a special place in hell for the bnx2 linux driver and the related Broadcom NetXtreme II NICs.
Is this true in the general case? We've only had problems related to major regressions in RHEL 5.3/5.4 kernel updates that only affected machines using jumbo frames. Everything else has been running pretty smoothly.

three posted:

Saying iSCSI in general maxes out at 45MBps due to protocol overhead seems to border on outright lying, though.
I'd wager incompetence over malice, though the end result is the same.

Vulture Culture fucked around with this message at 20:05 on Oct 12, 2010

three
Aug 9, 2007

i fantasize about ndamukong suh licking my doodoo hole

Misogynist posted:

I'd wager incompetence over malice, though the end result is the same.
You're probably right. He was actually a really nice guy. He seemed to be heavily, heavily focused on FC storage, and specifically NetApp devices.

He came really close to recommending we dump our 8+ Equallogic units and replace them all with NetApp devices.

Mausi
Apr 11, 2006

three posted:

He came really close to recommending we dump our 8+ Equallogic units and replace them all with NetApp devices.

If this is the case, he's not an independent consultant.
A proper consultant will never tell you to dump what you have and start again, but your average engineer will certainly steer you towards what he knows if he wants ongoing work.

TBH I've had issues with the built-in Broadcom IIs that a lot of vendors are favouring for onboard gigabit these days. I think you'd do quite well with a dedicated iSCSI controller.

Nebulis01
Dec 30, 2003
Technical Support Ninny
I'd venture this 'consultant' is also a NetApp certified partner/preferred vendor or similar.

H110Hawk
Dec 28, 2006

Misogynist posted:

Is this true in the general case? We've only had problems related to major regressions in RHEL 5.3/5.4 kernel updates that only affected machines using jumbo frames. Everything else has been running pretty smoothly.

Yes. I fought them constantly at my old job running Debian on a wide range of kernels, and now again at this job. We are using CentOS 5.4 here. Though to be fair, we did get the RHN article which says how to solve the problem of "whoops! the bnx2 driver crashed under load!"

code:
# head -n 1 /etc/modprobe.conf
options bnx2 disable_msi=1
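# disable_msi=1 drops bnx2 back to legacy interrupts instead of MSI, which is the workaround for the lockups under load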
We just bought cases of cheap Intel dual-port 10/100 NICs and threw them into nearly every bnx2 server we had at my last job. You can get them secondhand quite cheaply if you're willing to buy whole stocks of them.

Here, a prerequisite for our new servers was that they had Intel NICs. Of course, the handful of Dells we bought to supplement those bulk servers use bnx2 NICs for no apparent reason. Had we not found that modprobe article, we would have bought Intel gig server adapters for them, and I still might.

three
Aug 9, 2007

i fantasize about ndamukong suh licking my doodoo hole
The consultant sent us his formal write-up of his findings. Here are some snippets:

quote:

[perfmon screenshot from the report]

CONCLUSION
It is the conclusion of the consultingCompany team that during the times of high disk utilization, the four (4) 1Gbe iSCSI connections are experiencing periods of link saturation. Due to the overhead associated with the TCP/IP protocol, 1Gbe connections typically have a sustained throughput ceiling of approximately 40MB (megabytes) per second, as is reflected in the collected data. This limitation is causing disk contention as the demand for disk access is higher than the available throughput. The high level of disk queuing during these periods of increased utilization further supports this finding.

quote:

RECOMMENDATIONS AND NEXT STEPS
The performance data shows that the current four (4) 1Gbe connections to the iSCSI SAN does not offer enough throughput for the hosted workload during times of high utilization. The configuration of the underlying iSCSI implementation adheres to most of Microsoft's best practices. Due to the throughput limitations associated with implementing iSCSI over 1Gbe, there are few options for increasing the disk performance with this current infrastructure. Setting the Delayed Acknowledgement option on DB1 and DB2 to “0” could reduce the number of packet retransmits and enabling Jumbo Frames (MTU size increased to 9000 bytes) will provide a lower number of packets sent across the network. These two options could provide better efficiency over the iSCSI network; however, the 1Gbe links will still pose a limitation to the total available throughput to the storage system.

As the system's workload continues to increase, it would be recommended to consider upgrading the existing iSCSI implementation to 10Gbe or converting the storage system to 4Gb or 8Gb Fiber Channel. Both alternatives provide greater throughput to the storage system and can better accommodate the increased workload.
It was noted while on site that companyName has a number of similar storage system within the production environment. It would be further recommended to investigate the performance of these systems as well and determine if an enterprise class consolidated storage system would better suit the business needs. An enterprise level storage systems can also provide multiple connection options, such as iSCSI, FCP, CIFS and NFS and provides the flexibility to allocate storage based on service requirements and performance demand while lowering the associated administrative overhead.

Thoughts? We're thinking about sending this to our Dell rep to have them refute the iSCSI complaints as well.
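
For reference, the delayed-ACK tweak he describes is normally a per-interface registry value on Windows; the interface GUID below is a placeholder for whichever NIC carries iSCSI, and it typically needs a reboot to take effect:

code:
rem TcpAckFrequency=1 acknowledges every segment, i.e. delayed ACK off on that interface
reg add "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\{iSCSI-NIC-GUID}" /v TcpAckFrequency /t REG_DWORD /d 1 /f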

HorusTheAvenger
Nov 7, 2005

three posted:

Thoughts? We're thinking about sending this to our Dell rep to have them refute the iSCSI complaints as well.

Depends on your Dell rep. Many reps would enjoy the prospects of selling 10Gb infrastructure.

Is that how the consultants gave the perfmon screenshot? It's kind of hard to read.

three
Aug 9, 2007

i fantasize about ndamukong suh licking my doodoo hole

HorusTheAvenger posted:

Depends on your Dell rep. Many reps would enjoy the prospects of selling 10Gb infrastructure.

Is that how the consultants gave the perfmon screenshot? It's kind of hard to read.

Good point on the Dell rep; I don't think our reps would do that though. That's the same quality of image as in the PDF. It's like he intentionally made it as blurry as possible.

Nomex
Jul 17, 2002

Flame retarded.

three posted:

The consultant sent us his formal write-up of his findings. Here are some snippets:



Thoughts? We're thinking about sending this to our Dell rep to have them refute the iSCSI complaints as well.

Is there a time of the day or month when this DB has low I/O? If there is, you should try copying a very large file from the server to the storage. This will at least show you if it is actually the DB causing the issue or if it's a problem with the setup.

MrMoo
Sep 14, 2000

quote:

enabling Jumbo Frames (MTU size increased to 9000 bytes) will provide a lower number of packets sent across the network.

This guy is from 2003? Jumbo frames were created to alleviate the problem of hardware being too slow to saturate networks with small packets. As history showed, hardware got faster, and we have ToE to do it all in hardware. Jumbo frames increase bandwidth for CPU-limited nodes at the expense of latency, and with a database I'm betting latency is going to matter more than throughput.

Isn't this kind of setup easy to demonstrate by slapping a single SSD into the SAN and comparing test results with the HDD pool?

MrMoo fucked around with this message at 03:20 on Oct 18, 2010

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

MrMoo posted:

slapping a single SSD into the SAN
Not sure about your environment, but we don't "slap" anything into our SAN, nor do we even have a free drive bay to do so if we wanted to.

three
Aug 9, 2007

i fantasize about ndamukong suh licking my doodoo hole

Nomex posted:

Is there a time of the day or month when this DB has low I/O? If there is, you should try copying a very large file from the server to the storage. This will at least show you if it is actually the DB causing the issue or if it's a problem with the setup.

Unfortunately there is no room left on the SAN to create a volume that would allow me to copy a file to it. It's currently all tied up with Oracle-y goodness.

TobyObi
Mar 7, 2005
Ain't nobody Obi like Toby ;)

three posted:

The consultant sent us his formal write-up of his findings. Here are some snippets:



Thoughts? We're thinking about sending this to our Dell rep to have them refute the iSCSI complaints as well.

I can get > 100MB/sec out of an OpenSolaris-based iSCSI array with a zpool and an iSCSI share.

iSCSI overheads are nowhere near that high.

The answer is either the Equallogic is poo poo, your switches are poo poo, or something else entirely is going on.

Hint: it's most likely the last one...

MrMoo
Sep 14, 2000

adorai posted:

Not sure about your environment, but we don't "slap" anything into our SAN, nor do we even have a free drive bay to do so if we wanted to.

Well, it depends on how important it is to reach a resolution. Before you do something entertaining like degrading an array to free a bay, it would make more sense to simply drop down to one Ethernet link and compare performance.

Nomex
Jul 17, 2002

Flame retarded.
How many disks did you say were in this array? Also what size and speed are they?

Nomex fucked around with this message at 17:32 on Oct 18, 2010

1000101
May 14, 2003

BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY FRUITCAKE!

Some Consultant posted:

CONCLUSION
It is the conclusion of the consultingCompany team that during the times of high disk utilization, the four (4) 1Gbe iSCSI connections are experiencing periods of link saturation. Due to the overhead associated with the TCP/IP protocol, 1Gbe connections typically have a sustained throughput ceiling of approximately 40MB (megabytes) per second, as is reflected in the collected data. This limitation is causing disk contention as the demand for disk access is higher than the available throughput. The high level of disk queuing during these periods of increased utilization further supports this finding.

Is this screenshot all the data he is basing his findings on? I can't get anything conclusive out of that chart except that whatever system he is on is queuing on occasion.

There is even a spike to the far right which appears to refute his claim.

Did he look at the storage via any of its own performance tools at all? You can't really troubleshoot a storage performance issue by looking at one host connected to it. I hope he had more data to share, but it's possible he doesn't.


quote:

RECOMMENDATIONS AND NEXT STEPS
The performance data shows that the current four (4) 1Gbe connections to the iSCSI SAN does not offer enough throughput for the hosted workload during times of high utilization. The configuration of the underlying iSCSI implementation adheres to most of Microsoft's best practices. Due to the throughput limitations associated with implementing iSCSI over 1Gbe, there are few options for increasing the disk performance with this current infrastructure. Setting the Delayed Acknowledgement option on DB1 and DB2 to “0” could reduce the number of packet retransmits and enabling Jumbo Frames (MTU size increased to 9000 bytes) will provide a lower number of packets sent across the network. These two options could provide better efficiency over the iSCSI network; however, the 1Gbe links will still pose a limitation to the total available throughput to the storage system.

As the system's workload continues to increase, it would be recommended to consider upgrading the existing iSCSI implementation to 10Gbe or converting the storage system to 4Gb or 8Gb Fiber Channel. Both alternatives provide greater throughput to the storage system and can better accommodate the increased workload.
It was noted while on site that companyName has a number of similar storage system within the production environment. It would be further recommended to investigate the performance of these systems as well and determine if an enterprise class consolidated storage system would better suit the business needs. An enterprise level storage systems can also provide multiple connection options, such as iSCSI, FCP, CIFS and NFS and provides the flexibility to allocate storage based on service requirements and performance demand while lowering the associated administrative overhead.

Before you jump on this boat I'd make sure of the following:

1. You have enough disks to support the workload
2. You're using an appropriate RAID level for the workload
3. Enable flow control on your switches if your storage supports it
4. Make sure your storage is on the same switch as your database servers (multiple hops with iSCSI can be bad depending on network topology)
5. Look at storage specific performance counters and look for queuing etc.
6. What are your server's CPUs doing at these peak times? It's possible that your application is choking out the CPU, which means less time available for a software iSCSI initiator.
7. What are your switches doing? (a couple of switch-side checks are sketched just below)
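
For 3 and 7, a rough sketch of the switch-side checks on a Catalyst-class switch (the port name is a placeholder):

code:
! is flow control actually negotiated on the iSCSI ports?
show flowcontrol interface gigabitethernet0/1

! look for drops, pause frames, and input/CRC errors on the same ports
show interface gigabitethernet0/1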

Basically he is proposing a complete rip-and-replace, which can be costly (and has a low probability of actually being needed), but I don't think he has done an adequate job of deep-diving the root cause. I've personally seen 90-100MB/sec throughput on Celerra and NetApp on gigabit. There is just no way there is over a 60% protocol overhead from TCP at play here.

Basically any one of the factors I listed could cause things to look like "gigabit sucks" when it might actually be something like using RAID 6 on a platform with a slow implementation (I picked this because EQL RAID 6 is pretty slow). So what happens is the disks spend a shitload of time writing parity and the controllers are going apeshit and ignoring everything while your host is spraying shitloads of data at it, and all the controller does is drop it and tell it to resend later.

If you're the host, you start queuing things up and your network utilization goes up, but the iSCSI initiator isn't getting any sort of confirmation on writes, or even able to read, hence the network looks busier.

three
Aug 9, 2007

i fantasize about ndamukong suh licking my doodoo hole
Yep, those perfmon counters are all he did to look into it. He didn't use anything else. I installed SANHQ afterwards; in hindsight, I probably should have hooked him up with this, but I figured he would come prepared with what he needed.

1000101 posted:

1. You have enough disks to support the workload
IOPS are only at 50% utilization at peak times according to SANHQ.

quote:

2. You're using an appropriate RAID level for the workload
We're using RAID-10.

quote:

3. Enable flow control on your switches if your storage supports it
Haven't tried this. I'm going to have our network person look into it.

quote:

6. What are your server's CPUs doing at these peak times? It's possible that your application is choking out the CPU, which means less time available for a software iSCSI initiator.
CPU utilization has been minimal the whole time. No 100% utilization or anything.

quote:

3. Enable flow control on your switches if your storage supports it
4. Make sure your storage is on the same switch as your database servers (multiple hops with iSCSI can be bad depending on network topology)
7. What are your switches doing?
Not sure about 3 and 7; I'm going to ask our networking guy to look into those. The database and SAN are connected via a private, isolated switch in the same rack.


This is awesome feedback, thank you (and everyone else that has responded) for taking the time to write all of that up. :)

Mausi
Apr 11, 2006

Everyone has been talking about iSCSI throughput on devices other than Equallogic - does anyone know what the throughput overhead of their load-balancing algorithm is?

I'm not saying the consultant is right - but it could be inherent to the Equallogic array.

three
Aug 9, 2007

i fantasize about ndamukong suh licking my doodoo hole

Mausi posted:

Everyone has been talking about iSCSI throughput on devices other than Equallogic - does anyone know what the throughput overhead of their load-balancing algorithm is?

I'm not saying the consultant is right - but it could be inherent to the Equallogic array.

I found a report from Equallogic showing it's able to hit almost full link speed.

I don't think this guy would even know if Equallogic had certain limitations. He managed to spell Equallogic incorrectly as "Equilogic" throughout the entire writeup. :psyduck:


Nomex posted:

How many disks did you say were in this array? Also what size and speed are they?
Sorry, didn't see this. 16 15k-SAS disks, Raid-10 with 2 spares.

three fucked around with this message at 18:42 on Oct 18, 2010

Nukelear v.2
Jun 25, 2004
My optional title text
Out of curiosity I just looked at the benchmarks I did on our VMware iSCSI platform. With the cheapest configuration in the universe (Dell switches, MD3000i SAN, integrated Broadcom NICs) I pushed 116.2 MB/sec.

Like other people have said, run a benchmark to see if it caps out or if Oracle is just being wacky. IOZone should run on pretty much anything, but it is slightly complicated. SQLIO is dead simple; you just need 512MB or so free to make a test file and see how fast you can push a sequential read.

For reference, in my test I used 'sqlio -kR -s360 -fsequential -o8 -b128 -LS -Fparam.txt' with 8 threads in param.
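
In case it helps, the param.txt that command expects is just one line per test file: path, thread count, a mask, and a file size in MB. A version matching the 8-thread/512MB example above (the path is made up) would be:

code:
c:\iscsitest\testfile.dat 8 0x0 512

Swap -kR for -kW to get the sequential-write version of the same run.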

Nukelear v.2 fucked around with this message at 23:10 on Oct 18, 2010

Nomex
Jul 17, 2002

Flame retarded.

three posted:

Sorry, didn't see this. 16 15k-SAS disks, Raid-10 with 2 spares.



Given the number of disks you have and the RAID level, 2100 IOPS would be about the maximum you would see. Obviously the cache is going to play into this, but for uncached data it looks like you're approaching the limit, at least on the far right side of the graph.
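
Rough spindle math behind that number, assuming a rule-of-thumb ~150 random IOPS per 15K spindle:

code:
16 disks - 2 hot spares   = 14 active spindles
Random read ceiling       = 14 x ~150  = ~2,100 IOPS
Random writes cost 2 physical IOs each on RAID-10, so a write-heavy mix lands lower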
