The_Groove
Mar 15, 2003

Supersonic compressible convection in the sun
It really makes you appreciate storage hardware when 120 drives fail due to a bad SAS card/cable and you don't see a performance hit while rebuilding 32 at once. That and RAID6 totally saving your rear end from losing data. Close call this week...

The_Groove
Mar 15, 2003

Supersonic compressible convection in the sun
Not really. I guess I forgot to say that they were all manually failed in order to power down a pair of enclosures to swap I/O modules and cables to try and find out what actually went bad. But yeah, if one path from one controller going down causes drive failures, there are bigger problems.

The_Groove
Mar 15, 2003

Supersonic compressible convection in the sun

cheese-cube posted:

On the subject of IBM midrange storage systems, has anyone had a look at their new DCS3700 and DCS9900 high-density systems yet? 60 HDDs in 4U is pretty drat crazy. Also you can get the DCS9900 with either 8 x FC 8Gb or 4 x InfiniBand DDR host interfaces, which is insane.
I manage an older DDN9900 with Infiniband and GPFS in an HPC environment and it's pretty solid. By which I mean that we've had GBIC failures, SATA chip failures, I/O module failures, and an entire disk enclosure failure. Sometimes two at once, resulting in 60 arrays with 1 failed drive and another 60 with 2. RAID6 has saved my rear end so many times. But it's always rebuilt things fine and DDN has some great technicians. Not sure if it's just our unit or 9900s in general. Average I/O rates are around 1 GB/sec for reads and writes, with peaks up to 5-6 GB/sec.

Getting 76 DCS3700's up and running pretty soon, don't know too much about them just yet.

The_Groove
Mar 15, 2003

Supersonic compressible convection in the sun
One pair of controllers, but with 2 drive enclosures (daisy-chained) for each channel. So 1200 drives total.

The_Groove
Mar 15, 2003

Supersonic compressible convection in the sun

szlevi posted:

I used to run our entire production load on a single 8550, there's nothing wrong with them, I think. It's just a big, dumb RAID storage piece, there are some options - strictly console-only mgmt - but basically it was a giant fast array (~20TB) and that's it. No features, not even snapshots, nothing.
I love the console-based management personally, so much quicker than loading up some java GUI or whatever, plus you can do stuff from your phone if you really have to. With my DDN9550 and 9900 gear the filesystem (GPFS) had all the features for snapshots, resizing filesystems, etc., so a big, dumb, storage system was perfect.

The_Groove
Mar 15, 2003

Supersonic compressible convection in the sun
Every time I have DDN gear fail, it just makes me more impressed by it. I had BOTH raid controllers for a DDN9900 storage system fail after being powered on following building maintenance. A "disk chip" in each controller had failed. DDN shipped out 2 replacements, I swapped them in, re-cabled everything, and they booted fine, reading their config (zoning, network, syslog forwarding, etc.) off one of the disks. I didn't have to do anything other than turn them on!

The_Groove
Mar 15, 2003

Supersonic compressible convection in the sun
Haha well poo poo happens, I wouldn't necessarily blame DDN for components failing in an unlikely combination resulting in unscheduled downtime. They offered to send a tech, but we decided to do the replacement ourselves since their procedure was so simple. I was pretty impressed but apparently this is nothing special!

The_Groove
Mar 15, 2003

Supersonic compressible convection in the sun
Yeah, we did the work because it would be finished before a tech could get out here. It is annoying that, while the two controllers had different "disk chips" fail, between them they could still have seen all the disks. But the failed disk chip on the A controller prevented access to the disk holding the configuration (it's not some internal disk in the controllers; it uses a disk in the arrays it couldn't talk to). It's too late now, but I wonder if swapping the A/B controllers would have worked temporarily, since the new A controller should have been able to read the config.

I have heard some stories about these 9900 controllers having serious issues during bootup, usually firmware related and not multiple failures though, but maybe that's changing as they age.

The_Groove
Mar 15, 2003

Supersonic compressible convection in the sun

Corvettefisher posted:

Was this on a non production box, or do you just have the most chill work environment ever conceived?
It's production, mostly "scratch" storage where old files get purged nightly. This thing has maybe 2 months of production left before it becomes a testbed. Also yes!

Zephirus posted:

Our DCS9900s (same thing, IBM model number) went through a period of failing large numbers of perfectly good drives because someone at DDN screwed up the error counters in the firmware. Two of the disk modules would not turn off their alert lights despite being completely fine. They constantly needed nursing to keep them online, and DDN couldn't figure out why. Eventually the whole lot got replaced because the continuous read/write performance was rubbish.
That's a new one! We started with a pair of controllers and 5 disk enclosures, then expanded to 20 disk enclosures. Right after that expansion we were down for maybe 2 weeks because it would fail hundreds of drives (even original ones with data), and couldn't see a good portion of the rest. DDN did firmware upgrades, replaced a disk enclosure, I/O modules in other disk controllers, SATA bridge chips, and 58 drives until all of a sudden it started working. I've been happy with the performance though.

The_Groove
Mar 15, 2003

Supersonic compressible convection in the sun
Haha fair enough. Other missing details: we've never lost any files/data on this system. The ratio of how many drives the controllers have failed vs. how many ARE failed is HUGE. HPC environments and equipment are like the wild west.

This is all why I was so excited that replacing both controllers worked like it should. No running undocumented commands from DDN over the phone!

The_Groove
Mar 15, 2003

Supersonic compressible convection in the sun

Misogynist posted:

I'm curious, do any of you other guys in the tiering discussion work in an environment with petabytes of raw data? SSD caching is great for applications with repeatable hotspots, but it's going to be a long time before it can play with the HPC kids.

I manage about 10PB of storage (4560 disks), if I had a ton of free SSDs lying around I'd use them as metadata-only disks. Currently the metadata is on the same LUNs as actual data, so under heavy loads the "interactive user experience" goes down. HPC storage is usually all about large-block sequential bandwidth, capacity, and saving as much money as possible so it can be spent on the computational aspects of the system.

The_Groove
Mar 15, 2003

Supersonic compressible convection in the sun
It may be unique to the GPFS filesystem, but when defining a new "disk" you can specify if you want it to have data only, metadata only, or both. It's basically normal filesystem metadata (inodes) that GPFS keeps on whichever disks you allow it to. So if you scan/stat all your files it shouldn't touch the data-only disks at all unless you do a read/write. The latest version of GPFS can even store tiny files in the inode, which I'd assume would still reside on the metadata-only disks.
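As a sketch, that placement is declared per NSD when you create it; a hypothetical stanza file (device paths, server names, and NSD names here are all assumptions, not from my setup) would look something like:

```
%nsd: device=/dev/sdb
  nsd=meta_nsd1
  servers=nsd1,nsd2
  usage=metadataOnly
  failureGroup=1

%nsd: device=/dev/sdc
  nsd=data_nsd1
  servers=nsd1,nsd2
  usage=dataOnly
  failureGroup=2
```

You feed that to mmcrnsd -F and GPFS then keeps inodes off the data-only LUNs entirely.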

The_Groove
Mar 15, 2003

Supersonic compressible convection in the sun
That sounds fairly similar to a data analysis/visualization environment I used to manage: 12 clients with a ton of memory, 10GbE, and GPUs, so virtualization didn't make sense. We ran Linux though, and ended up running IBM's GPFS filesystem with a 2MB block size on large DDN9550/DDN9900 storage systems (about 1.5PB total) with 8 NSD (I/O) servers in front of it, serving everything out over 10G. A single client could max out its 10G card when doing sequential reads/writes, and the 9900 alone could hit 4-5 GB/sec peaks under a heavy load. Granted, GPFS is not even close to free and probably pretty expensive for a relatively "small" installation like that. It's more geared towards huge clusters and HPC, but drat did it rock for that environment.

I'm not saying a different filesystem or anything will solve your issues. I just wanted to give a description of a similar environment where disk I/O was pretty sweet.

The_Groove
Mar 15, 2003

Supersonic compressible convection in the sun
Yeah, IBM's storage/filesystem offerings are starting to move in that direction also. No more hardware RAID controllers, just have your NSDs (I/O servers) and filesystem layer handle everything. They use a "declustered" RAID to reduce rebuild times and lower the performance degradation during rebuilds. Here's a video if anyone's interested: https://www.youtube.com/watch?v=VvIgjVYPc_U
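The rebuild-time win from declustering is easy to see with toy numbers (every figure below is an assumption for illustration, not from any real system):

```python
# Rough arithmetic on why declustered RAID rebuilds faster.
# A classic RAID6 rebuild funnels every reconstructed strip onto one
# spare drive; a declustered layout spreads the spare space (and the
# rebuild writes) across every surviving drive in the pool.

DRIVE_TB = 4        # capacity to reconstruct (assumed)
DRIVE_MBPS = 150    # sustained per-drive throughput in MB/s (assumed)
POOL_DRIVES = 60    # drives in the declustered pool (assumed)

def rebuild_hours(write_targets: int) -> float:
    """Hours to rewrite one drive's worth of data across `write_targets` drives."""
    total_mb = DRIVE_TB * 1_000_000
    return total_mb / (write_targets * DRIVE_MBPS * 3600)

traditional = rebuild_hours(1)                # one hot spare takes all writes
declustered = rebuild_hours(POOL_DRIVES - 1)  # spare space on all 59 survivors

print(f"traditional: {traditional:.1f} h, declustered: {declustered:.2f} h")
```

The single-spare write bottleneck dominates, which is also why the degraded-performance window shrinks so much.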

The_Groove
Mar 15, 2003

Supersonic compressible convection in the sun

Misogynist posted:

Is IBM still going to be selling engineered systems around GPFS? I was under the impression SONAS was going over to Lenovo.
I don't know anything about SONAS, but GPFS is still an official IBM product so I want to say yes. Their current HPC "storage appliance" offerings are called GSS (GPFS Storage Server?) and use the declustered RAID stuff I mentioned earlier on (now Lenovo's) x3650 servers. We've sent IBM a list of questions about how the Lenovo deal will affect our support and future products, but no response yet.

The_Groove
Mar 15, 2003

Supersonic compressible convection in the sun

parid posted:

Looks like I might be getting into the HPC business soon. I have mostly been a NetApp admin so far. Any recommendations on what technologies/architectures to start learning about? I hear a lot about paralyzed file systems.
I work in HPC. I don't know if you'll still be in storage, but I'd get a little familiar with the main HPC/storage things: parallel filesystems, Infiniband, SAS/Fibre Channel, cluster management tools (xCAT, etc.), MPI, job schedulers, your favorite monitoring/alerting framework, etc. HPC is generally IBM, Cray, SGI, and Dell's world, excluding some smaller integrators, so if you have an inkling of what type of system you have or will have, you can start researching some of their specific offerings.

Our storage is 76 Netapp E5400's (IBM dcs3700), so that part may be familiar!

A "paralyzed" filesystem is an issue we see a lot, usually caused by some user job triggering the OOM-killer on a node (or hundreds of nodes). It's really the filesystem being "delayed for recovery" while GPFS tries to figure out what happened to the node and what to do with the open files and all these tokens that got orphaned. It's not a very fast process and can result in things like a specific file, directory, or entire filesystem being "hung" until the recovery finishes.

The_Groove
Mar 15, 2003

Supersonic compressible convection in the sun

parid posted:

What kind of interconnects do you see? Between processing nodes and storage? Sounds like a lot of block-level stuff. Is that just due to performance drivers?
The ones I've worked with are 10GbE and "FDR-14" Infiniband @ 56Gbps. In the 10GbE setup the storage nodes (gpfs NSDs) had dual bonded 10G interfaces, but the clients were nodes in a power6 cluster that had weird routing and routed through 4 "I/O" nodes to get to the storage nodes. With infiniband we have a full fat-tree topology for the compute cluster and an extra switch hanging off the side where the storage nodes are connected. GPFS itself uses TCP/IP over 10GbE and both TCP/IP and RDMA over Infiniband.
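For the record, flipping GPFS over to RDMA is just a couple of cluster-wide settings (the HCA/port name below is a placeholder, check your own fabric for the real one):

```
# Enable verbs RDMA for GPFS data traffic; takes effect on daemon restart
mmchconfig verbsRdma=enable
# Which HCA ports to use -- "mlx4_0/1" is an assumption, not from my config
mmchconfig verbsPorts="mlx4_0/1"
```

Without those, GPFS just falls back to TCP/IP (over IPoIB if that's all you have), which still works but leaves bandwidth on the table.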

I came across a pretty good Parallel I/O presentation from a few years back that gives a decent overview of HPC storage: http://www.nersc.gov/assets/Training/pio-in-practice-sc12.pdf

The_Groove
Mar 15, 2003

Supersonic compressible convection in the sun
You can put pretty much any hardware behind GPFS as long as it can see it as a block device. I've even made little clusters in virtualbox serving up those virtual "disks" and it all works (slowly). You can put metadata on SSDs or whatever and spread it out among all server nodes and do enough tuning to make it a lot better for small and random I/O patterns, but yeah, GPFS is mostly used in places with a requirement for large-block sequential bandwidth. It can pretty much max out whatever storage is behind it.

I'm guessing licensing costs are going to be THE most prohibitive factor in using GPFS in a cheap/small environment. The test system for our big cluster is 1 manager node, 2 server nodes, and a single Netapp E5400, which is about as small as you could make a cluster while still having some amount of flexibility for taking down a node and having the other take over everything. You could make a single node "cluster" if you wanted, with all the caveats you could imagine. GPFS is also waaaaaay overkill for a backup target, but would totally work.
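For anyone curious, a tiny cluster like that test system is only a handful of commands to stand up (node names and the stanza file here are hypothetical):

```
mmcrcluster -N "nsd1:quorum-manager,nsd2:quorum-manager" -C tinycluster
mmchlicense server --accept -N nsd1,nsd2
mmcrnsd -F e5400.stanza      # stanza file describing the E5400 LUNs
mmstartup -a
mmcrfs testfs -F e5400.stanza -B 2M
mmmount testfs -a
```

Drop it to one server node and it still works, minus any failover, which is the single-node "cluster" with all the caveats I mentioned.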

The_Groove
Mar 15, 2003

Supersonic compressible convection in the sun

goobernoodles posted:

Don't get me started on IBM and their firmware updates.
We run firmware on a ton of netapps that's approved/tested by IBM's cluster team, but is generally old enough that their website for firmware downloads has aged off the version we need by the time it's been approved.

The_Groove
Mar 15, 2003

Supersonic compressible convection in the sun

Rhymenoserous posted:

IBM has been slowly circling the drain for the last 10 years, I can't for the life of me understand why anyone would buy their loving products anymore.
"No one's ever been fired for buying IBM." (although they really should)

One funny thing about the Lenovo x86 deal: IBM sells (sold) supercomputers to government organizations, and since Lenovo is a Chinese company, most of those organizations are not allowed to communicate with Lenovo or Lenovo employees. They sold a system to NOAA a few years back, and had a contract to update the system this year. Since NOAA can't deal with Lenovo, IBM is using Cray (their largest competitor in the supercomputing market) as a subcontractor for this new NOAA system.

For our support, we've been told that if/when IBM says "OK, I need to send this info to a Lenovo employee to figure out what's going on," we have to say "no, you can't." It's at the point now where we have an IBM employee recite logs and stuff (minus "sensitive" info like, IP addresses???) to a Lenovo employee for support calls. The majority of the former IBM team that knew anything at all about our system was moved to Lenovo and we can no longer directly communicate with them.

The_Groove
Mar 15, 2003

Supersonic compressible convection in the sun
So a drive failed but the vendor replaced the wrong one...

RAID6 just saved 4PB of data, thanks RAID6!

The_Groove
Mar 15, 2003

Supersonic compressible convection in the sun
the hardware is, but the vendor is everyone's favorite three-letter acronym
