|
It really makes you appreciate storage hardware when 120 drives fail due to a bad SAS card/cable and you don't see a performance hit while rebuilding 32 at once. That and RAID6 totally saving your rear end from losing data. Close call this week...
|
# ¿ Feb 17, 2012 20:28 |
|
Not really. I guess I forgot to say that they were all manually failed in order to power down a pair of enclosures to swap I/O modules and cables to try and find out what actually went bad. But yeah, if one path from one controller going down causes drive failures, there are bigger problems.
|
# ¿ Feb 17, 2012 21:45 |
|
cheese-cube posted:On the subject of IBM midrange storage systems has anyone had a look at their new DCS3700 and DCS9900 high-density systems yet? 60 HDDs in 4U is pretty drat crazy. Also you can get the DCS9900 with either 8 x FC 8Gb or 4 x InfiniBand DDR host interfaces which is insane.

Getting 76 DCS3700s up and running pretty soon, don't know too much about them just yet.
|
# ¿ Jul 24, 2012 21:09 |
|
One pair of controllers, but with 2 drive enclosures (daisy-chained) for each channel. So 1200 drives total.
|
# ¿ Jul 24, 2012 21:15 |
|
szlevi posted:I used to run our entire production load on a single 8550, there's nothing wrong with them, I think. It's just a big, dumb RAID storage piece, there are some options - strictly console-only mgmt - but basically it was a giant fast array (~20TB) and that's it. No features, not even snapshots, nothing.
|
# ¿ Jul 31, 2012 20:21 |
|
Every time I have DDN gear fail, it just makes me more impressed by it. I had BOTH raid controllers for a DDN9900 storage system fail after being powered on following building maintenance. A "disk chip" in each controller had failed. DDN shipped out 2 replacements, I swapped them in, re-cabled everything, and they booted fine, reading their config (zoning, network, syslog forwarding, etc.) off one of the disks. I didn't have to do anything other than turn them on!
|
# ¿ Oct 24, 2012 18:48 |
|
Haha well poo poo happens, I wouldn't necessarily blame DDN for components failing in an unlikely combination resulting in unscheduled downtime. They offered to send a tech, but we decided to do the replacement ourselves since their procedure was so simple. I was pretty impressed but apparently this is nothing special!
|
# ¿ Oct 24, 2012 21:10 |
|
Yeah we did the work because it would be finished before a tech could get out here. It's annoying that, while the two controllers had different "disk chips" fail, between them they could still have seen all the disks. But the failed disk chip on the A controller prevented access to the disk holding the configuration (it's not some internal disk in the controllers; the config lives on a disk in the arrays it couldn't talk to). It's too late now, but I wonder if swapping the A/B controllers would have worked temporarily, since the new A controller should have been able to read the config. I've heard some stories about these 9900 controllers having serious issues during bootup, though usually firmware-related rather than multiple hardware failures, but maybe that's changing as they age.
|
# ¿ Oct 24, 2012 21:59 |
|
Corvettefisher posted:Was this on a non-production box, or do you just have the most chill work environment ever conceived?

Zephirus posted:Our DCS9900s (same thing, IBM model number) went through a period of failing large numbers of perfectly good drives because someone at DDN screwed up the error counters in the firmware. Two of the disk modules would not turn off their alert lights despite being completely fine. They constantly needed nursing to keep them online, and DDN couldn't figure out why. Eventually the whole lot got replaced because the continuous read/write performance was rubbish.
|
# ¿ Oct 25, 2012 01:51 |
|
Haha fair enough. Other missing details: we've never lost any files/data on this system. The ratio of how many drives the controllers have flagged as failed vs. how many actually WERE bad is HUGE. HPC environments and equipment are like the wild west. This is all why I was so excited that replacing both controllers worked like it should. No running undocumented commands from DDN over the phone!
|
# ¿ Oct 25, 2012 02:16 |
|
Misogynist posted:I'm curious, do any of you other guys in the tiering discussion work in an environment with petabytes of raw data? SSD caching is great for applications with repeatable hotspots, but it's going to be a long time before it can play with the HPC kids.

I manage about 10PB of storage (4560 disks); if I had a ton of free SSDs lying around I'd use them as metadata-only disks. Currently the metadata is on the same LUNs as the actual data, so under heavy loads the "interactive user experience" suffers. HPC storage is usually all about large-block sequential bandwidth, capacity, and saving as much money as possible so it can be spent on the computational side of the system.
|
# ¿ Apr 18, 2013 19:56 |
|
It may be unique to the GPFS filesystem, but when defining a new "disk" you can specify whether you want it to hold data only, metadata only, or both. It's basically normal filesystem metadata (inodes) that GPFS keeps on whichever disks you allow it to. So if you scan/stat all your files, it shouldn't touch the data-only disks at all unless you do an actual read/write. The latest version of GPFS can even store tiny files in the inode itself, which I'd assume would still reside on the metadata-only disks.
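For anyone curious what that looks like in practice, the data/metadata split is declared per-NSD when you create the disks. A minimal sketch with made-up device, NSD, and server names (stanza syntax as in the GPFS 3.5-era docs, so double-check against your release):

```shell
# nsd.stanza -- hypothetical names throughout.
# SSDs carry only metadata (inodes, directories)...
%nsd: device=/dev/sdb nsd=ssd_md01 servers=nsd1,nsd2 usage=metadataOnly failureGroup=1
%nsd: device=/dev/sdc nsd=ssd_md02 servers=nsd2,nsd1 usage=metadataOnly failureGroup=2
# ...and the big data LUNs carry only file data.
%nsd: device=/dev/sdd nsd=sata_d01 servers=nsd1,nsd2 usage=dataOnly failureGroup=3

mmcrnsd -F nsd.stanza                       # create the NSDs
mmcrfs scratch -F nsd.stanza -T /scratch    # build a filesystem from them
```

With a layout like that, a full `find`/`stat` scan of the tree only ever hits the SSDs.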
|
# ¿ Apr 18, 2013 20:37 |
|
That sounds fairly similar to a data analysis/visualization environment I used to manage: 12 clients with a ton of memory, 10GbE, and GPUs, so virtualization didn't make sense. We ran linux though, and ended up running IBM's GPFS filesystem with a 2MB block size on large DDN9550/DDN9900 storage systems (about 1.5PB total) with 8 NSD (I/O) servers in front of it, serving everything out over 10G. A single client could max out its 10G card when doing sequential reads/writes, and the 9900 alone could hit 4-5 GB/sec peaks under a heavy load. Granted, GPFS is not even close to free and probably pretty expensive for a relatively "small" installation like that. It's more geared towards huge clusters and HPC, but drat did it rock for that environment. I'm not saying a different filesystem or anything will solve your issues. I just wanted to give a description of a similar environment where disk I/O was pretty sweet.
|
# ¿ Dec 12, 2013 00:23 |
|
Yeah, IBM's storage/filesystem offerings are starting to move in that direction also. No more hardware RAID controllers, just have your NSDs (I/O servers) and filesystem layer handle everything. They use a "declustered" RAID to reduce rebuild times and lower the performance degradation during rebuilds. Here's a video if anyone's interested: https://www.youtube.com/watch?v=VvIgjVYPc_U
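On the boxes that ship this, the declustered layer shows up as "recovery groups" of physical disks backing virtual disks. A sketch of poking at one (these commands only exist with the GPFS Native RAID code installed, and the flag names are from memory of the GNR docs, so verify before relying on them):

```shell
mmlsrecoverygroup               # list all recovery groups
mmlsrecoverygroup rg01 -L       # vdisks, pdisks, and spare space in one group
mmlspdisk all --not-ok          # show only the physical disks with problems
```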
|
# ¿ Feb 20, 2014 00:19 |
|
Misogynist posted:Is IBM still going to be selling engineered systems around GPFS? I was under the impression SONAS was going over to Lenovo.
|
# ¿ Feb 20, 2014 18:50 |
|
parid posted:Looks like I might be getting into the HPC business soon. I have mostly been a NetApp admin so far. Any recommendations on what technologies/architectures to start learning about? I hear a lot about paralyzed file systems.

Our storage is 76 NetApp E5400s (IBM DCS3700), so that part may be familiar! A "paralyzed" filesystem is actually an issue we see a lot, usually caused by some user job triggering the OOM-killer on a node (or hundreds of nodes). It's really the filesystem being "delayed for recovery" while GPFS tries to figure out what happened to the node and what to do with the open files and all the tokens that got orphaned. It's not a very fast process and can leave a specific file, a directory, or the entire filesystem "hung" until the recovery finishes.
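When one of those recovery events hits, a sketch of the usual first-look commands (output formats vary by GPFS version):

```shell
mmgetstate -a      # which nodes GPFS thinks are up/down/arbitrating
mmlsmgr            # which node is the cluster/filesystem manager running recovery
mmdiag --waiters   # long-running waiters: threads stuck on recovery, tokens, etc.
```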
|
# ¿ Apr 28, 2014 19:42 |
|
parid posted:What kind of interconnects do your see? Between processing nodes and storage? Sounds like a lot of block-level stuff. Is that just due to performance drivers? I came across a pretty good Parallel I/O presentation from a few years back that gives a decent overview of HPC storage: http://www.nersc.gov/assets/Training/pio-in-practice-sc12.pdf
|
# ¿ Apr 29, 2014 16:33 |
|
You can put pretty much any hardware behind GPFS as long as it can see it as a block device. I've even made little clusters in virtualbox serving up those virtual "disks" and it all works (slowly). You can put metadata on SSDs or whatever and spread it out among all server nodes and do enough tuning to make it a lot better for small and random I/O patterns, but yeah, GPFS is mostly used in places with a requirement for large-block sequential bandwidth. It can pretty much max out whatever storage is behind it. I'm guessing licensing costs are going to be THE most prohibitive factor in using GPFS in a cheap/small environment. The test system for our big cluster is 1 manager node, 2 server nodes, and a single Netapp E5400, which is about as small as you could make a cluster while still having some amount of flexibility for taking down a node and having the other take over everything. You could make a single node "cluster" if you wanted, with all the caveats you could imagine. GPFS is also waaaaaay overkill for a backup target, but would totally work.
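For scale, a test cluster like that is only a handful of commands to stand up. A rough sketch with hypothetical node names, against a 3.5-era GPFS, so the exact flags may differ on newer releases:

```shell
# Two server nodes, both quorum-managers, so either can take over everything
mmcrcluster -N "nsd1:quorum-manager,nsd2:quorum-manager" \
    -p nsd1 -s nsd2 -C testcluster -r /usr/bin/ssh -R /usr/bin/scp
mmchlicense server --accept -N nsd1,nsd2
mmstartup -a

mmcrnsd -F nsd.stanza              # stanza file listing the E5400 LUNs
mmcrfs test1 -F nsd.stanza -B 2M -T /gpfs/test1
mmmount test1 -a
```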
|
# ¿ May 23, 2014 17:31 |
|
goobernoodles posted:Don't get me started on IBM and their firmware updates.
|
# ¿ Jul 29, 2014 20:26 |
|
Rhymenoserous posted:IBM has been slowly circling the drain for the last 10 years, I can't for the life of me understand why anyone would buy their loving products anymore.

One funny thing about the Lenovo x86 deal: IBM sells (sold) supercomputers to government organizations, and most of those organizations are not allowed to communicate with Lenovo or Lenovo employees, Lenovo being a Chinese company. IBM sold a system to NOAA a few years back and had a contract to update it this year. Since NOAA can't deal with Lenovo, IBM is using Cray (their largest competitor in the supercomputing market) as a subcontractor for the new NOAA system. For our support, we've been told that if/when IBM says "OK, I need to send this info to a Lenovo employee to figure out what's going on," we have to say "NO, you can't." It's at the point now where we have an IBM employee recite logs and stuff (minus "sensitive" info like, IP addresses???) to a Lenovo employee on support calls. The majority of the former IBM team that knew anything at all about our system were moved to Lenovo, and we can no longer communicate with them directly.
|
# ¿ Feb 2, 2015 19:45 |
|
So a drive failed but the vendor replaced the wrong one... RAID6 just saved 4PB of data, thanks RAID6!
|
# ¿ Jun 3, 2015 23:31 |
|
the hardware is, but the vendor is everyone's favorite three-letter acronym
|
# ¿ Jun 3, 2015 23:51 |