adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

conntrack posted:

If you keep three months of one snap each day, will writing one block result in 90 writes per block? I'm sure that database will be fast for updates.

Edit: I guess that depends on how smart the software is. Classical snaps would turn to poo poo.
It writes one time. The original blocks are locked, and when they are overwritten, the new data goes to a new location instead. The originals are then referenced only by the oldest snapshot. Subsequent snapshots lock their blocks, and so forth. New data is written only once, so there is no per-snapshot write penalty.

edit: I didn't see that we had rolled to a new page.
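To make the mechanics concrete, here is a minimal redirect-on-write sketch in Python. The block map and allocator are illustrative, not any particular vendor's implementation; the point is that an overwrite allocates a fresh physical block and repoints the map, so each logical write costs one physical write no matter how many snapshots exist.

code:
class Volume:
    def __init__(self):
        self.block_map = {}   # logical block -> physical block
        self.phys = {}        # physical block -> data
        self.next_phys = 0    # trivial allocator
        self.snapshots = []   # each snapshot freezes a copy of the map

    def write(self, lba, data):
        # Always allocate a new physical block: one physical write per
        # logical write, regardless of how many snapshots reference the
        # old block. The old physical block is left untouched.
        p = self.next_phys
        self.next_phys += 1
        self.phys[p] = data
        self.block_map[lba] = p

    def snapshot(self):
        # A snapshot just freezes the current map; no data is copied.
        self.snapshots.append(dict(self.block_map))

    def read(self, lba, snap=None):
        m = self.snapshots[snap] if snap is not None else self.block_map
        return self.phys[m[lba]]

v = Volume()
v.write(0, "monday")
v.snapshot()            # 90 of these cost 90 map copies, not 90 data writes
v.write(0, "tuesday")   # still exactly one physical write
assert v.read(0) == "tuesday" and v.read(0, snap=0) == "monday"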


paperchaseguy
Feb 21, 2002

THEY'RE GONNA SAY NO

Mausi posted:

Something like a NetApp will maintain a hash table of where every logical block is physically located for any given level of the snapshot. True copy-on-write will simply allocate an unwritten block, write the data, and then change the hash table to point to the new data. Any reads will also check the hash table.
Very minor performance overhead, brilliant for all sorts of things.

XIV does this, but at the block level. XIV snapshots are incredibly easy to work with.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

paperchaseguy posted:

XIV does this, but at the block level. XIV snapshots are incredibly easy to work with.
Erm, doesn't any block storage device by definition do this at the block level?

shablamoid
Feb 28, 2006
Shuh-blam-oid
Are there any recommended practices for performing a defrag on large-volume systems? I have heard that backing up the data and then restoring it is one way to do it, but that seems cumbersome.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

shablamoid posted:

Are there any recommended practices for performing a defrag on large-volume systems? I have heard that backing up the data and then restoring it is one way to do it, but that seems cumbersome.
there is likely zero benefit to defragging a SAN.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

adorai posted:

there is likely zero benefit to defragging a SAN.
To expand on this, your typical storage system has so many layers of indirection built into it already that it barely matters whether your data is contiguous or not. It affects prefetching for sequential I/O profiles and basically nothing else.

shablamoid
Feb 28, 2006
Shuh-blam-oid

Misogynist posted:

To expand on this, your typical storage system has so many layers of indirection built into it already that it barely matters whether your data is contiguous or not. It affects prefetching for sequential I/O profiles and basically nothing else.

Would it make a difference if the system is a Windows 2008 R2 box with two MD1000s attached, or does this generally apply to all systems?

Syano
Jul 13, 2005

shablamoid posted:

Would it make a difference if the system is a Windows 2008 R2 box with two MD1000s attached, or does this generally apply to all systems?

I think what they are getting at is that it does not matter what the hardware is; it matters what the workload is. What workload are you running on that box connected to those two MD1000s?

shablamoid
Feb 28, 2006
Shuh-blam-oid
They have 10 VMs set up on the root of the server, one of which is a medium-to-heavy-load SQL server. They also have all of their users (~100) on roaming profiles, and a couple of users who use GIS all day, which makes up the bulk of the data.

conntrack
Aug 8, 2003

by angerbeet
Being aligned is more important.

Databases allocated on a fresh NTFS filesystem will never benefit from a filesystem-level defrag, as the intelligence about data placement is in the database.

Transient files are likely to be created/deleted before the defrag even runs.

Perhaps if you do something retarded like mixing loads in one partition, or single-drive LUNs, it might be worth the effort to defrag?
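For reference, the alignment being invoked here is about the partition's starting offset landing on a multiple of the array's stripe/segment size. Older Windows versions defaulted to a 63-sector (31.5 KiB) starting offset, which misaligns every stripe-sized I/O; Windows 2008 R2 defaults to 1 MiB. A quick sketch of the arithmetic, with the 64 KiB segment size as an assumed example value:

code:
# Back-of-the-envelope partition alignment check. The 64 KiB segment
# size is an assumption; substitute your array's actual segment size.
SECTOR = 512

def is_aligned(start_sector, segment_bytes=64 * 1024):
    return (start_sector * SECTOR) % segment_bytes == 0

print(is_aligned(63))    # False: classic pre-2008 Windows default offset
print(is_aligned(2048))  # True: 1 MiB offset, aligned to any common segment size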

shablamoid
Feb 28, 2006
Shuh-blam-oid

conntrack posted:

Being aligned is more important.

Databases allocated on a fresh NTFS filesystem will never benefit from a filesystem-level defrag, as the intelligence about data placement is in the database.

Transient files are likely to be created/deleted before the defrag even runs.

Perhaps if you do something retarded like mixing loads in one partition, or single-drive LUNs, it might be worth the effort to defrag?

Nope, nothing like that. Excellent, thank you for the info.

Nomex
Jul 17, 2002

Flame retarded.

shablamoid posted:

They have 10 VMs set up on the root of the server, one of which is a medium-to-heavy-load SQL server. They also have all of their users (~100) on roaming profiles, and a couple of users who use GIS all day, which makes up the bulk of the data.

Run a defrag task in each VM.

complex
Sep 16, 2003

If you have a single VM in a datastore, you may gain from defragmenting your guest. See http://vpivot.com/2010/04/14/windows-guest-defragmentation-take-two/ for some hard data.

However, as you add the I/O of multiple VMs to a datastore, you are effectively turning the I/O stream into random I/O, from the array's point of view. Any gains previously made will be muted.

complex fucked around with this message at 02:53 on Sep 4, 2010

paperchaseguy
Feb 21, 2002

THEY'RE GONNA SAY NO

Misogynist posted:

Erm, doesn't any block storage device by definition do this at the block level?

XIV does redirect-on-write. It's taking a snapshot not of the LUN blocks, but of the LUN's block pointers. Most block storage does copy-on-first-write, which creates much more load.
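For contrast with the redirect-on-write sketch earlier in the thread, here is the copy-on-first-write path being described, again as an illustrative sketch rather than any vendor's actual code. The extra load is the read-and-copy of the original block before the overwrite can proceed: three I/Os where redirect-on-write needs one.

code:
# Copy-on-first-write sketch: the first overwrite of a protected block
# costs a read of the old data plus two writes (copy + new data).
def cow_write(volume, snapshot_area, protected, lba, data, io_log):
    if lba in protected:
        io_log.append("read old")       # 1. read the original block
        snapshot_area[lba] = volume[lba]
        io_log.append("write copy")     # 2. copy it into the snapshot area
        protected.discard(lba)          # only the *first* write pays this
    volume[lba] = data
    io_log.append("write new")          # 3. finally, the actual write

vol, snap, log = {0: "monday"}, {}, []
cow_write(vol, snap, {0}, 0, "tuesday", log)
print(log)  # ['read old', 'write copy', 'write new'] -- 3x the I/O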

Mausi
Apr 11, 2006

paperchaseguy posted:

XIV does redirect-on-write. It's taking a snapshot not of the LUN blocks, but of the LUN's block pointers. Most block storage does copy-on-first-write, which creates much more load.
Your statement about 'most block storage' is out of date - redirect-on-write is precisely how every SAN I currently work with operates, and it's what I meant by the hash table (hash vs. pointer is just a terminology difference).

Tsaven Nava
Dec 31, 2008

by elpintogrande
This is really small potatoes for this thread, but I think it's the best place to ask it.

I've got a small network of about 60 users with a mix of workstations and laptops, along with two (soon to be three) Windows servers. Most of the users are using IBM's Tivoli backup software to back up their data onto our file server, but I'm still trying to decide how to back up the three servers.

In total, I'd like 3 TB of capacity to back up the three Windows servers. I likely won't use even half that, but I'd like to plan for the future. Cost is a VERY important factor, and I don't even know where to start looking for a reasonably priced solution that won't demand much ongoing effort.

What can you guys recommend? Tape? NAS? Pocketful of SD cards?

Mausi
Apr 11, 2006

Do you want a backup somewhere that you overwrite each week or so, or do you want to do things properly and keep a rolling set of backups so you can go back a day, week or month as required?
Do you want to take it off site (presumably yes) and how much speed is required?

Basically, what you mean by 'backup' determines which solution you should be looking at.

Nomex
Jul 17, 2002

Flame retarded.
If you can convince your company to, go with a disk-based backup solution - something like a small Data Domain or HP D2D. Backup and restore speeds are way faster than tape, which means if you do have to rebuild a server, it'll be down for a lot less time. Later on, if the company wants, you can get a second unit off site and replicate the data between them. Tape will be cheaper, but slower and less reliable.

conntrack
Aug 8, 2003

by angerbeet
Like the previous poster said, it depends on what you want: an archive, or disaster recovery. If users depend on being able to recover each and every deleted gif of funny dogs six months from now, tape might be better, since it's easy to buy more tape.

Tapes are also portable.

A server might be tempting but won't be worth much if a power surge burns out the backup server at the same time as the production servers because they were all in the same building.

Edit: I focused on cost. If you have money you would of course get two disk-based systems with off-site replication. Pricey but worth it.

conntrack fucked around with this message at 22:09 on Sep 8, 2010

Tsaven Nava
Dec 31, 2008

by elpintogrande

Nomex posted:

If you can convince your company to, go with a disk-based backup solution - something like a small Data Domain or HP D2D. Backup and restore speeds are way faster than tape, which means if you do have to rebuild a server, it'll be down for a lot less time. Later on, if the company wants, you can get a second unit off site and replicate the data between them. Tape will be cheaper, but slower and less reliable.

*checks prices* OH GOD.

It looks awesome, but it's so far out of my budget that I need the Keck telescope to see it. I've probably got $1,500 maximum to play with in this situation, so my "backup solution" might end up being some USB drives. Crap.

I'm still in the planning stages for this, but it's part of replacing a bunch of servers and upgrading most of the network. I'm focusing on disaster recovery, not archiving.

My plan is to have three servers: a single server dedicated to our Sage Accpac software, a file server/DC/DNS box, and a DC/DNS/DHCP/WSUS/Alphabet Soup system. User data is backed up onto the file server and consumes the largest amount of space, but given that it's already a backup of their workstations, I'm not as concerned about backing it up again.

My goal is to be able to recover from a catastrophic server failure quickly, with minimized downtime or loss of data. It has to be reliable, easy to manage and use. And it needs to be cheap, to fit within a budget that doesn't exist. And I want a pony. That flies and shoots laser beams. (I figure as long as I'm asking for the impossible, I might as well go all out)

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.
Would any of you IBM guys happen to be able to come up with a reason why every morning at 2:03 AM, my ESX servers flag a SCSI bus reset from our DS5100 that happens just long enough to gently caress up my Exchange cluster and bring every single mailbox server in the DAG offline?

Every once in a while, but not all the time, this is also enough to upset replication to the storage head in question from our DS4800 on the same fabric.

I'm just about to open a case, but I figured I'd ask here first in case anyone knows offhand what might be special about 2 AM on these devices.

Vulture Culture fucked around with this message at 22:38 on Sep 8, 2010

H110Hawk
Dec 28, 2006

pelle posted:

Blue Arc:
Mercury 100
with 4 disk enclosures with 48x1 TB SAS 7200 RPM drives.

Has BlueArc gotten any better than the steaming pile of poo poo they were in the Titan 2000 days? Apparently our sales guy was super shady and eventually got fired. I met the new guy and he said most of his time is spent cleaning up the mess the old guy made. Adding a dose of reality to that, only half of our problems were likely caused by the sales guy; the other half were due to poor hardware.

Mausi
Apr 11, 2006

Tsaven Nava posted:

My goal is to be able to recover from a catastrophic server failure quickly, with minimized downtime or loss of data. It has to be reliable, easy to manage and use. And it needs to be cheap, to fit within a budget that doesn't exist.

Quick and dirty solution:
1) Get a small NAS device, cheap as chips, probably with huge SATA disks in it.
2) Clone the OS disks of your servers onto the NAS using VMware Converter (or something similar). If it's a domain controller, do a System State backup as well, just to be safe.
3) Back up your data disks onto the NAS using a bit of software capable of change-only updates (see the sketch below).
4) Take the NAS home with you, or put it somewhere else safe.

That's basic server DR. You can restore the actual servers as virtual machines on a single replacement box in a data centre or anywhere else. If you're talking real DR then you'll also need workstation images, if your users are allowed to save important data locally (which they shouldn't be, but it's your business, not mine).
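As a rough illustration of the 'change-only updates' in step 3, here is a minimal incremental-copy sketch in Python. Real tools (robocopy, rsync, and friends) also handle deletions, ACLs, retries, and open files; this only shows the core idea of skipping files whose size and mtime are unchanged. The paths are hypothetical.

code:
import os, shutil

def incremental_copy(src_root, dst_root):
    # Walk the source tree and copy only files that changed since last run.
    for dirpath, _dirs, files in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.join(dst_root, rel)
        os.makedirs(dst_dir, exist_ok=True)
        for name in files:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_dir, name)
            s = os.stat(src)
            if os.path.exists(dst):
                d = os.stat(dst)
                if d.st_size == s.st_size and int(d.st_mtime) == int(s.st_mtime):
                    continue  # unchanged since the last run; skip it
            shutil.copy2(src, dst)  # copy2 preserves mtime for the next run

incremental_copy(r"D:\data", r"\\nas\backup\fileserver")  # hypothetical paths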

Tsaven Nava posted:

And I want a pony. That flies and shoots laser beams. (I figure as long as I'm asking for the impossible, I might as well go all out)
Get a WoW account.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

H110Hawk posted:

Has BlueArc gotten any better than the steaming pile of poo poo they were in the Titan 2000 days? Apparently our sales guy was super shady and eventually got fired. I met the new guy and he said most of his time is spent cleaning up the mess the old guy made. Adding a dose of reality to that, only half of our problems were likely caused by the sales guy; the other half were due to poor hardware.
We've had remarkably few problems with our hardware (other than the ill-fated Xyratex SA-48 enclosures which have been replaced by LSI/DDN), but BlueArc's handling of firmware is really idiotic and not what I would expect of a large-scale software vendor.

It's not uncommon for vendors to resell kit, especially storage kit, from other vendors. Almost all of the major vendors resell something or other made by LSI. In most of these cases, the vendors buy and rebrand, have access to the firmware code to investigate and fix bugs, and forward their bugs and fixes upstream. BlueArc doesn't take responsibility for the products that they sell. Instead, they go "oh, it's an LSI problem" and sit on their hands for a few months while we have production issues with devices in a permanent error state. I don't know if the firmware licenses are really expensive and that's why their solutions are so cheap, or if they just don't have anyone on staff familiar enough with the products they sell to actually maintain the firmware, but I definitely haven't been thrilled with the whole product support experience.

We used to have great experiences with their support, but our main support guy moved down to Philly and now we only hear from him when things go really, really wrong. We had someone last week come in to set up replication between our Titan cluster and the Mercury head at our DR site, and he somehow ended up knocking out one of our Titans used by half of our HPC cluster during production hours, presumably by having butterfingers with the FC cabling on the backend. He then denied that anything happened, which is perfectly plausible from a factual standpoint, but the fact that this thing just happened to go down when he was fiddling around behind the rack was a little bit too convenient a coincidence for my tastes.

My overall recommendation: BlueArc is remarkably cost-effective if you need a reasonably inexpensive vendor for scalable storage because you have huge storage requirements (e.g. you're in multimedia production or life sciences and keep huge amounts of data online basically forever). But be absolutely aware of what you're getting into before you ink a contract, because they definitely do not have the reliability track record of a tier-1 storage vendor like EMC.

Klenath
Aug 12, 2005

Shigata ga nai.
Anybody have experiences to share about Panasas NAS gear?

My group is considering a purchase of 10 x 40 TB shelves of their PAS-7 product to serve as a location for researchers to back up their data.

Intraveinous
Oct 2, 2001

Legion of Rainy-Day Buddhists

Cultural Imperial posted:

That's impressive! What's your rationale for RAID10? What's the database?

Rationale for RAID10 is the old SAME (Stripe And Mirror Everything) rule of thumb for Oracle DB storage, but since SSDs make many of those concerns moot, and I shouldn't have to worry too much about rebuild time on a 50-60 GB SSD, I thought RAID6 might be feasible. I'm still just a little bit afraid of RAID5. With all of the drives being installed at the same time, I'll have pretty similar write levels on all of them. With SSD, if I were to hit the write-endurance limit and have cells fail at the same time on multiple drives, having the ability to lose two of them without losing the array would be nice.

The P400 controller, however, is somewhat bad at RAID5/6; it's much slower at the offloaded parity calculation than, say, the P410, PERC 5, or PERC 6. Since the DB is a 95/5 mix of reads/writes, I ended up deciding that I didn't care too terribly much and stayed with the P400.

Database is an index for an OLTP system running on Oracle 11gR2. Should be interesting; just waiting to find the next big bottleneck now that disk I/O won't be it.
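To put numbers on the RAID10-vs-RAID6 trade-off being weighed here, a quick back-of-the-envelope comparison; the eight 60 GB drives below are assumed example values, not the actual array:

code:
# Usable capacity and guaranteed fault tolerance for n equal drives.
def raid10(n, size_gb):
    # Striped mirrors: half the capacity; guaranteed to survive 1 failure
    # (more if later failures land on different mirror pairs).
    return n // 2 * size_gb, 1

def raid6(n, size_gb):
    # Double parity: lose two drives' worth of capacity; survives any 2.
    return (n - 2) * size_gb, 2

for name, fn in (("RAID10", raid10), ("RAID6", raid6)):
    usable, tolerated = fn(8, 60)
    print(f"{name}: {usable} GB usable, any {tolerated} drive(s) can fail")
# RAID10: 240 GB usable, any 1 drive(s) can fail
# RAID6: 360 GB usable, any 2 drive(s) can fail

With correlated SSD wear (identical install date, similar write levels), the guaranteed two-drive tolerance is the argument for RAID6 over RAID10 here.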

Intraveinous
Oct 2, 2001

Legion of Rainy-Day Buddhists

Misogynist posted:

We've had remarkably few problems with our hardware (other than the ill-fated Xyratex SA-48 enclosures which have been replaced by LSI/DDN), but BlueArc's handling of firmware is really idiotic and not what I would expect of a large-scale software vendor.

Could you expand at all on your ill-fated Xyratex adventures? They seem to be the OEM for just about everything that's not LSI made or small/big enough to be made in house, but I keep hearing about problems with their stuff.

We have a couple of small arrays (24-spindle SAS, 4 Gb FC backed) that were purchased by someone unaware of storage technology, and they have been nothing but problems. The name stamped on the outside of the Xyratex kit, Gateway, hasn't existed since the demise of MPC.

Generally, if you try to put any kind of load at all on them, the controllers will crash. Sometimes it's just a SCSI bus reset; other times the controller locks up completely but still appears up to any multipathing software (with the exception of vSphere, which catches it), and it just won't service any requests.

Xyratex hasn't been terribly willing to help me directly, saying they're just the OEM and any service is the responsibility of the company that sells the box. That's fine with me, I understand that, and in fact we found another reseller that would sell us a basic service contract covering dead drives and whatnot, but we haven't gotten very far in troubleshooting the larger problems.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

Intraveinous posted:

Could you expand at all on your ill-fated Xyratex adventures? They seem to be the OEM for just about everything that's not LSI made or small/big enough to be made in house, but I keep hearing about problems with their stuff.
We have constant, unending controller failures. When the controllers work, they perpetually spit errors back to the controllers managing them even though they're actually in a completely fine state. A few weeks ago, what pushed me over the edge was having to take over our Exchange environment for a little while and delete 500,000 messages that the Xyratex enclosure had politely emailed us over the weekend.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.
And hey, while I'm here, can someone explain to me how long-distance ISLs (~3km) are supposed to be configured on Brocade 5000/300 switches? Obviously I didn't set something up right on our longer pair of campus ISLs, because I started anal-retentively monitoring SNMP counters today and I'm noticing a crapload of swFCPortNoTxCredits on those ports.

Intraveinous
Oct 2, 2001

Legion of Rainy-Day Buddhists

Misogynist posted:

We have constant, unending controller failures. When the controllers work, they perpetually spit errors back to the controllers managing them even though they're actually in a completely fine state. A few weeks ago, what pushed me over the edge was having to take over our Exchange environment for a little while and delete 500,000 messages that the Xyratex enclosure had politely emailed us over the weekend.

Weird, you could have been describing what I go through all the drat time (with the exception of the 500K emails). Mine's not nice enough to email me even when it has a legitimate failure. One more second with the thing would be too long. I thought I could relegate it to a backup-to-disk target once I finally got all the production data off of it, but no, it can't even do that well. Apparently backups put too much strain on the drat thing.

conntrack
Aug 8, 2003

by angerbeet

Misogynist posted:

And hey, while I'm here, can someone explain to me how long-distance ISLs (~3km) are supposed to be configured on Brocade 5000/300 switches? Obviously I didn't set something up right on our longer pair of campus ISLs, because I started anal-retentively monitoring SNMP counters today and I'm noticing a crapload of swFCPortNoTxCredits on those ports.

I assume that means it needs more buffer credits? Try lowering the port speed, or see if you can donate credits from other ports.
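For a rough sense of scale: a full-size FC frame (~2,148 bytes) takes about as long to serialize at 4 Gbps as light takes to cross 1 km of fibre (~5 µs/km), so a link needs roughly one buffer credit per km per 4 Gbps of speed, plus headroom, to stay full. A back-of-the-envelope sketch; the numbers are approximations, and Brocade's long-distance port modes (portCfgLongDistance) exist to reserve exactly this kind of credit pool:

code:
import math

# Rough buffer-to-buffer credit estimate for a long FC ISL.
# Approximation: full-size frames, ~5 us/km propagation in fibre, 8b/10b.
def bb_credits(distance_km, speed_gbps, frame_bytes=2148):
    frame_time_us = frame_bytes * 10 / (speed_gbps * 1000)  # serialization
    round_trip_us = 2 * distance_km * 5.0   # frame out + R_RDY back
    return math.ceil(round_trip_us / frame_time_us) + 1  # +1 frame in flight

print(bb_credits(3, 4))  # ~3 km at 4 Gbps -> about 7 credits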

BelDin
Jan 29, 2001
We just had our first major SAN failure this weekend, and the IT guys are paying for it. Long story short, we have an HP MSA2312i with a total of three shelves hosting about 60 VMs. One of the power supplies in a shelf flaked out and caused an error that brought down 4 of our drives in the main vDisk hosted across that shelf. We are using software to perform disk-to-disk-to-tape VM backups, so you would think this would be a quick copy/snapshot fix across enclosures, right?

First thing they tried? Well, we cleared the metadata on ALL the degraded drives to make them restore themselves! It worked the last time a single drive failed like this! RAID6 :ughh:

I was told today that they hosted their production VMs and disk-based backups on the same shelf and the same vDisk (but different volumes!!!!!). Not even SAN snapshots to a vDisk on another shelf. :ughh:

Luckily, our existing tapes will get most of our data back, but the domain was recreated from scratch (and all workstations manually rejoined) over the weekend because they do not perform regular AD backups. When I asked why they didn't even use the NTBackup utility with System State to export a file they could keep, I swear I could see the Windows admin's eyes move into the :downs: position.

My real question: has anyone had other hardware/firmware issues with MSA2000-series enclosures? We've had nothing but problems with the equipment, and I'm wondering whether moving to a Clariion AX4-5i or similar would be worth the money while we can use the failure as justification. That, and the existing 1 Gb iSCSI throughput currently sucks across the 4 VM hosts.

SiliconCow
Jul 14, 2001

BelDin posted:

Has anyone had other hardware/firmware issues with MSA2000-series enclosures? We've had nothing but problems with the equipment, and I'm wondering whether moving to a Clariion AX4-5i or similar would be worth the money while we can use the failure as justification. That, and the existing 1 Gb iSCSI throughput currently sucks across the 4 VM hosts.

Some experience with an MSA2012: a lot of random firmware problems. Continually failing controllers, and a web interface that sometimes doesn't like to create disks, among other particulars.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

BelDin posted:

We just had our first major SAN failure this weekend, and the IT guys are paying for it. ...
Holy poo poo. Your entire post is a tale of failure. Did your guys get any training on this SAN at all, or did the vendor just drop it off and say here you go? I hope they are hourly for their sake. Personally, I have heard pretty much all bad things about HP SANs. We didn't even look at them when we were purchasing.

sanchez
Feb 26, 2003

adorai posted:

Holy poo poo. Your entire post is a tale of failure. Did your guys get any training on this SAN at all, or did the vendor just drop it off and say here you go? I hope they are hourly for their sake. Personally, I have heard pretty much all bad things about HP SANs. We didn't even look at them when we were purchasing.

I've posted this before, but we had to apply the following update to our MSA recently. It sucks for other reasons as well.

"An issue exists on HP StorageWorks 2000 Modular Smart Array products running firmware versions J200P46, J210P22, or J300P22 that will eventually cause controller configuration information to be lost, with subsequent loss of management capability from that controller. Array management, event messaging, and logging will cease functioning, but host I/O will continue to operate normally. This issue affects the ability to manage the array from the affected controller only; if a partner controller is available, the array can be managed through the partner controller. Because configuration information is stored in non-volatile memory, resetting or powering off the controller will not clear this error. If the issue occurs, the controller must be replaced. This failure mode is time sensitive and HP recommends immediately upgrading firmware on all MSA2000 controllers. This is not a hardware issue and proactive replacement of a controller is not a solution. To avoid this condition, you must upgrade your controller to the latest version of firmware."

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

sanchez posted:

"An issue exists on HP StorageWorks 2000 Modular Smart Array products running firmware versions J200P46, J210P22, or J300P22 that will eventually cause controller configuration information to be lost, with subsequent loss of management capability from that controller. Array management, event messaging, and logging will cease functioning, but host I/O will continue to operate normally. This issue affects the ability to manage the array from the affected controller only; if a partner controller is available, the array can be managed through the partner controller. Because configuration information is stored in non-volatile memory, resetting or powering off the controller will not clear this error. If the issue occurs, the controller must be replaced. This failure mode is time sensitive and HP recommends immediately upgrading firmware on all MSA2000 controllers. This is not a hardware issue and proactive replacement of a controller is not a solution. To avoid this condition, you must upgrade your controller to the latest version of firmware."
This is even better than my IBM DS4000/5000 issue that caused all remote mirroring to be completely and silently broken on LUNs bigger than 2 TB.

BelDin
Jan 29, 2001

sanchez posted:

I've posted this before, but we had to apply the following update to our MSA recently. It sucks for other reasons as well.

"An issue exists on HP StorageWorks 2000 Modular Smart Array products running firmware versions J200P46, J210P22, or J300P22 that will eventually cause controller configuration information to be lost, with subsequent loss of management capability from that controller. Array management, event messaging, and logging will cease functioning, but host I/O will continue to operate normally. This issue affects the ability to manage the array from the affected controller only; if a partner controller is available, the array can be managed through the partner controller. Because configuration information is stored in non-volatile memory, resetting or powering off the controller will not clear this error. If the issue occurs, the controller must be replaced. This failure mode is time sensitive and HP recommends immediately upgrading firmware on all MSA2000 controllers. This is not a hardware issue and proactive replacement of a controller is not a solution. To avoid this condition, you must upgrade your controller to the latest version of firmware."

Current storage controller code version? M110R28. Same issue, controller goes :psyboom:

BelDin
Jan 29, 2001

adorai posted:

Holy poo poo. Your entire post is a tale of failure. Did your guys get any training on this SAN at all, or did the vendor just drop it off and say here you go? I hope they are hourly for their sake. Personally, I have heard pretty much all bad things about HP SANs. We didn't even look at them when we were purchasing.

We were at the tail end of our 5-year contract, and a new company won the next one. During the last 6 months we could buy whatever we needed, but no training would be approved, because they did not want to pay for training that would benefit the new company. They are hourly; I am not (Cyber Security Manager). The only reason I am working this is that I'm filling in for our sole network administrator and am the only one with prior SAN experience (HP EVA 4400).

So option #2.

Syano
Jul 13, 2005
That post scares the hell out of me. We are virtualizing our entire network at the moment, all on top of a Dell MD3200i. Now, when I say our entire network, I am really just talking 15ish servers, give or take. But hearing a story of a SAN going tits up even with 'redundant' options gives me nightmares.


H110Hawk
Dec 28, 2006
Redundant is another word for money spent to watch multiple things fail simultaneously.
