|
Dilbert As gently caress posted:Isn't SmartOS just a limited VAAI and Vm-open-tools ready appliance? See adorai's response. SmartOS is closer to ESXi than anything. It is absolutely not an appliance in the sense that it's designed to be a guest. It's a host. It's designed to be a relatively stateless hypervisor based on ZFS, zones, and KVM (Intel-only, I think) ported to the illumos kernel. It's a minimal operating system, the sort of design CoreOS was probably modeled on, which shouldn't run any real services itself but should be used solely for hosting guests.
|
# ? Jun 12, 2014 19:04 |
|
mattisacomputer posted:What makes you say that? Oh... Hahaha, sorry dude. Yes, never, ever, ever update to the latest IBM code until it cooks for a few months. Are you using real time compression? In other news, I just configured my first FlashSystem 820 and presented it to the SVC. Next I get to set up a VDisk mirror and see how this sumbitch performs with Oracle.
|
# ? Jun 12, 2014 20:09 |
|
Kaddish posted:Hahaha, sorry dude. Yes, never, ever, ever update to the latest IBM code until it cooks for a few months. Are you using real time compression? Hah noo I downloaded it out of habit but didn't plan on installing it for a few weeks. I'm still on 7.2.0.6 across the board.
|
# ? Jun 12, 2014 20:27 |
|
Hi. Does anybody have an article that I can show a CIO so he believes me when I say a hybrid SSD/HDD SAN is going to be just as reliable as an HDD-only SAN? He says SSDs are more prone to failure.
|
# ? Jun 13, 2014 18:38 |
|
Full clone virtual desktops are not that abnormal, and even people like Brian Madden recommend them.
|
# ? Jun 13, 2014 18:49 |
|
NevergirlsOFFICIAL posted:Hi. Does anybody have any article that I can show a CIO so he believes me when I say hybrid SSD/HDD SAN is going to be just as reliable as a just-HDD SAN? he says SSD more prone to failure. Which vendor are you looking at? Reliability is a property of a system overall, so no one can really answer this question in general. But in many cases (eg Nimble) the SSD is just used as write through read cache, so even if you lose the SSD pool you don't lose data because writes always commit to HDD. And if the data is important then it will be protected with parity of some sort. SSD also isn't more prone to failure generally. SSDs are guaranteed to fail within a certain time frame at a certain write level, but that time frame is going to be many years on modern devices. The lack of mechanical parts also makes them more resilient to the sorts of things that can catastrophically kill an HDD.
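The durability argument above can be sketched in a few lines. This is purely illustrative (a toy model, not Nimble's or any vendor's actual code): if every write commits to the HDD tier and the SSD tier only ever holds cached copies, discarding the entire SSD tier cannot lose data.

```python
class HybridStore:
    """Toy hybrid array: the HDD dict is the system of record,
    the SSD dict is purely a read accelerator."""

    def __init__(self):
        self.hdd = {}  # persistent tier: every write lands here
        self.ssd = {}  # cache tier: only ever holds copies

    def write(self, key, value):
        self.hdd[key] = value  # commit to persistent media

    def read(self, key):
        if key in self.ssd:        # cache hit: fast path
            return self.ssd[key]
        value = self.hdd[key]      # cache miss: go to HDD...
        self.ssd[key] = value      # ...and populate the cache
        return value

    def lose_all_ssds(self):
        self.ssd.clear()           # simulate total cache-tier failure

store = HybridStore()
store.write("lun0/block42", b"payload")
store.read("lun0/block42")         # warms the cache
store.lose_all_ssds()              # cache gone, data is not
assert store.read("lun0/block42") == b"payload"
```

Losing the SSD pool here only costs you hit rate, never data, which is the whole point being made to the CIO.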
|
# ? Jun 13, 2014 18:52 |
|
NippleFloss posted:Which vendor are you looking at? Reliability is a property of a system overall, so no one can really answer this question in general. But in many cases (eg Nimble) the SSD is just used as write through read cache, so even if you lose the SSD pool you don't lose data because writes always commit to HDD. And if the data is important then it will be protected with parity of some sort. yeah we're looking at nimble... what you're saying is basically what I know. I want to show him an article about it though.
|
# ? Jun 13, 2014 20:03 |
|
NevergirlsOFFICIAL posted:yeah we're looking at nimble... what you're saying is basically what I know. I want to show him an article about it though. I don't think anyone has written that article because you'd have to be an idiot to not understand it. What does he think happens when an SSD fails in a Nimble array? What is his concern? Data loss, service outage, poor performance?
|
# ? Jun 13, 2014 20:19 |
|
NippleFloss posted:Which vendor are you looking at? Reliability is a property of a system overall, so no one can really answer this question in general. But in many cases (eg Nimble) the SSD is just used as write through read cache, so even if you lose the SSD pool you don't lose data because writes always commit to HDD. And if the data is important then it will be protected with parity of some sort. Like you said, this is a big "it depends," because complicated engineered systems differ wildly from one another. If you were to run an IBM SONAS or some other appliance where the metadata can be stored separately from the filesystem data, it would certainly go on SSD for performance. In these cases, your SSD is not just important, it's the most critical layer of your system.
|
# ? Jun 13, 2014 20:21 |
|
Misogynist posted:I'm fairly sure you didn't mean to say "write-through read cache" because that combination of words is a total Typed read cache, then changed it to write through cache, but not well enough. Phone posting. I'm actually not sure if Nimble cache is write through, or if it caches on first read, but either way the persistent store is HDD and losing an SSD just means the loss of some cache capacity.
|
# ? Jun 13, 2014 20:29 |
|
NippleFloss posted:I don't think anyone has written that article because you'd have to be an idiot to not understand it. I dunno, it seems like a perfect topic for the few remaining dead tree magazines that out-of-touch execs like this CIO read while on the can. But yeah, if you could specify what kind of failure he is worried about maybe we could dredge something up. "Reliability" in the abstract is so broad that it's meaningless to talk about.
|
# ? Jun 13, 2014 21:39 |
|
Can anyone link me to an article on whether black painted arrays fail more than white painted arrays? My dumb boss says white painted arrays fail more. Sorry, forums poster NevergirlsOFFICIAL, your boss is a big ole dummy.
|
# ? Jun 13, 2014 22:19 |
|
NevergirlsOFFICIAL posted:yeah we're looking at nimble... what you're saying is basically what I know. I want to show him an article about it though. Let me know if you have any questions about them. I am running a few of them here with a few expansion shelves.
|
# ? Jun 13, 2014 23:24 |
|
NippleFloss posted:Typed read cache, then changed it to write through cache, but not well enough. Phone posting. The SSDs in a Nimble array are read cache only; writes are coalesced in NVRAM into CASL stripes before being committed to disk.
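The coalescing idea is easy to sketch. CASL's actual on-disk layout is proprietary, so the following is only a generic illustration of the technique: acknowledge small random writes from a battery-backed buffer, then flush them to disk as one full sequential stripe.

```python
class CoalescingWriteBuffer:
    """Sketch of NVRAM-style write coalescing: small random writes
    are acknowledged from the buffer, then written to disk as full
    sequential stripes."""

    def __init__(self, stripe_size=4):
        self.stripe_size = stripe_size
        self.nvram = []         # pending (block, data) pairs
        self.disk_stripes = []  # each entry = one sequential full-stripe write

    def write(self, block, data):
        self.nvram.append((block, data))  # ack immediately from NVRAM
        if len(self.nvram) >= self.stripe_size:
            self.flush()

    def flush(self):
        if self.nvram:
            # one large sequential write instead of many small random ones
            self.disk_stripes.append(list(self.nvram))
            self.nvram.clear()

buf = CoalescingWriteBuffer(stripe_size=4)
for i in range(8):
    buf.write(i, f"data{i}")
# eight random writes became two sequential stripe writes
assert len(buf.disk_stripes) == 2
```

This is why such designs can back random write workloads with HDD: the disks only ever see large sequential stripes.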
|
# ? Jun 14, 2014 01:27 |
|
ragzilla posted:The SSDs in a Nimble array are read cache only, writes are coalesced in NVRAM into CASL stripes before being committed to disk. I know, my question was about whether the cache is populated when new data is written (write-through) or only on reads.
|
# ? Jun 14, 2014 01:50 |
|
zen death robot posted:I kind of forgot about this thread, but if anyone is ever planning on using block deduplication on 2nd gen VNX's be very very careful about doing it. I had a major issue last month where even when we followed all of the best practices laid out in the whitepapers we had some blocks deduplicate a little TOO well and it brought down the SPs and we had rolling outages on the drat array for a week. The only reason we were able to get ourselves out of the mess was because we had everything behind VPLEX and had enough disk space to migrate the LUNs out of the deduplicated space into standard thin luns. Well you are right, your setup is pretty dumb and was destined to fail. You had 500 Windows machines targeting the same pool? Hell, were they all on the same LUN? Which VNX system are you guys using? Were your dedupe settings set to max? Sounds like a system that was getting too greedy when disk is especially cheap these days. Block dedupe is a great tool and makes a lot of sense. From a SAN perspective it's not a magic pill that is suddenly going to make your storage needs drop by 1000%. In all honesty it shines more with your NL-SAS storage pools which you tier your less active data to (which, yeah, is more like the archiving scenario you mentioned). Your most active data should be on your 15k and flash, but even then you need to split it up among pools to ensure your processing isn't going to be pushed. The NetApp way of doing things and the EMC way are just different. You really shouldn't have expected your design to stay the same going from NetApp to EMC. I imagine the flash %, the dedupe policies, and the tiering (which I guess you probably aren't even using effectively, if at all) all needed to change. I have just gone through the same transition from NetApp to EMC, but I had the ability to build from the ground up.
|
# ? Jun 14, 2014 04:48 |
|
NippleFloss posted:I know, my question was about whether the cache is populated when new data is written (write-through) or only on reads. It depends: if the data is considered 'hot', the array will place the CASL stripe in flash as well as on disk (according to public whitepapers; I haven't seen any metrics in Infosight which provide feedback on this).
|
# ? Jun 14, 2014 13:46 |
|
zen death robot posted:
Hey, although three more shelves of disks sounds nice, keep an eye on your CPU utilisation. What is it at currently? If it's high already, throwing more SSD's into the mix can drive it even higher as they're doing less waiting and more working. Good problem to have but it can mean two things: 1.) You drive it to 100% utilisation and basically have an array redlining for more than a burst 2.) The general CPU utilisation goes higher (50%+) and if you lose an SP the other can cope but not without apps suffering. Good gift to have but still something to watch; there is a sweet spot with SSD's and you can overdose!
|
# ? Jun 15, 2014 10:35 |
|
I'm strictly IBM with a few re-branded things here and there (NSeries, etc) and all the continual horror stories regarding EMC make me very glad of this fact.
|
# ? Jun 17, 2014 17:42 |
|
I'm fairly new to SANs and I've been charged with contacting vendors to implement a solution at the ISP I work for. We have an ageing infrastructure and no unified storage solution at all. Our mixed physical/virtual environment is mostly DNS, Web, MySQL, Mail and RADIUS on some old Dell 2950's and some generic 1U servers. The RADIUS, MySQL and Mail servers are fairly IO intensive (mostly writes) where everything else is pretty low on requirements. We'll need about 12-24TB of storage to start and we'll want to do offsite replication for DR. Network speed between our sites will mostly be a non issue as we own the fibre network between them and we can easily support multiple 10 GbE links. So far I've contacted EMC, NetApp and Dell to start the initial exploratory talks. When dealing with them, considering the above, what should I be expecting and is there anything I should watch out for? Also, considering the above, are there any other vendors I should investigate?
|
# ? Jun 17, 2014 19:17 |
|
do you need fibrechannel because if not look at nimble
|
# ? Jun 17, 2014 19:19 |
|
NevergirlsOFFICIAL posted:do you need fibrechannel because if not look at nimble We'll likely use iSCSI but FC is not out of the question at this point. I'll have to take a look at Nimble. How is their pricing compared to EMC, NetApp, Dell, etc. ?
|
# ? Jun 17, 2014 19:31 |
|
Kaddish posted:I'm strictly IBM with a few re-branded things here and there (NSeries, etc) and all the continual horror stories regarding EMC make me very glad of this fact.
|
# ? Jun 17, 2014 19:35 |
|
Misogynist posted:Ask me about SONAS replications that take fourteen weeks to finish
|
# ? Jun 17, 2014 19:41 |
|
bigmandan posted:We'll likely use iSCSI but FC is not out of the question at this point. I'll have to take a look at Nimble. How is their pricing compared to EMC, NetApp, Dell, etc.? Often cheaper, but pricing is pretty malleable. If you let everyone know who they are competing against you can usually get the prices in the same ballpark.
|
# ? Jun 17, 2014 20:02 |
|
evil_bunnY posted:Ask me about v7k shelf errors that take arrays down for fun Ask me about IBM remote support sending out a CE to re-seat, and possibly replace one node in one of our V7Ks, which resulted in the system being offline for over 24 hours.
|
# ? Jun 17, 2014 20:12 |
|
Every vendor has good and bad stories. poo poo happens; there's no point saying 'wow, I better avoid x'. Just do the right thing, which is whatever is good for your CV.
|
# ? Jun 17, 2014 20:13 |
|
evil_bunnY posted:Ask me about v7k shelf errors that take arrays down for fun mattisacomputer posted:Ask me about IBM remote support sending out a CE to re-seat, and possibly replace one node in one of our V7Ks, which resulted in the system being offline for over 24 hours. Vanilla posted:Every vendor has good and bad stories. poo poo happens, it's no point saying 'wow I better avoid x' Vulture Culture fucked around with this message at 20:24 on Jun 17, 2014 |
# ? Jun 17, 2014 20:20 |
|
Misogynist posted:It wouldn't even be a big deal if they didn't remove the contra-rotating cable configuration that the DS series had, which prevented this exact scenario from taking out every shelf down the chain
|
# ? Jun 17, 2014 20:24 |
|
evil_bunnY posted:No it was a software error on disk failure. Our colleagues had a grand old time.
|
# ? Jun 17, 2014 20:26 |
|
Misogynist posted:Oh, that's right, there was the time that we had a CE come out after a call-home to replace a disk that was showing up as failed in the V7000 GUI. He couldn't find the disk, and I couldn't find the disk, and there were no failure lights on the array. Finally, he left, and I counted up the disks in the GUI, and there was one more disk listed (the failed one, presumably) than actually existed in the array.
|
# ? Jun 17, 2014 20:28 |
|
zen death robot posted:Not everyone will run into this, but it's just a word of caution. I expect it'll be fixed in the next few months as well. Hell, I'm buying more VNXs for several other sites I manage as well. I think it's a great platform overall. I was just bit by this particular issue on a fairly new feature and the code that the VNX2 platform is running on is quite different from the CLARiiON/old-VNX platform so I'm not totally surprised something came along, really. Yep, from what I hear the VNX2 is pretty much all new software from the ground up so I would not be shocked to hear of a spike in issues for a while. A rewrite was long overdue anyway; mushrooming was starting to creep in, and then the widescale use of SSD's forced it quite a bit.
|
# ? Jun 17, 2014 20:30 |
|
Bitch Stewie posted:We're probably about to go for a HUS 110. I've got a HUS110, and yeah, basically everything Aquila said. Make sure you're getting the features you want/need and read all of the catches and caveats. Lots of poo poo is licensed separately and they don't make it simple. Mine's backing a 3-host VMware cluster, and on iSCSI the hosts would throw up latency warnings every 3 hours like clockwork. Nobody at VMware support, Hitachi, or the VAR could come up with *why* this happens. It's only in VMware's logs, and none of the VMs or applications seem to suffer when it happens. We switched to FC direct connected to the hosts and it just sorta went away... usually... I hate "fixing" poo poo like that. It does just work though, and was a good bit cheaper than the bids we got from NetApp and EMC.
|
# ? Jun 17, 2014 21:26 |
|
zen death robot posted:Currently on the 5600 it's around 15-20% (on both SPA and SPB) with dedupe off. That's during the busiest times, anyway. When we had dedupe on the engineers were watching it over ESRS where they could get a more granular look at things and saw that CORE 0 was pegging 100% and the other cores weren't all that busy. A big part of the fanfare about VNX2 was MCx which was supposed to make much better use of multiple processors, which makes it sort of funny that dedupe workload isn't spread across the CPUs.
|
# ? Jun 17, 2014 22:59 |
|
evil_bunnY posted:Ask me about v7k shelf errors that take arrays down for fun Same. Vanilla posted:Every vendor has good and bad stories. poo poo happens, it's no point saying 'wow I better avoid x' HP tape storage is the exception. It's all absolutely terrible.
|
# ? Jun 18, 2014 15:41 |
|
NullPtr4Lunch posted:I've got a HUS110, and yeah, basically everything Aquila said. Make sure you're getting the features you want/need and read all of the catches and caveats. Lots of poo poo is licensed separately and they don't make it simple. Thanks Still leading with the HUS 110. Hitachi seem deathly honest but it would be useful to know if you consider there to be any "must have" license options? We're planning on doing FC direct connect so other than tiering and the performance analyser license I don't see much else that jumps off the page as something we'd need? Incidentally do you have VAAI? I'm still a little hazy on how the zero reclaim works depending if you have it enabled or not (we're cheap scum so only have vSphere Standard licenses).
|
# ? Jun 18, 2014 17:43 |
|
NippleFloss posted:A big part of the fanfare about VNX2 was MCx which was supposed to make much better use of multiple processors, which makes it sort of funny that dedupe workload isn't spread across the CPUs. Multiple cores you mean. I think the main reason behind the new code was the fact that, like most old software, the old code was not optimised to use multiple cores generally. CPU's were never the bottleneck unless a system was really overloaded with disk..... ........but then SSD's came along and started to really gently caress such arrays because the CPU's would suddenly become the bottleneck where before they'd be happily chugging along. The new code should spread general workload across all the cores including rebuilds...... but features like dedup are ALWAYS going to be limited intentionally somehow. Sure it would be great if a certain feature did something faster but not at the cost of affecting day to day operation. I can get a shitload more SSD's into a VNX2 than a VNX and it's not all down to just having a slightly better CPU. Just wish they'd add more storage processors but then they'd be affecting their precious VMAX business.....
|
# ? Jun 18, 2014 23:05 |
|
Vanilla posted:Multiple cores you mean. I think the main reason behind the new code was the fact that, like most old software, the old code was not optimised to use multiple cores generally. CPU's were never the bottleneck unless a system was really overloaded with disk..... Oh, I definitely understand the rewrite. We've been going through the same thing at NetApp, parallelizing ONTAP. And you're correct that it's not trivial. IO, specifically write IO, is inherently difficult to parallelize because ordering often matters a great deal, so you can't just spread work between multiple CPUs and hope it gets done in the correct order. Then when you get to things like calculating where to place data, and parity calculations, and metadata updates, you can't have other processes updating the same sector of disk while another is working off of in-memory data to calculate the best way to do all of those things. The issue I have is that inline dedupe should be fairly easy to spread across multiple cores. You're checking data against the fingerprint database as soon as it arrives and deciding whether to write a pointer or a full allocation unit, then sending that along to the write allocator, where multi-threading is trickier. Multiple processes checking and updating the fingerprint DB should be relatively easy to manage. The problem with saying that features like dedupe are always going to be limited is that inline dedupe sits in the middle of your IO path, so you CAN'T handicap the rate at which it runs or it becomes unusable for everything but very low IO, latency-insensitive situations. It's fine to sacrifice dedupe efficiency to keep processing moving along at an acceptable rate, but it sounds like the current VNX2 code does the exact opposite, which is crippling. Edit: Also, EMC owns Data Domain, which has some of the best inline dedupe IP in the world. I can't believe they wouldn't have made use of that. YOLOsubmarine fucked around with this message at 00:53 on Jun 19, 2014 |
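The "fingerprint lookups should parallelize easily" claim can be sketched concretely. This is hypothetical illustration code, not ONTAP's or MCx's implementation: shard the fingerprint table by hash prefix so that concurrent writers rarely contend on the same lock, while each shard's lock still guarantees exactly one full write per unique block.

```python
import hashlib
import threading

class ShardedFingerprintDB:
    """Sketch of an inline-dedupe fingerprint table sharded across
    locks, so multiple cores can check/insert fingerprints with
    little contention."""

    def __init__(self, shards=8):
        self.shards = [dict() for _ in range(shards)]
        self.locks = [threading.Lock() for _ in range(shards)]

    def dedupe_write(self, block):
        fp = hashlib.sha256(block).digest()
        idx = fp[0] % len(self.shards)   # shard by first fingerprint byte
        with self.locks[idx]:
            if fp in self.shards[idx]:
                return "pointer"          # duplicate: just write a reference
            self.shards[idx][fp] = block  # unique: record it, write full block
            return "full-write"

db = ShardedFingerprintDB()
results = []
threads = [
    threading.Thread(target=lambda: results.append(db.dedupe_write(b"same-block")))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
# exactly one thread pays for a full write; the rest dedupe to a pointer
assert sorted(results) == ["full-write", "pointer", "pointer", "pointer"]
```

Per-shard locking is one common answer to the contention problem described above; the write allocator behind it is where the ordering constraints make parallelism genuinely hard.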
# ? Jun 19, 2014 00:08 |
|
zen death robot posted:
This a thousand times. Without going into specifics, we got a massive upgrade due to sales mouths writing a check the engineering team couldn't cash. It always sucks moving through the pain, but when the sales engineering team says you're paying for a solution, and that when their solution fails you won't be writing a check to bridge the gap, they aren't lying. As a company it's got to suck for long term profitability, and I'd prefer it was right out of the gate, but as a customer nothing shuts you up after a failure faster than when EMC 'makes it right'. Enterprise support is about more than how fast they can ship new gear after a break.
|
# ? Jun 19, 2014 01:59 |
|
KennyG posted:This a thousand times. Without going into specifics, we got a massive upgrade due to sales mouths writing a check the engineering team couldn't cash. It always sucks moving through the pain but when the sales engineering team says you pay for a solution and when their solution fails and you aren't writing a check to bridge the gap, they aren't lying. Yeah, NetApp did something similar for us at the last gig due to the big CDOT issues we were having last year. They stuck with us via ungodly huge email chains and conference calls until the issues were fixed, and although I don't think anyone was happy with the situation, I think they did get some credit for the response.
|
# ? Jun 19, 2014 02:16 |