para
Nov 30, 2006
I have a somewhat complex Exchange 2010 configuration and have run into a problem that is probably unusual for most companies.

I have two sites with three Exchange servers. Site A is the main site and has two servers. Site B is the DR site and has one server.

Each server runs all roles; however, all clients connect to the CAS in Site A through a load balancer that is configured only for the servers at Site A. In terms of CAS and edge transport, Site B just sits there until it is needed in a disaster recovery situation.

Our WAN link between the two sites is only 3Mb. However, each site has a 10Mb cable internet connection.

Because replication requires 8Mb, we created a VPN tunnel over the internet and policy routed everything between the 3 servers to go over the internet instead of the WAN.

What we have found is that even though the tunnel allows us to do hot replication without affecting our production WAN environment, it also causes high latency in the clustering service. Because of this we see the node at Site B drop in and out of the cluster rather frequently.

This becomes a problem whenever we need to do maintenance on one of the servers at Site A. If we take a server down at Site A, and the server at Site B drops out of the DAG because of latency problems (or because either of the cable connections goes down), we lose quorum.
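The quorum arithmetic behind this can be sketched as follows. This is only an illustration: it assumes a plain node-majority model in which each of the three DAG members holds one vote and no witness vote is counted, and `has_quorum` is a made-up helper name, not anything from Exchange or the clustering service.

```python
def has_quorum(total_voters: int, reachable_voters: int) -> bool:
    """Node-majority rule (assumed): more than half of all voters must be reachable."""
    return reachable_voters > total_voters // 2


TOTAL = 3  # two DAG members at Site A, one at Site B

# Number of members that can still see each other in each situation.
scenarios = {
    "all nodes up": 3,
    "one Site A node down for maintenance": 2,
    "Site A node down and Site B unreachable": 1,
}

results = {name: has_quorum(TOTAL, up) for name, up in scenarios.items()}
for name, ok in results.items():
    print(f"{name}: {'quorum held' if ok else 'quorum lost'}")
```

With one Site A server already down, the remaining two members are exactly the majority, so any blip on the Site B link is enough to lose quorum.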

What I have done to mitigate this problem is let all inter-node Exchange traffic go over the WAN as normal, then policy route the log shipping and seeding port (64327 per http://technet.microsoft.com/en-us/library/bb331973.aspx) over the VPN.
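As a rough model of that routing decision (not actual router configuration), the rule amounts to matching on a single TCP port. The `Flow` type, the `choose_path` function, and the `"vpn"`/`"wan"` labels below are hypothetical names used only for illustration; only port 64327 comes from the post.

```python
from dataclasses import dataclass

REPLICATION_PORT = 64327  # log shipping and seeding port cited in the post


@dataclass(frozen=True)
class Flow:
    """Hypothetical description of one TCP flow between DAG members."""
    src_port: int
    dst_port: int


def choose_path(flow: Flow) -> str:
    """Send replication traffic over the VPN; everything else stays on the WAN."""
    # Match either direction so return traffic follows the same path.
    if REPLICATION_PORT in (flow.src_port, flow.dst_port):
        return "vpn"
    return "wan"


print(choose_path(Flow(src_port=51000, dst_port=64327)))  # replication, outbound
print(choose_path(Flow(src_port=64327, dst_port=51000)))  # replication, return
print(choose_path(Flow(src_port=50000, dst_port=50001)))  # any other inter-node flow
```

Anything between the nodes that does not use the replication port falls through to the WAN, whatever process generates it.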

This appeared to work; however, there is another problem.

Question
When I did this I noticed that there was still a high amount of traffic between Site A and Site B over the WAN, to processes belonging to store.exe.

Why is store.exe transferring so much data to other nodes in the DAG? Isn't all replication and seeding supposed to be handled by msexchangerepl.exe?


para
Nov 30, 2006

Hawkline posted:

Have you made any headway on this problem? I found it interesting, but don't have any knowledge likely to explain why store.exe is doing a thing. Are you in Datacenter Activation Coordination Mode?
Sort of..

I mirrored the ports on the switch and duplicated everything to a PC to get a packet capture. I then filtered out everything except the traffic between the two local Exchange servers and the one off-site server. I did an analysis in Wireshark, found the top 10 conversations, and matched the ports with the processes running on each server.
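For anyone wanting to repeat the "top conversations" step outside Wireshark, here is a minimal sketch. It assumes the capture has already been exported to simple per-packet records; the `Packet` type, the `top_conversations` function, and the addresses and byte counts (taken from the documentation address ranges) are all invented for the example and are not data from the real capture.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Packet(NamedTuple):
    """Hypothetical per-packet record exported from a capture."""
    src: str
    sport: int
    dst: str
    dport: int
    length: int


def top_conversations(packets: Iterable[Packet], remote_host: str, n: int = 10):
    """Sum bytes per TCP conversation involving remote_host and return the n largest."""
    totals = Counter()
    for p in packets:
        if remote_host not in (p.src, p.dst):
            continue  # drop traffic that never touches the off-site server
        # Normalise direction so both halves of a conversation share one key.
        a, b = (p.src, p.sport), (p.dst, p.dport)
        key = (a, b) if a <= b else (b, a)
        totals[key] += p.length
    return totals.most_common(n)


# Made-up sample records; 198.51.100.5 stands in for the DR server.
packets = [
    Packet("192.0.2.10", 49903, "198.51.100.5", 51000, 1500),
    Packet("198.51.100.5", 51000, "192.0.2.10", 49903, 400),
    Packet("192.0.2.10", 64327, "198.51.100.5", 52000, 900),
    Packet("192.0.2.10", 135, "192.0.2.11", 53000, 5000),  # local only, filtered out
]

top = top_conversations(packets, "198.51.100.5")
for (end_a, end_b), total in top:
    print(end_a, "<->", end_b, total, "bytes")
```

The port pairs from the largest conversations are what then get matched against the listening processes on each server.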

What I found was that store.exe was the process on the local server with all of the databases active. The conversations matched up to these processes on the DR server (which holds the unmounted database copies we are replicating to):
  1. Unknown process on remote port 445
  2. Microsoft.Exchange.Search.ExSearch.exe
  3. MsFTEFD.exe
The last two are related to Exchange Search content indexing. The first, after some research, also appears to be related to content indexing, but I cannot be certain.

The last two also always seem to use TCP port 49903 as either the source or destination port. However, I am not sure whether this port can be set, and I cannot find anything on Google about whether it is randomly chosen. I need to find a way to either stop this traffic or isolate it so that it can be routed over the VPN tunnel.

However, according to Microsoft (http://technet.microsoft.com/en-us/library/bb232132.aspx) content indexing is only supposed to be replicated during database seeding. Once seeding has completed, indexing is supposed to occur on the local database. When the packet capture was taken there was no database seeding going on.

So, I really haven't gotten anywhere.

I'm not sure what Datacenter Activation Coordination Mode is.

para fucked around with this message at 23:32 on Jul 16, 2012
