I have done migrations several times before, and the final step of migrating the Edge environment has never seemed especially challenging. Microsoft has a good TechNet article on the subject with a clear step-by-step process, and many of my colleagues have written great articles about it as well. It is a well-known process. Or so you would think.
In this instance, I failed twice. I figured something about this environment had to be different.
Situation: Federation was working through the OCS Edge environment with the Lync 2010 servers. The new Lync Edge servers were installed in the same network segment as the OCS Edge servers, and the firewall rules were all in place, tested, re-verified, and tested again. The certificates had been verified multiple times.
Next try: Not trusting myself, and clearly not knowing what I didn’t know, I engaged a well-known Microsoft PFE with years of experience. We worked through the process together, and we failed. We both researched again and again, looking for something we had missed.
This try: Yep, you guessed it, we failed again. I hated to do it, but I called PSS. I dislike doing that for a few reasons, but I have to admit it was the best solution in this case. :)
Anyway, our troubleshooting was getting really frustrating, because each time we ran the Test-CsFederatedPartner cmdlet, we got an almost immediate failure, as shown here:
PS C:\ > Test-CsFederatedPartner -targetfqdn edge01.companyname.com -domain othercompany.com
Test-CsFederatedPartner : A 504 (Server time-out) response was received from the network and the operation failed. See the exception details for more information.
At line:1 char:24
+ Test-CsFederatedPartner <<<< -targetfqdn edge01.companyname.com -domain othercompany.com
+ CategoryInfo : OperationStopped: (:) [Test-CsFederatedPartner],
+ FullyQualifiedErrorId : WorkflowNotCompleted,Microsoft.Rtc.Management.Sy
That response told us almost nothing. So we ran the Lync Logging Tool to capture what was going on at the Lync Edge servers, but we didn’t catch anything of value there either. In fact, we weren’t seeing any errors at all. It was killing us.
I fired up Snooper on my client machine and finally found something worth reviewing, as shown here:
07/11/2014|17:00:40.363 1AA0:1094 INFO :: Data Received - 10.100.127.194:5061 (To Local Address: 10.119.20.196:49723) 735 bytes:
07/11/2014|17:00:40.363 1AA0:1094 INFO :: SIP/2.0 504 Server time-out
Authentication-Info: TLS-DSK qop="auth", opaque="2F0B06CC", srand="22152E76", snum="2429", rspauth="216b4f7de55af7427567d4600c6c1cc0ed0424bf", targetname="frontend01.companyname.com", realm="SIP Communications Service", version=4
Via: SIP/2.0/TLS 10.119.20.196:49723;ms-received-port=49723;ms-received-cid=2B900
From: "Kaufmann, Russ"<sip:email@example.com>;tag=ad9fb18754;epid=9eb3bb686d
CSeq: 1 SUBSCRIBE
ms-diagnostics: 1065;reason="Federation is disabled";domain="othercompany.com";source="sip.companyname.com"
07/11/2014|17:00:40.363 1AA0:1094 INFO :: End of Data Received - 10.100.127.194:5061 (To Local Address: 10.119.20.196:49723) 735 bytes
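In a long trace, the ms-diagnostics header is the line worth hunting for, since it carries the numeric code and the reason (here 1065, "Federation is disabled"). The following is a minimal Python sketch for pulling those headers out of a trace saved as plain text. It assumes each header sits on its own line in the code;key="value";... shape shown above. The function name parse_ms_diagnostics is hypothetical and is not part of Snooper or any Lync tool.

```python
import re
from typing import Dict, List

# Matches a header line such as:
#   ms-diagnostics: 1065;reason="Federation is disabled";domain="othercompany.com"
# Group 1 is the numeric code, group 2 is everything after the first semicolon.
HEADER_RE = re.compile(r'^ms-diagnostics:\s*(\d+)\s*;(.*)$', re.IGNORECASE)

# Matches each key="value" pair that follows the numeric code.
PAIR_RE = re.compile(r'([\w-]+)="([^"]*)"')


def parse_ms_diagnostics(log_text: str) -> List[Dict[str, str]]:
    """Return one dict per ms-diagnostics header found in the text.

    Each dict holds the numeric code under "code" plus every key="value"
    pair from that header (for example reason, domain, source).
    """
    results = []
    for line in log_text.splitlines():
        match = HEADER_RE.match(line.strip())
        if not match:
            continue  # not an ms-diagnostics header; skip it
        entry = {"code": match.group(1)}
        entry.update(PAIR_RE.findall(match.group(2)))
        results.append(entry)
    return results


# Usage against the relevant lines of the trace above.
sample = '''SIP/2.0 504 Server time-out
CSeq: 1 SUBSCRIBE
ms-diagnostics: 1065;reason="Federation is disabled";domain="othercompany.com";source="sip.companyname.com"'''

for diag in parse_ms_diagnostics(sample):
    print(diag["code"], diag["reason"], diag["domain"])
# prints: 1065 Federation is disabled othercompany.com
```

The pattern expects straight double quotes. If the trace was copied from a web page that converted them to curly quotes, the pairs will not match until the quotes are normalized.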
Aha! Federation is disabled. A clue. Wait, though. Everything had worked before, and all we did was reconfigure the route. In fact, we had verified that media was traveling across the new Lync Edge servers before we even started. All we needed to do was move the federation route. How could federation be disabled? A quick search for 1065;reason="Federation is disabled";domain="othercompany took us to Pat Richard’s blog, which pointed out a possible issue with a security policy. By the time we finished reading the post, though, it was clear that wasn’t our issue.
Enter PSS: First, the PSS rep shamed us. He knew of both of us and was shocked that we couldn’t handle something so easy. Three minutes later, we had the answer. Yes, I felt stupid, but at least it was a quick call, and it was kind of fun to be told by a PSS rep that I should be ashamed of myself. :)
Yes, I forgot to set the policy.