Symptom: Yesterday, one of the Lync Front-End servers failed. Most of the Lync services were stopped, and they would not start manually. Reviewing the Event Logs, which is something that none of us do until we are well into our troubleshooting, I found the following errors:
Event 32014, LS Application Server
The application threw an exception while starting.
The application urn:application:testbot threw the following exception when starting: Exception: System.Runtime.Serialization.SerializationException
> Message: The constructor to deserialize an object of type ‘Microsoft.Rtc.Internal.Sip.LocalCertificateNotFoundException’ was not found.
> TargetSite: Void CallStartAsync()
> StackTrace: at Microsoft.Rtc.ApplicationServerCore.ApplicationLoader.CallStartAsync()
> Source: Microsoft.Rtc.ApplicationServerCore
Cause: Startup errors.
Check the events prior to this to resolve the service startup issue.
Event 61002, LS MCU Infrastructure
No certificate has been configured for secure transport.
The certificate assigned to process ReplicationApp(3756) was not found.
Certificate serial number: 46ae547f00000000fcda
Certificate issuer name: CN=IHelp CA, DC=infrastructurehelp, DC=com.
Cause: Incorrect configuration of the server or the certificate assigned to the server was deleted from the certificate store
Verify that a valid certificate has been configured.
Event 48005, LS Routing Data Sync Agent
The Routing Data Sync Agent has encountered an unexpected Exception: [Operation is not valid due to the current state of the object.], Trace: [ at Microsoft.Rtc.Server.McuInfrastructure.HttpTransport.LoadCertificate(CertificateInfo certificate)
at Microsoft.Rtc.Server.McuInfrastructure.HttpTransport..ctor(String listeningUrl, ICccpConfigurationProvider config, XmlWriterSettings writerSettings)
at Microsoft.Rtc.Server.Replication.Http.ReplicationHttpAdapter..ctor(String listenerUri, ICccpConfigurationProvider config)
at Microsoft.Rtc.Server.Replication.Http.ReplicationHttpAdapter..ctor(String listenerUri, ServiceConsumer serviceConsumer, StoreAccessor regStoreAccessor, StoreAccessor uscStoreAccessor)
at Microsoft.Rtc.Server.Replication.ReplicationApp.Initialize(AutoResetEvent workerStartedEvent, ManualResetEvent serverProcessDiedEvent, ManualResetEvent shutdownEvent, ManualResetEvent updateMasterStateEvent)
at Microsoft.Rtc.Server.Replication.ReplicationApp.Main(String args)]
OK, the errors make it sound like a certificate error. Actually, it was pretty clear that it was a certificate error. So, I opened up the Certificates MMC and verified that the cert was still there. It hadn't been accidentally deleted or anything like that. In fact, the cert still had almost a year before it expired. I started the Deployment Wizard and found the following:
The Certificate Wizard shows the certificate, shows that it is not expired (today is September 30th, 2014), and shows that it is “partially” assigned: Web services internal shows the certificate as assigned, while the other services show it as missing.
Resolution: I found that I could either replace the existing certificate with a new one, or I could just use the Assign option and re-assign the same certificate. In both cases, the Status became Assigned for all of the services, and the Lync services all started back up properly.
Cause: I am not sure. I know that some patching has been done recently, but I have no idea which patch might have caused this issue. BTW, I also found the same partially assigned certificate on almost all of the other Front-End servers, but only the one server had its services stopped. I am betting that if any of the other Front-End servers were restarted, they would have failed in exactly the same way.
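Since the other Front-End servers looked like they were in the same state, it can help to pull the certificate serial number and issuer out of the Event 61002 text so they can be compared against what the Certificates MMC shows on each server. Below is a minimal Python sketch of that idea, not anything Lync provides. It assumes the event message has been copied or exported as plain text, that the field labels match the ones quoted above, and that the MMC may display the serial with spaces between bytes. The function names parse_missing_cert and normalize_serial are made up for illustration.

```python
import re

# Field labels are taken from the Event 61002 text quoted in this post.
# Assumption: other events or product versions may word them differently.
SERIAL_RE = re.compile(r"Certificate serial number:\s*([0-9A-Fa-f]+)")
ISSUER_RE = re.compile(r"Certificate issuer name:\s*(.+?)\.?\s*$", re.MULTILINE)


def normalize_serial(serial):
    """Lowercase a serial number and drop spaces so two copies can be compared."""
    return serial.replace(" ", "").lower()


def parse_missing_cert(event_text):
    """Return (serial, issuer) from the event message, or None if either is absent."""
    serial = SERIAL_RE.search(event_text)
    issuer = ISSUER_RE.search(event_text)
    if not serial or not issuer:
        return None
    return normalize_serial(serial.group(1)), issuer.group(1).strip()


# The message body of Event 61002 as quoted above.
sample = """No certificate has been configured for secure transport.
The certificate assigned to process ReplicationApp(3756) was not found.
Certificate serial number: 46ae547f00000000fcda
Certificate issuer name: CN=IHelp CA, DC=infrastructurehelp, DC=com.
"""

result = parse_missing_cert(sample)
print(result)

# Serial as it might be typed in from the MMC (spacing is an assumption).
from_mmc = "46 AE 54 7F 00 00 00 00 FC DA"
print(normalize_serial(from_mmc) == result[0])
```

If the two serials match, the certificate the service is complaining about really is the one still sitting in the store, which points at the assignment rather than the certificate itself, as it did here.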