Monday, 6 April 2015

Exchange DAG Node down, Windows Cluster Service will not start

Problem: After a server reboot, an Exchange DAG member is down and the Cluster Service fails to start.

The Cluster Service cannot be started; it terminates with Event ID 7024:

"The Cluster Service terminated with service-specific error cannot create a file when that file already exists." 


Error Messages in Event Log:



Service Control Manager Event ID 7001

The Cluster Service service depends on the Csv File System Driver service which failed to start because of the following error: 
The system cannot find the file specified.


Service Control Manager Event ID 7000

The Csv File System Driver service failed to start due to the following error: 
The system cannot find the file specified.



MSExchangeRepl Event ID 4113

Database redundancy health check failed.
Database copy: DB_NAME
Redundancy count: 1

Error: Passive copy 'DB_NAME\EXCH_SERVER' is not UP according to clustering.


MSExchangeRepl Event ID 2060

The Microsoft Exchange Replication service encountered a transient error while attempting to start a replication instance for DB_NAME\EXCH_SERVER. The copy will be set to failed. Error: The NetworkManager has not yet been initialized. Check the event logs to determine the cause.
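
If you have exported the System and Application logs and want a quick check that a node shows this same combination of events, something like the sketch below could help. It is only an illustration: the list-of-dicts record layout and the function names are my own assumptions, not an Exchange or Windows API, and Event ID 7024 is assumed to be logged by Service Control Manager like the other two service events.

```python
# Event sources and IDs quoted in this post. The record layout used below
# ({"source": ..., "event_id": ...}) is a hypothetical export format.
SIGNATURE = {
    ("Service Control Manager", 7024),  # Cluster Service terminated (source assumed)
    ("Service Control Manager", 7001),  # dependency on Csv File System Driver failed
    ("Service Control Manager", 7000),  # Csv File System Driver failed to start
    ("MSExchangeRepl", 4113),           # database redundancy health check failed
    ("MSExchangeRepl", 2060),           # replication instance could not start
}


def missing_signature_events(events):
    """Return the (source, event_id) pairs from SIGNATURE not found in events."""
    seen = {(e["source"], e["event_id"]) for e in events}
    return sorted(SIGNATURE - seen)


def matches_signature(events):
    """True when every event listed in this post is present in events."""
    return not missing_signature_events(events)
```

Calling missing_signature_events() on a partial export tells you which of the five events you have not found yet, which is handy when the logs are noisy.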




Solution:

In Device Manager, the Microsoft Failover Cluster Virtual Adapter (the netft.sys driver) shows a yellow exclamation mark and reports:

“This device is not working properly because Windows cannot load the drivers required for this device. (Code 31)”

Remove the "Microsoft Failover Cluster Virtual Adapter" and reinstall it using the following steps:

1. In Device Manager, under Network adapters, click View --> Show hidden devices, then uninstall "Microsoft Failover Cluster Virtual Adapter".
2. Reboot the server for the changes to take effect.
3. After the reboot, open Device Manager and go to Network adapters.
4. From the Action menu, select “Add legacy hardware” and then click Next.
5. Select “Install the hardware that I manually select from a list (Advanced)” and click Next.
6. Select “Network adapters” and then click Next.
7. Select “Microsoft” in the left pane and “Microsoft Failover Cluster Virtual Adapter” in the right-hand list.
8. Once the adapter has been added, you should be able to start the Cluster Service and let the DAG catch up on the queued log files.
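
To confirm from a script that the Cluster Service actually came up after these steps, one option is to read the output of the built-in `sc query` command. The sketch below is an assumption-laden example rather than part of the fix: it assumes the English `STATE : 4  RUNNING` output layout and the standard `ClusSvc` service name, and the helper names are made up for illustration.

```python
import re
import subprocess

# Matches the "STATE              : 4  RUNNING" line printed by `sc query`.
_STATE_RE = re.compile(r"^\s*STATE\s*:\s*\d+\s+(\w+)", re.MULTILINE)


def parse_service_state(sc_output):
    """Return the state name (e.g. "RUNNING") from `sc query` output, or None."""
    match = _STATE_RE.search(sc_output)
    return match.group(1) if match else None


def cluster_service_state(service="ClusSvc"):
    """Run `sc query` on the local Windows node and return the service state."""
    result = subprocess.run(
        ["sc", "query", service], capture_output=True, text=True, check=False
    )
    return parse_service_state(result.stdout)
```

On the repaired node, cluster_service_state() should return "RUNNING" once the service has started; a value of None means the output could not be parsed, for example because the service is not installed.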

NOTE: If the issue comes back after a later reboot, follow the same steps again, but also disable IPv6 on the network adapters, since it is not required when Exchange is not installed on a Domain Controller.



4 comments:

  1. That worked perfectly! Thanks for sharing!

  2. This worked and solved our DAG migration over to Scale Computing Hyperconvergence

  3. Thanks dude, saved us today. Worked on a cluster we were using for SQL. Windows 2012, SQL 2014

  4. You saved my life with this fix. I was on 4 hours of downtime. Nothing seemed to work and neither of the nodes will stay online. I noticed one node would stay until it timed out trying to establish quorum but the other node would stop almost immediately. No real events that meant anything and then I stumbled on your site.
    Your fix was quick and easy and something I would've never thought to check.

    Thank you, Thank you and Thank you for sharing this valuable fix.
