Migrate a MS Failover Cluster (Server 2008 Enterprise 32Bit)

Error    01/01/1900 12:01:00 AM    FailoverClustering    1230    Resource Control Manager

Cluster resource 'Cluster Disk 1' (resource type '', DLL 'clusres.dll') either crashed or deadlocked. The Resource Hosting Subsystem (RHS) process will now attempt to terminate, and the resource will be marked to run in a separate monitor.

That's the message I got after I had migrated one of my Microsoft Windows Failover File Clusters to new hardware.

Scenario

  1. Cluster: Two node, providing File services only.
  2. Quorum configuration: Node and Disk Majority
  3. Current Servers: Windows Server 2008 Enterprise (32-bit) (let's call them Node1 and Node2).
  4. Replacement Servers: Windows Server 2008 Enterprise (32-bit) (let's call them Node3 and Node4).

After adding Node3 and Node4 to the cluster, I failed over the services to Node3. Everything looked good, so I paused Node1 and Node2, not wanting them to be used, but also not wanting to evict them yet either.

I then rebooted Node3 and the services failed over to Node4 as I expected.

My plan was to have all four nodes participate in the cluster for a period of time in case there were any errors. This would allow me to move back to the original systems if the new ones misbehaved.

After a few days of operation, I came in one morning to an email indicating that the cluster services were non-operational. By the time I looked at things, the cluster was already operational again. Reviewing the logs, I could see that Node1 and Node2 had been rebooted.

Being the cautious tech that I am :), I immediately rebooted Node1 and found that the same thing occurred.

I started researching the problem. In the Event Viewer, the only error I found was:

Error    01/01/1900 12:01:00 AM    FailoverClustering    1230    Resource Control Manager

Cluster resource 'Cluster Disk 1' (resource type '', DLL 'clusres.dll') either crashed or deadlocked. The Resource Hosting Subsystem (RHS) process will now attempt to terminate, and the resource will be marked to run in a separate monitor.

I was unable to locate any information on this issue, so I was forced to contact Microsoft. Thankfully, we did have support.

After I informed them of the problem and the error message, they reviewed my configuration and tried many tweaks to no avail, then went on their merry way to research the issue on their end.

The next day, they contacted me with the cause. Apparently there is a bug in the 2008 operating systems where ownership of the core cluster resources cannot be transferred to nodes that are added later. This occurs only when you have more than two nodes in the cluster.

My solution:

After adding Node3 and Node4 to the cluster, I failed over the services and then used the command line to move the core resources as well:

    cluster group "cluster group" /move:Node3

Once this was complete, I evicted the original nodes, Node1 and Node2.
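For anyone who wants to script the same sequence, here is a minimal sketch in Python. It is not what I ran at the time; it only builds the cluster.exe command lines for moving the core group and then evicting the old nodes, and by default it prints them instead of running them. The function names are my own invention, and I am assuming the core group still has its default name "Cluster Group" and that the script runs on a cluster node where cluster.exe is on the PATH.

```python
import subprocess

# Assumption: the core resource group still has its default name.
CORE_GROUP = "Cluster Group"


def move_core_group_cmd(target_node):
    """Build the cluster.exe command that moves the core group to target_node."""
    return ["cluster", "group", CORE_GROUP, f"/move:{target_node}"]


def evict_node_cmd(node):
    """Build the cluster.exe command that evicts a node from the cluster."""
    return ["cluster", "node", node, "/evict"]


def migration_plan(new_owner, old_nodes):
    """Move the core group first, then evict each of the old nodes in order."""
    plan = [move_core_group_cmd(new_owner)]
    plan += [evict_node_cmd(node) for node in old_nodes]
    return plan


def run_plan(plan, dry_run=True):
    """Print each command; only execute it when dry_run is False."""
    for cmd in plan:
        print(subprocess.list2cmdline(cmd))
        if not dry_run:
            # check=True stops the sequence if cluster.exe reports a failure,
            # so a failed move is never followed by an eviction.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run only: shows the commands without touching the cluster.
    run_plan(migration_plan("Node3", ["Node1", "Node2"]))
```

Keeping the move ahead of the evictions matters here, since the whole point is that the core resources must already live on a new node before the original ones go away. Review the printed commands before calling run_plan with dry_run=False.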

After doing the above, the cluster began functioning properly, including across reboots.
