Hi all,

For my first blog post over here at Model, I figured I’d pass on an interesting issue I ran across a few weeks ago. I was investigating what was presented as an unhealthy four-node Hyper-V cluster. The symptom was that two of the nodes would randomly go into a paused state. If any workloads were running on those nodes when this happened, they would be gracefully drained off to the other two healthy nodes.

Inside Failover Cluster Manager you could see that the nodes were indeed paused.




For testing, I began by resuming operations on the nodes without failing back any roles. I was hoping to catch the pause in action and then be able to crawl the cluster logs for the culprit. So I resumed the fourth node in the cluster and began taking a peek at the cluster logs.




I found no errors from the last few days and kept digging. About 10 minutes after the resume, I noticed that the node was back in a paused state. I assumed that once I refreshed the cluster logs I’d see a nice trail of events and have a great start on tracking down the root cause.


I had high hopes for the cluster logs, but they came up clean, as did the Windows System, Application, Hyper-V, Failover Cluster, and Virtual Machine Manager logs.




After a few hours of searching I was still no closer to an answer, but the lack of errors gave me an idea: maybe this wasn’t an error at all but expected behavior. So what could be causing a cluster node to pause? I started thinking about Virtual Machine Manager (VMM) and its Performance and Resource Optimization (PRO) functionality.


PRO was enabled along with Dynamic Optimization, so I went ahead and disabled these options for the cluster and then resumed one of the paused nodes to test, pretty confident I’d found the issue.


Ten minutes later it was back to paused! This time, though, I noticed something interesting in the last job status in VMM.




So with that being logged I was then able to crawl back through the VMM logs and confirm that on both nodes this was not an error of any type but VMM placing the nodes into a paused state.




After I disabled all PRO settings in VMM across all clusters, disabled Dynamic Optimization on all clusters, and finally removed the cluster from VMM and added it back, the nodes stopped pausing at random intervals. So the moral of this story is: be in control of your automation, and do not let your automation control you. Proper health checks that confirm the automation is working correctly can be the difference between it being a time saver and it turning into a time vampire, as in this case.
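One way to make that kind of health check concrete is to line up node pause times against automation job start times, so that a pause with no matching operator action gets traced back to the job that caused it. The following is only a minimal Python sketch of that idea. The `PauseEvent` and `JobRecord` types, the two-minute window, and any node or job names are hypothetical and chosen for illustration; they are not part of VMM or the Failover Clustering tooling, and how the records get exported from either product is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PauseEvent:
    """Hypothetical record: a cluster node was observed entering a paused state."""
    node: str
    at: datetime


@dataclass
class JobRecord:
    """Hypothetical record: an automation job that targeted a node."""
    node: str
    name: str
    started: datetime


def attribute_pauses(pauses, jobs, window=timedelta(minutes=2)):
    """Map each pause to the jobs that started on the same node
    no more than `window` before the pause was observed."""
    report = {}
    for pause in pauses:
        report[(pause.node, pause.at)] = [
            job.name
            for job in jobs
            if job.node == pause.node
            and timedelta(0) <= pause.at - job.started <= window
        ]
    return report
```

A pause that maps to an empty list has no automation job near it and deserves a closer look at the cluster logs. A pause that maps to a job name points at the automation itself, which is where this investigation eventually ended up.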

About the Author: Nick Taylor

Consultant – Model Technology Solutions

Nick is an IT professional with more than 19 years of experience and a passion for learning about technology. His areas of expertise include datacenter, hypervisors, storage, network, cloud, and OSD. He spends his free time delving into crypto, video games, automation, IoT, and really anything nerdy.
