Recently one of our customers asked for a mechanism to capture idle VM information from their infrastructure. This information is already part of the Efficiency tab in vROps, but there are a few points to consider before drawing conclusions from it. The Efficiency tab calculates the average utilization of resources based on the parameters defined in the vROps policy, and if a customer is using the Default vROps policy, utilization is calculated on a 24×7 basis. Because VM utilization is rarely the same throughout the day or across the week, the resulting efficiency figures are sometimes not satisfactory. To work around this, customers need to edit the Time Range in the vROps policy to the desired duration.
What other way is there to get idle VM information without changing the Default policy (many customers prefer not to modify it)?
I had a similar request from a customer who was looking for an OOTB vROps dashboard that could serve this purpose. The real question is how to determine which VMs in your environment are idle, because a VM does not have to be powered off to count as idle. Sometimes a customer creates VMs on request, but after a while the application team stops using them, and those VMs simply sit idle and consume resources.
If Powered Off VM information can be captured from vCenter, then why do I need vROps?
Powered-off virtual machine information has always been available through the vCenter VI client, so it makes no sense to use that as our criterion. With the vROps analytics engine we can go one step deeper and segregate VMs based on other parameters such as network utilization, disk IO utilization, etc.
So it makes a lot of sense to first define the criteria for when a virtual machine should be considered idle. Based on my research (reading community papers), I found three primary conditions that determine whether a virtual machine can be considered idle:
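To see why the time window matters, here is a minimal Python sketch with made-up hourly numbers. It is not how vROps computes efficiency internally; the `average_utilization` helper and the sample data are assumptions for illustration only.

```python
from statistics import mean


def average_utilization(samples, start_hour=0, end_hour=24):
    """Average the samples whose hour falls in [start_hour, end_hour)."""
    selected = [value for hour, value in samples if start_hour <= hour < end_hour]
    return mean(selected) if selected else 0.0


# Hypothetical hourly CPU utilization (%) for one VM:
# busy from 09:00 to 17:00, nearly quiet the rest of the day.
hourly_cpu = [(hour, 60.0 if 9 <= hour < 17 else 3.0) for hour in range(24)]

all_day = average_utilization(hourly_cpu)          # 24x7 window
business = average_utilization(hourly_cpu, 9, 17)  # business hours only

print(f"24x7 average: {all_day:.1f}%")
print(f"Business-hours average: {business:.1f}%")
```

The same VM looks lightly used over a 24×7 window and busy over business hours, which is why the chosen Time Range changes the picture so much.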
- Average CPU utilization < 100 MHz
- Average Disk IO usage < 20 KBps
- Average Network IO usage < 1 KBps
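The three conditions above can be sketched as a simple check in Python. This is only an illustration: the `VmAverages` structure and the sample values are hypothetical, and requiring all three conditions at once is my assumption for the sketch.

```python
from dataclasses import dataclass

# Baseline thresholds taken from the three conditions listed above.
CPU_MHZ_LIMIT = 100.0
DISK_KBPS_LIMIT = 20.0
NET_KBPS_LIMIT = 1.0


@dataclass
class VmAverages:
    """Averaged metrics for one VM (hypothetical structure for this sketch)."""
    name: str
    cpu_mhz: float
    disk_kbps: float
    net_kbps: float


def is_idle(vm: VmAverages) -> bool:
    # Assumption: a VM counts as idle only when all three averages
    # are below their limits.
    return (
        vm.cpu_mhz < CPU_MHZ_LIMIT
        and vm.disk_kbps < DISK_KBPS_LIMIT
        and vm.net_kbps < NET_KBPS_LIMIT
    )


# Made-up sample values, not real measurements.
vms = [
    VmAverages("app-01", cpu_mhz=850.0, disk_kbps=310.0, net_kbps=42.0),
    VmAverages("test-07", cpu_mhz=35.0, disk_kbps=4.0, net_kbps=0.2),
    VmAverages("batch-03", cpu_mhz=60.0, disk_kbps=95.0, net_kbps=0.5),
]
idle_names = [vm.name for vm in vms if is_idle(vm)]
print(idle_names)
```

Keeping the limits as named constants makes it easy to adjust them per environment.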
These values may change depending on the environment, but to create a baseline I'm going ahead with this approach. I created a dashboard that captures information based on the criteria discussed above, and here is what it looks like.
The first dashboard, at the top left, shows CPU, Network & Disk IO Idle Time (%) in descending order, which places the most idle virtual machines at the top of the list. This is useful when figuring out which virtual machines have the highest idle values. As I mentioned earlier, the VMs shown in this list are not necessarily powered off.
The second dashboard, on the right, lists the virtual machines with the highest network throughput (Tx/Rx). The idea behind this list is to know, at any given point in time, which virtual machines have the most network IO activity. The same list also includes VM-to-Host (Rx & Tx) values, which help determine whether a VM is sending packets to or receiving packets from its ESXi host.
The third dashboard shows Average Network Usage (KBps) as a distribution pie chart, which segregates virtual machines based on their average network utilization.
The fourth dashboard presents the metric data collected by the first dashboard as a heatmap. The heatmap scale runs from 70 to 100, which means that once an idle metric (CPU, Network, Disk IO) crosses 70%, the heatmap starts placing it towards the red zone.
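As a rough idea of what such a distribution does, the following Python sketch groups VMs into buckets by their average network usage. The bucket boundaries and sample values are hypothetical; the actual slices in the pie chart depend on how the widget is configured.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical bucket boundaries in KBps (not taken from the dashboard).
BUCKET_EDGES = [1, 10, 100, 1000]
BUCKET_LABELS = ["< 1", "1 - 10", "10 - 100", "100 - 1000", ">= 1000"]


def bucket_label(kbps: float) -> str:
    """Return the label of the bucket that contains the given value."""
    # bisect_right counts how many edges are <= kbps, which is the bucket index.
    return BUCKET_LABELS[bisect_right(BUCKET_EDGES, kbps)]


def distribution(values):
    """Count how many VMs fall into each bucket."""
    return Counter(bucket_label(v) for v in values)


# Made-up average network usage (KBps) for five VMs.
sample = [0.2, 0.8, 5, 50, 2000]
for label, count in distribution(sample).items():
    print(f"{label} KBps: {count} VM(s)")
```

Each bucket count corresponds to one slice of the pie.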
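The 70 to 100 scale can be pictured as a linear mapping from idle percentage to a position on the colour range. The Python sketch below assumes a simple linear scale with clamping; the `heat_position` function is hypothetical and not the widget's actual rendering logic.

```python
SCALE_MIN = 70.0   # lower end of the heatmap scale
SCALE_MAX = 100.0  # upper end of the heatmap scale (red)


def heat_position(idle_percent: float) -> float:
    """Map an idle percentage to 0.0 (start of scale) .. 1.0 (fully red).

    Assumption: values at or below 70% sit at the start of the scale,
    and the colour moves linearly towards red up to 100%.
    """
    fraction = (idle_percent - SCALE_MIN) / (SCALE_MAX - SCALE_MIN)
    return max(0.0, min(1.0, fraction))


for value in (50, 70, 85, 100):
    print(f"idle {value}% -> position {heat_position(value):.2f}")
```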
I have extracted the .json of this dashboard and uploaded it here, in case anyone wants to try it.
With this I end this blog. I hope it helps. Till then, Happy Reading 🙂