With growing virtualization, one major challenge many customers face is how to tell whether a performance issue is caused by infrastructure unavailability or by resource contention. This typically comes up when an application team complains about poor performance of their servers (VMs) and we start investigating. In such a situation the first stop for any administrator is vROps, which can give a holistic picture of the virtual environment as well as of the specific piece of infrastructure that is not performing as expected. But the real challenge is knowing which parameters to look at, and whether this information is available in a single dashboard. vROps collects all the metrics available for an environment, but not all of them are readily available on a single console.
So how good would it be if all the parameters required for troubleshooting VM performance issues could be put together into a single dashboard, so that in case of an event the information is available not only for that VM but also for the entire virtual infrastructure?
Recently one of my customers approached me with a similar problem, asking whether all virtual machine related information could be seen on a single vROps dashboard. Some of this information is available in the existing dashboards that come pre-configured with vROps, but it is scattered across multiple locations, whereas the customer wanted to see it on a single dashboard and in a format of their choice. I consolidated most of the parameters that need to be looked at while troubleshooting such issues into a single dashboard, and here is how it looks.
Note: VM names are purposely hidden for this demonstration. The refresh interval for the whole dashboard is 60 sec.
Top-5 CPU & Memory Metrics: The dashboard above lists the top 5 virtual machines with the highest CPU Demand (%) and CPU/Memory Contention (%). This gives you an initial hint as to which virtual machines are experiencing contention for CPU and memory. This information plays a very important role when troubleshooting performance issues.
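To make the idea of a top-5 list concrete, here is a minimal sketch in Python of picking the highest values for one metric. The VM names, metric keys and numbers are made up for illustration, and this is not how vROps implements the widget internally.

```python
import heapq

# Hypothetical sample data: VM name -> metric values in percent.
vm_metrics = {
    "vm-a": {"cpu_demand": 92.0, "cpu_contention": 6.5},
    "vm-b": {"cpu_demand": 41.0, "cpu_contention": 0.4},
    "vm-c": {"cpu_demand": 88.0, "cpu_contention": 12.1},
    "vm-d": {"cpu_demand": 63.0, "cpu_contention": 2.0},
    "vm-e": {"cpu_demand": 77.0, "cpu_contention": 9.3},
    "vm-f": {"cpu_demand": 15.0, "cpu_contention": 0.1},
}


def top_n(metrics, key, n=5):
    """Return the n (vm_name, value) pairs with the highest value for `key`."""
    pairs = ((name, values[key]) for name, values in metrics.items())
    return heapq.nlargest(n, pairs, key=lambda pair: pair[1])


if __name__ == "__main__":
    for name, value in top_n(vm_metrics, "cpu_contention"):
        print(f"{name}: {value:.1f}%")
```

Running the same function once per metric gives one list for CPU Demand and one for contention, which is roughly what the widget shows side by side.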
VM CPU Demand – Contention – Ready (%): This gives a list of all the virtual machines with the percentage values of their CPU Demand, Contention and Ready. Here we capture all the major CPU bottlenecks in a single widget.
VM Memory Demand – Contention – Ballooning (%): A similar widget that captures the percentage values of each virtual machine’s memory metrics (Demand, Contention and Ballooning). There may be cases where a virtual machine’s CPU metrics look good but contention shows up on the memory side, which can impact that virtual machine’s performance very badly. For more information on Demand vs. Contention, there is a very good explanation by Iwan Rahabok on his blog.
VM CPU Demand Heatmap: This heatmap shows CPU Contention (%) and CPU Demand (%) for all virtual machines. The heatmap color depends on the CPU Demand (%) metric, with a value range from 85 (green) to 100 (red).
VM Memory Consume Heatmap: This heatmap shows Memory Consumed (GB) and Memory Workload (%) for all virtual machines. The heatmap color depends on Memory Workload (%), with a value range from 80 (green) to 100 (red).
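To illustrate what a color range such as 85 (green) to 100 (red) means, here is a small Python sketch that maps a metric value to a color between the two ends of the range. The linear blend and the function name heat_color are my own assumptions for illustration; the exact gradient that vROps renders may differ.

```python
def heat_color(value, low, high):
    """Map `value` onto a green-to-red (R, G, B) tuple.

    Values at or below `low` are fully green, values at or above `high`
    are fully red, and values in between are blended linearly.
    The linear blend is an assumption, not the documented vROps behavior.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    # Position inside the range, clamped to [0, 1].
    t = (value - low) / (high - low)
    t = max(0.0, min(1.0, t))
    red = round(255 * t)
    green = round(255 * (1 - t))
    return (red, green, 0)


if __name__ == "__main__":
    # CPU Demand (%) heatmap range used on this dashboard: 85 to 100.
    for demand in (60, 85, 90, 95, 100):
        print(demand, heat_color(demand, low=85, high=100))
```

The same function covers the other heatmaps on this dashboard by passing their own ranges, for example low=80 and high=100 for Memory Workload (%).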
VM Network IO Stats: The network plays another major role in performance, so we also need to check whether there are packet drops for the affected virtual machine. This widget shows Packet Rx, Packet Tx and Packet Drop (%) for each VM. If Packet Drop (%) is greater than 20%, there could be a serious issue on the network side that needs to be addressed.
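As a rough illustration of the 20% rule, the following Python sketch computes a drop percentage from raw packet counts and flags values above the threshold. The formula (dropped packets divided by total packets) and the function names are assumptions for this example; vROps reports Packet Drop (%) directly, and this post does not describe how it derives it.

```python
DROP_THRESHOLD_PERCENT = 20.0  # threshold used in this dashboard


def packet_drop_percent(dropped, total):
    """Dropped packets as a percentage of total packets (assumed formula)."""
    if total == 0:
        return 0.0  # no traffic, so nothing was dropped
    return 100.0 * dropped / total


def needs_attention(drop_percent, threshold=DROP_THRESHOLD_PERCENT):
    """True when the drop percentage is strictly above the threshold."""
    return drop_percent > threshold


if __name__ == "__main__":
    pct = packet_drop_percent(dropped=250, total=1000)
    print(f"{pct:.1f}% dropped, investigate: {needs_attention(pct)}")
```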
VM Disk IO Stats: Similarly, this widget shows Disk IO values for each VM: Commands per second, Read Latency (ms) and Write Latency (ms). Ideally, for optimum performance, disk latency should stay between 5 and 15 ms, with only rare peak spikes of 100-200 ms.
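The latency guideline above can be turned into a quick check. This Python sketch buckets latency samples using the 15 ms and 200 ms figures from this post; treating "rare" as at most 5% of samples is purely my assumption, so adjust it to your environment.

```python
NORMAL_MAX_MS = 15.0   # upper end of the ideal range from this post
SPIKE_MAX_MS = 200.0   # upper end of the tolerated peak spikes


def summarize_latency(samples_ms, max_spike_fraction=0.05):
    """Bucket latency samples and decide whether the pattern looks healthy.

    `max_spike_fraction` (how many spikes still count as "rare") is an
    assumed value, not something vROps defines.
    """
    if not samples_ms:
        raise ValueError("at least one latency sample is required")
    normal = sum(1 for s in samples_ms if s <= NORMAL_MAX_MS)
    severe = sum(1 for s in samples_ms if s > SPIKE_MAX_MS)
    spike = len(samples_ms) - normal - severe
    healthy = severe == 0 and spike / len(samples_ms) <= max_spike_fraction
    return {"normal": normal, "spike": spike, "severe": severe, "healthy": healthy}


if __name__ == "__main__":
    read_latency = [8.0] * 39 + [150.0]  # one spike in 40 samples
    print(summarize_latency(read_latency))
```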
VM Network Packet Drop Heatmap: As the name suggests, this heatmap captures network packet drops for all virtual machines, with a value range from 20 (green) to 100 (red). This means that for any virtual machine, once packet drops exceed 20%, its heatmap color starts moving from green to red.
Datastore Latency (ms): This heatmap shows all the datastores associated with the vCenter Server and their health status. Ideally, datastore latency should fall between 4 ms and 40 ms, but this may vary with the environment, so the value can be edited as desired. To edit this widget, click the pencil icon (top right of the widget) and set the appropriate value.
I have tested this dashboard in my lab environment and the results look impressive. If anyone wants to try this dashboard, I have exported it as .json and uploaded it to this link.
For help with importing a dashboard into your vROps environment, please visit my earlier blog, which has step-by-step information on this.
With this I end this blog… I hope it helps… I’m currently working on more such dashboards and will be sharing them in my upcoming blogs… till then Happy Reading 🙂