Using vSphere PowerCLI Get-Annotation: Low Performance and possible Mitigations

While working on a script used for compliance checking on vSphere virtual machines I noticed that getting annotations (custom attributes) from vSphere has quite a few peculiarities. One major disappointment was the very slow speed of the PowerCLI Cmdlet Get-Annotation. In the following, I will provide details and show possible mitigations.

Just to be clear: the ultra-slow execution of Get-Annotation might not be a problem at all in your environment. If you just run a script once in a while to get annotations for a handful of virtual machines there is no need to further look into this subject. But the consequences of its low performance might become more of a nuisance if the number of VMs and assigned custom attributes are higher and you have to run scripts retrieving and evaluating them quite often. For example: In our vSphere V7 environment we are running 200+ VMs and use a handful of custom attributes to manage them. Our compliance checking script initially ran for more than six minutes to get and process information from vSphere, Active Directory, DNS, and DHCP. When I was tasked with extending this script, its very long running time was seriously slowing down my development cycle of coding and testing. The rather astonishing result of adding some profiling code was that it spent most of its time calling Get-Annotation and waiting for results. Therefore I began an investigation into the details.

Getting information about all VMs and all annotations for them

For starters, let's get some basic information on all virtual machines and measure how long this will take:

Measure-Command {$vms = get-vm}
<# sample output:
...
TotalMilliseconds : 56.4338
#>
$vms.Count
<# sample output:
231
#>

Now let's get all annotations for all virtual machines gathered with the previous run of Get-VM:

Measure-Command {$ann = ($vms | get-annotation)}
<# sample output:
...
TotalMilliseconds : 5184.1585
#>

Executing this is roughly ninety times slower than getting quite a bit of other information about all VMs (about 5.2 seconds versus 56 milliseconds). In absolute terms, five seconds still doesn't sound bad enough to worry about. But this kind of 'bulk get' is the fastest way I found to get annotations. Your own script code might do much worse, as mine initially did.

'Filter left' is a bad idea here

One common optimization strategy for PowerShell pipelines is 'filter left': If a cmdlet supports some kind of filtering it is usually faster to use this instead of applying Where-Object to its results further down in your pipeline. While Get-Annotation has no -Filter parameter it does have -CustomAttribute, which allows you to specify the custom attributes you are interested in. Let's see how this performs by fetching just two attributes for our VMs instead of all of them (to run this yourself, substitute attributes that exist in your environment):

Measure-Command {$ann = ($vms | get-annotation -CustomAttribute ("ManagedBy", "DeletionDue"))}
<# sample output:
...
TotalMilliseconds : 24509.2726
#>

That's quite impressive in a negative way: Asking vSphere to get you only two attributes instead of all takes almost five times longer. Let's see how this scales by asking Get-Annotation for more custom attributes, first three of them:

Measure-Command {$ann = ($vms | get-annotation -CustomAttribute ("ManagedBy", "DeletionDue", "Backup Status"))}
<# sample output:
...
TotalMilliseconds : 32414.4898
#>

Now repeat with asking for four attributes:

Measure-Command {$ann = ($vms | get-annotation -CustomAttribute ("ManagedBy", "DeletionDue", "Backup Status", "Archived"))}
<# sample output:
...
TotalMilliseconds : 39692.3221
#>

Execution time seems to increase in an almost linear fashion with the number of attributes specified with -CustomAttribute. While this can make you wonder how inefficiently custom attributes are implemented in vSphere the only pragmatic solution (for now) is to stay away from explicitly asking for specific attributes when using Get-Annotation and do some handling/filtering of them on your own.
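
The 'almost linear' impression can be checked with a bit of arithmetic on the three timings above. The following is only a small sketch that uses the measured values from this post; the variable names are made up for illustration.

```powershell
# Measured total run times (ms) from the three -CustomAttribute runs above
$runs = @(
    [pscustomobject]@{ Attributes = 2; Milliseconds = 24509.2726 }
    [pscustomobject]@{ Attributes = 3; Milliseconds = 32414.4898 }
    [pscustomobject]@{ Attributes = 4; Milliseconds = 39692.3221 }
)
$vmCount = 231   # number of VMs in the measured environment

# Extra time caused by each additional attribute
$increments = for ($i = 1; $i -lt $runs.Count; $i++) {
    $runs[$i].Milliseconds - $runs[$i - 1].Milliseconds
}

# Average extra time per attribute, then broken down per VM
$avgIncrement      = ($increments | Measure-Object -Average).Average
$perVmAndAttribute = $avgIncrement / $vmCount

"Extra time per additional attribute (ms): {0}" -f ($increments -join ", ")
"Average cost per VM and attribute (ms):   {0}" -f [math]::Round($perVmAndAttribute, 1)
```

With the numbers above, each additional attribute costs about 7.9 and 7.3 seconds respectively, or roughly 33 ms per VM and attribute. That would be consistent with some fixed per-VM, per-attribute overhead, but this is a guess on my part and not something these measurements prove.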

How long does Where-Object take in this case?

For performance reasons, we are now back to fetching all annotations for all virtual machines. But if we only need a few of them we might want to apply Where-Object after fetching them. Shown here is the code and results for filtering two to four attributes:

Measure-Command {$ann = ($vms | get-annotation | where-object {("ManagedBy", "DeletionDue") -contains $_.Name})}
<# sample output:
...
TotalMilliseconds : 5066.0718
#>

Measure-Command {$ann = ($vms | get-annotation | where-object {("ManagedBy", "DeletionDue", "Backup Status") -contains $_.Name})}
<# sample output:
...
TotalMilliseconds : 5137.3431
#>

Measure-Command {$ann = ($vms | get-annotation | where-object {("ManagedBy", "DeletionDue", "Backup Status", "Archived") -contains $_.Name})}
<# sample output:
...
TotalMilliseconds : 5087.164
#>

That's an interesting observation: When using Where-Object the execution time does not differ noticeably if more attributes are to be included in the output. This is in stark contrast to the behavior shown for Get-Annotation in conjunction with -CustomAttribute.
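
If you need this kind of client-side filtering in several places, it could be wrapped in a small pipeline function. The following is a minimal sketch, not a tested replacement for -CustomAttribute: the function name Select-AnnotationByName is hypothetical, and the sample objects merely stand in for what Get-Annotation returns so that the code can be tried without a vCenter connection.

```powershell
# Hypothetical helper: passes through only annotations whose Name is listed in -Name
function Select-AnnotationByName {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [string[]] $Name,

        [Parameter(Mandatory, ValueFromPipeline)]
        [object] $Annotation
    )
    begin {
        # Case-insensitive set, similar to how -contains compares strings
        $wanted = [System.Collections.Generic.HashSet[string]]::new($Name, [System.StringComparer]::OrdinalIgnoreCase)
    }
    process {
        if ($wanted.Contains([string]$Annotation.Name)) { $Annotation }
    }
}

# Intended use against a live vCenter (not run here):
# $ann = $vms | Get-Annotation | Select-AnnotationByName -Name "ManagedBy", "DeletionDue"

# Stand-in objects with made-up values
$sample = @(
    [pscustomobject]@{ Name = "ManagedBy";     Value = "TeamA" }
    [pscustomobject]@{ Name = "Backup Status"; Value = "OK" }
    [pscustomobject]@{ Name = "DeletionDue";   Value = "" }
)
$filtered = $sample | Select-AnnotationByName -Name "ManagedBy", "DeletionDue"
$filtered | Format-Table Name, Value
```

I would not expect this to be measurably faster than the plain Where-Object version, since the time is dominated by Get-Annotation itself. The point is only to keep the attribute list in one place.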

Getting rid of empty strings

Another peculiar thing about vSphere annotations is that defining a custom attribute for virtual machines results in all of them getting a value for it, even if you don't explicitly set one. If you list the contents of the collection $ann you might see many lines where the value of a custom attribute simply is an empty string. Thus your collection of annotations can contain quite a lot of entries you might not need. To get rid of these entries add a filter expression as follows:

Measure-Command { 
    $ann = ($vms | get-annotation | where-object {$_.Value -ne ""} )
}
<# sample output:
... 
TotalMilliseconds : 5141.478
#>

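Getting all annotations and filtering out the empty ones takes practically the same time as not filtering at all (about 5.14 seconds here versus 5.18 seconds for the first bulk run), so the filter itself costs next to nothing.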

In the next step, I wanted to simplify the result collection by reducing the first member of every entry to the name of the VM. Right now this is an 'AnnotatedEntity' object with several members - which I don't have a use for. Here I ran into another unpleasant surprise: For unknown reasons the following statement with its additional Select-Object pipeline step now ran almost four times slower.

Measure-Command { 
    $ann = ($vms | get-annotation | where-object {$_.Value -ne ""} |
        Select-Object @{Name="VMName";Expression={$_.AnnotatedEntity.Name}}, Name, Value)
}
<# output:
...
TotalMilliseconds : 19323.4264
#>

I didn't care to investigate this further and decided to omit this additional pipeline step for my purposes.
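
For anyone who does want to dig into this, one option would be to keep fetching and reshaping in separate steps and time the reshaping on its own. I have not measured this against a real vCenter, so treat the following as a sketch only: the objects below are stand-ins with made-up values for the filtered result of Get-Annotation, and whether a plain loop behaves better than Select-Object on real annotation objects is an open question.

```powershell
# Stand-in for: $ann = $vms | Get-Annotation | Where-Object { $_.Value -ne "" }
$ann = @(
    [pscustomobject]@{ AnnotatedEntity = [pscustomobject]@{ Name = "vm01" }; Name = "ManagedBy";   Value = "TeamA" }
    [pscustomobject]@{ AnnotatedEntity = [pscustomobject]@{ Name = "vm01" }; Name = "DeletionDue"; Value = "2025-01-31" }
    [pscustomobject]@{ AnnotatedEntity = [pscustomobject]@{ Name = "vm02" }; Name = "ManagedBy";   Value = "TeamB" }
)

# Reshape the already-fetched collection in a plain loop, timed separately from the fetch
$reshapeTime = Measure-Command {
    $flat = foreach ($a in $ann) {
        [pscustomobject]@{
            VMName = $a.AnnotatedEntity.Name   # keep only the VM name
            Name   = $a.Name
            Value  = $a.Value
        }
    }
}
"Reshaping took {0} ms" -f $reshapeTime.TotalMilliseconds
```

If the reshaping step alone turns out to be slow on real data, the cost probably sits in accessing AnnotatedEntity; if it is fast, the slowdown more likely comes from how the pipeline steps interact.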

Getting the annotations for a specific virtual machine

So far this post has shown that iterating over a lot of virtual machines and retrieving annotations by name for each of them can be very time-consuming. It also became clear that 'bulk getting' and filtering afterward is much more efficient. At this point we have a collection of annotations for all VMs that contains only those with a value other than an empty string. But you still might have to iterate over all VMs in your code and access the annotations for each VM inside this loop. How can this be done efficiently? The collection $ann is a simple list where every entry is an object with three members: the 'annotated entity' (virtual machine), the attribute name, and the attribute value. The easiest way to get at the attributes for a specific VM is to filter the collection with Where-Object for the name of the VM in question. Here this is done in a loop over all existing VMs (circa 230):

Measure-Command { 
    foreach ($vm in $vms) {
        $vm_annotations = ($ann | where-object {$_.AnnotatedEntity.Name -eq $vm.Name}) 
    }
}
<# output:
...
TotalMilliseconds : 964.0608
#>

This level of performance can be OK for a small number of iterations and a few custom attributes to be handled. But to satisfy my curiosity I checked whether this could be done faster. My approach was to copy the original 'flat' collection into a nested hash table: the keys of the outer table are the names of the virtual machines, and each entry in this outer table is itself a hash table of attribute names and values for that VM. This might sound complicated but can be done with a few lines of code. Also, this preparatory step doesn't need much time:

Measure-Command {
    $processedAnnotations = @{}
    foreach ($a in $ann) {
        $vmName = $a.AnnotatedEntity.Name
        if (-not $processedAnnotations.ContainsKey($vmName)) {
            $processedAnnotations[$vmName] = @{}
        }
        $processedAnnotations[$vmName].Add($a.Name, $a.Value)
    }
}
<# output:
...
TotalMilliseconds : 15.7992
#>

Now does using a nested hash table lead to better performance?

Measure-Command { 
    foreach ($vm in $vms) {
        $vm_annotations = $processedAnnotations[$vm.Name]
    }
}
<# output:
...
TotalMilliseconds : 8.6309
#>

Quite impressive: This is about a hundred times faster than the Where-Object way of getting the annotations for a single VM.
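
One detail to keep in mind when using the nested table: because empty values were filtered out earlier, a VM without any non-empty annotation has no entry in the outer table, and indexing it returns $null. A minimal sketch of a defensive lookup follows. The helper name Get-VmAttribute is hypothetical, and the table contents are stand-in data with made-up values.

```powershell
# Stand-in for the nested table built above: VM name -> (attribute name -> value)
$processedAnnotations = @{
    "vm01" = @{ "ManagedBy" = "TeamA"; "DeletionDue" = "2025-01-31" }
    "vm02" = @{ "ManagedBy" = "TeamB" }
}

# Hypothetical helper: returns the attribute value, or $Default if the VM or the attribute is missing
function Get-VmAttribute {
    param(
        [hashtable] $Table,
        [string] $VMName,
        [string] $Attribute,
        $Default = $null
    )
    $perVm = $Table[$VMName]
    if ($null -ne $perVm -and $perVm.ContainsKey($Attribute)) {
        return $perVm[$Attribute]
    }
    return $Default
}

$owner   = Get-VmAttribute -Table $processedAnnotations -VMName "vm02" -Attribute "ManagedBy"
$due     = Get-VmAttribute -Table $processedAnnotations -VMName "vm02" -Attribute "DeletionDue" -Default "(not set)"
$unknown = Get-VmAttribute -Table $processedAnnotations -VMName "vm99" -Attribute "ManagedBy" -Default "(not set)"
```

Here $owner holds the stored value, while $due and $unknown fall back to the default because the attribute or the whole VM entry is absent.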

Conclusion

This is a blog entry that I hope will become obsolete in the future. I guess that it would not be too difficult for VMware to fix the poor performance experienced when retrieving custom attributes with Get-Annotation. But then I don't think that fixing performance issues for rather specialized cases like this one would rank high on any development to-do list. So some hints about how to optimize the handling of vSphere annotations might remain helpful to some people for a while yet.