March 10th, 2023

Long running RunCommands

Joseph Calev
Principal Software Engineer, Azure Core Compute

Recently, there’s been a bit of confusion involving long-running RunCommands.

For reference, the default wait time for RunCommand has been 90 minutes – the same as all other extensions. But what if you have a script that takes longer?

Well, we’ve added a timeoutInSeconds parameter just for that purpose!

az vm run-command create --resource-group "myResourceGroup" --location "West US" --parameters arg1=param1 arg2=value1 --run-as-password "<runAsPassword>" --run-as-user "user1" --script "Write-Host Hello World!" --timeout-in-seconds 7200 --run-command-name "myRunCommand" --vm-name "myVM"

The only slight problem is that this alone won’t work. If your script runs for more than 90 minutes, you’ll see an error like this:

Provisioning of VM extension MY_RUNCOMMAND_NAME has timed out. Extension provisioning has taken too long to complete. The extension did not report a message. More information on troubleshooting is available at https://aka.ms/vmextensionwindowstroubleshoot

Huh? As long as it’s within the boundary of your timeout, shouldn’t it work? The problem lies with how RunCommand executes. There are two modes – synchronous and asynchronous.

In synchronous mode, the extension writes a status of ‘Transitioning’ and executes your script. If the script does not finish within the standard time (90 minutes) then it’s marked as timed out with the error above.

In asynchronous mode, the extension writes a status of ‘Succeeded’ and executes your script. The 90-minute timeout no longer applies because, as far as the service is concerned, your script has already executed. Now it’s free to run for as long as it likes.

To use asynchronous mode, simply set the --async-execution parameter to true.

az vm run-command create --resource-group "myResourceGroup" --location "West US" --async-execution true --parameters arg1=param1 arg2=value1 --run-as-password "<runAsPassword>" --run-as-user "user1" --script "Write-Host Hello World!" --timeout-in-seconds 7200 --run-command-name "myRunCommand" --vm-name "myVM"

Now your script is free to run past the 90-minute mark, up to the timeout you specified.
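As a rough rule of thumb, any timeout above the 90-minute limit (5,400 seconds) calls for asynchronous mode. Here is a minimal shell sketch of that check. The helper name async_execution_for and the SYNC_LIMIT_SECONDS variable are made up for illustration and are not part of the Azure CLI.

```shell
#!/usr/bin/env bash
# 90 minutes in seconds: the standard extension wait time described above.
SYNC_LIMIT_SECONDS=5400

# Hypothetical helper (not an Azure CLI feature): prints "true" when the
# requested timeout exceeds the 90-minute limit, otherwise "false".
async_execution_for() {
  if [ "$1" -gt "$SYNC_LIMIT_SECONDS" ]; then
    echo true
  else
    echo false
  fi
}

# Example use when building the command line:
#   timeout=7200
#   az vm run-command create ... \
#     --async-execution "$(async_execution_for "$timeout")" \
#     --timeout-in-seconds "$timeout"
```

With the 7200-second timeout from the example above, the helper prints true, so the command would be created in asynchronous mode.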

But how will you know if it timed out or succeeded? For timeouts, we depend on Windows to do the actual work. When we execute your script, we simply pass the timeout you gave us to the process. If the script runs past it, the extension’s wait handle will time out and we’ll mark the script as timed out, then report an exit code of -5, which is meant to mirror the Windows system error codes.

So, how can you know what happened? To do that, you’ll need to poll the instanceView of the virtual machine.

az vm run-command show --name "myRunCommand" --vm-name "myVM" --resource-group "myResourceGroup" --expand instanceView
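If you’d rather not run that by hand, a small polling loop can do it for you. The sketch below is one possible way to write it, not an official tool. The helper names are hypothetical, and it assumes the instance view exposes an executionState field with values such as Pending, Running, Succeeded, Failed, TimedOut and Canceled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the current execution state of a RunCommand.
# Assumption: the state is exposed at instanceView.executionState.
get_execution_state() {
  az vm run-command show \
    --name "$1" --vm-name "$2" --resource-group "$3" \
    --expand instanceView \
    --query "instanceView.executionState" --output tsv
}

# Hypothetical helper: poll until the RunCommand reaches a terminal state.
# Args: run-command name, VM name, resource group,
#       poll interval in seconds (default 60), max attempts (default 180).
# Returns 0 on Succeeded, 1 on Failed/TimedOut/Canceled,
# and 2 if the attempts run out while the script is still in progress.
wait_for_run_command() {
  local name="$1" vm="$2" rg="$3" interval="${4:-60}" attempts="${5:-180}"
  local state i=0
  while [ "$i" -lt "$attempts" ]; do
    state="$(get_execution_state "$name" "$vm" "$rg")"
    case "$state" in
      Succeeded) echo "$state"; return 0 ;;
      Failed|TimedOut|Canceled) echo "$state"; return 1 ;;
    esac
    sleep "$interval"
    i=$((i + 1))
  done
  echo "StillRunning"
  return 2
}

# Example use:
#   wait_for_run_command "myRunCommand" "myVM" "myResourceGroup" 60 180
```

The defaults poll once a minute for up to three hours, which comfortably covers the 7200-second timeout used earlier. Adjust them to match your own timeout.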

This may not sound optimal. Is there a better way? Perhaps. As I’ve stated previously, very long-running operations should use VM Applications instead of RunCommand. These don’t need to worry about a timeout by default. You’ll still need to poll the machine’s instanceView to learn the result, although for shorter installs you can pass the treatFailureAsDeploymentFailure option to fail the deployment instead.

 

Author

Joseph Calev
Principal Software Engineer, Azure Core Compute

Software Engineer Lead at Microsoft. Focusing on enabling customer scenarios for VM extensions and applications.
