Friday, February 28, 2014

SCVMM Service deployment Error (22758) The system cannot find the file specified – but it is right there

I spend days on this issue and did not want to lose the ‘why’ this cost me days.

First, the details of the error:

Error (22753)

The script command with properties: Type (PreInstall), Deployment Order (20) and Parent Type (ApplicationProfile), failed to complete successfully. Refer to the errors list for more information.

Error (22758)

The guest agent encountered an exception while running a script command. Win32ErrorCode: 2147942402 DetailedErrorMessage: The system cannot find the file specified

I can definitively tell you that this error has nothing and something to do with the script being run.  And, this is the SCVMM Agent itself tossing this error, not my script.

First; the most useful log in SCVMM Service Templates is the SCVMM Guest Agent diagnostic log.  You find it buried in the OS of the VM under %ProgramFiles%\<System Center Version>\<VMM Guest Agent Version>\bin\diagnostics\VmmGuestLog.svclog

If you try to open this up with Notepad or WordPad or NotePad++ it just looks like a bunch of poorly formatted XML.  Very difficult to troll through.

However, if you use Microsoft Service Trace Viewer, you have a totally different experience.  Trace Viewer is part of Visual Studio, which you can get an Express version for free, or download it here.

Looking through this I go looking for the events that the SCVMM Job log tossed at me.

I find:

4: GetExpandedGCEErrorInfo reported The script command with properties: Type (PreInstall), Deployment Order (20) and Parent Type (ApplicationProfile), failed to complete successfully. Refer to the errors list for more information.

But not the Win32ErrorCode.

Right after the above is the following:

4: Encountered an exception for ActionItem Type: Gce, Exception: Microsoft.VirtualManager.GuestAgent.Common.GuestAgentException: Error in the application.
   at Microsoft.VirtualManager.GuestAgent.Gce.GceProcessor.GetStdErrorMatch()
   at Microsoft.VirtualManager.GuestAgent.Gce.GceProcessor.PerformPostProcessing(List`1& warnings)
   at Microsoft.VirtualManager.GuestAgent.Gce.GceProcessor.Run(IQueryCancel queryCancel, List`1& warnings)
   at Microsoft.VirtualManager.GuestAgent.AIFactory.GceActionProcessor.ExecuteActionItemSync(Boolean restarting, ActionItem actionItem, IQueryCancel queryCancel, List`1& warnings)
   at Microsoft.VirtualManager.GuestAgent.QueueManager.QueueManager.QueueProcessorThread(Object context)

This definitively tells me that it is not my script, but instead the SCVMM Agent is the one not finding the file. The best that I can determine is that that GetStdErrorMatch() being empty is the root.  But what does a positive return look like?

And, to make it worse, the script ran.  Everything worked!  And if I checked the VM all of the SCVMM artifacts from my Custom Resource were there ( %ProgramData%VirtualMachineManagerData\<one of the temp paths>).  Nothing was missing!

I spent the next few days changing the script, settings for the execution, accounts, paths, different ways to execute it.  It would work sometimes but mostly not.  This was the most frustrating part.  Every now and then I would not get the error; that small glimmer of hope and gets you frustrated and ends up driving you insane.

On the drive home last night the pieces slid together.  This was my log cleanup script.  It was deleting all of the log files that were empty (0kb).  I produce a lot of logs,  this is remote and headless execution, it could be happening anywhere but I have no console to observe anything.  You have to capture every little thing that happens and you don’t want to sort through 26 files when only the 5 with any details actually mean anything.

Not being the one that wrote the code what I am guessing is that; It appears that the output file is created, then fetched, then written back to. And my script deleted it in the middle somewhere.  Simply timing alone, milliseconds, would let it pass.  A moment of a slow disk drive on a hypervisor, possibly.

So, I removed the capturing of the Standard Out and Standard Error and all is good.  Seems like a silly little thing, right?  But again, this is remote installation, if you don’t log it you have no idea what happened.  So turning off logging was actually the very last thing I had any desire to do.

No comments: