Machine management for Power Automate Desktop

How to manage RPA machines at scale in Power Automate — machine groups, sizing, health monitoring, and the operational discipline that production bot fleets need.

Updated 2026-10-04

A production RPA deployment with dozens of bots running across multiple machines is real infrastructure — Windows VMs, network, agent software, credentials, monitoring. Machine management in Power Automate handles the fleet lifecycle: registration, grouping, scaling, health, retirement. Getting it right makes RPA operations sustainable.

The machine model.

Machine — a Windows host registered with Power Automate.
PAD agent — software running on the machine; connects to cloud.
Bot account — Windows user account that bots run as.
Machine group — pool of machines for load balancing.

Each machine connects to the Power Automate service through the agent, and the cloud service distributes work to whichever machines are available.

Registration.

Install PAD on the Windows machine.
Sign in with the Power Platform account.
Register the machine against the target environment and confirm it in the Power Automate portal.
The machine appears in the Machines list, ready to receive work.

Machine groups. A group pools machines so that work can be spread across them:

All machines in a group can run the same bots.
Cloud service assigns each run to an available machine in the group; runs wait in the queue when every machine is busy.
Group can be sized up by adding machines.
Failure of one machine doesn't stop the group.

For production reliability, single-machine setups are insufficient; groups provide redundancy.
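The redundancy argument can be made concrete with a small calculation. This is a rough sketch that assumes machines fail independently; the function name and the 99% figure are illustrative and do not come from any Power Automate metric.

```python
def group_availability(machine_availability, machines):
    """Probability that at least one machine in the group is up.

    Assumes independent failures; a shared host, network, or
    bad update can take several machines down together.
    """
    return 1 - (1 - machine_availability) ** machines


# Compare a single machine with small groups at an assumed 99% per-machine availability.
for n in (1, 2, 3):
    print(n, f"{group_availability(0.99, n):.6f}")
```

This only says that some machine is reachable. Whether the surviving machines can absorb the full workload is a separate capacity question.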

Sizing considerations.

CPU — bots are CPU-light typically; 2-4 cores usually enough.
Memory — bots that drive browser-heavy apps need more (8GB+).
Storage — small; bots are stateless mostly.
Network — reliable; latency-sensitive.

Spec depends on the bot workload; benchmark before scaling.

Geographic distribution.

For multi-region operations, place machines in each region.
This reduces latency to local applications.
It also helps meet data residency requirements.

Machine account vs interactive account.

Machine account — bot runs in a session without active user.
Interactive account — bot needs user logged in.

These correspond roughly to unattended and attended runs in Power Automate. Interactive is easier to set up but ties up a logged-in session; a machine account scales better.

Auto-update.

PAD agent auto-updates by default.
Updates can break bots if behaviour changes.
Mitigate: test new agent version in non-prod first.

For sensitive environments, disable auto-update and roll out new agent versions manually.

Health monitoring.

Machine status — Online / Offline.
Run history — successes, failures.
Performance metrics — average bot duration.
Resource usage — CPU / memory utilisation.

For production fleets, use dashboards to visualise health and raise an alert when a machine stays Offline for more than N minutes.
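A minimal sketch of the offline alert rule follows. The `MachineStatus` record, the machine names, and the 15-minute default are hypothetical; in practice the status data would come from whatever monitoring export the fleet already has.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class MachineStatus:
    name: str
    online: bool
    last_seen: datetime  # last time the machine reported in (UTC)


def machines_to_alert(statuses, now, threshold=timedelta(minutes=15)):
    """Return names of machines that have been offline longer than threshold."""
    return [
        s.name
        for s in statuses
        if not s.online and now - s.last_seen > threshold
    ]


now = datetime(2026, 10, 4, 9, 0, tzinfo=timezone.utc)
fleet = [
    MachineStatus("bot-vm-01", True, now),
    MachineStatus("bot-vm-02", False, now - timedelta(minutes=5)),
    MachineStatus("bot-vm-03", False, now - timedelta(minutes=40)),
]
print(machines_to_alert(fleet, now))
```

Using a threshold rather than alerting on every Offline status avoids paging someone for short network drops or planned reboots.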

Common machine issues.

Agent disconnected — restart agent or machine.
High memory — bot leak; restart cleans up.
Windows update reboot — schedule reboots outside bot work hours.
Network drops — temporary; reconnect handled.

Production fleets run into these regularly; monitoring and automated remediation reduce the manual effort.

Capacity planning.

Peak vs average — bot work is often peaky (month-end, day-start).
Sizing for peak — adequate machines for peak load.
Off-peak idle — machines sit unused; consider hibernation if cost-sensitive.
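Sizing for peak can be estimated with simple arithmetic. The sketch below assumes one run at a time per machine and a target utilisation of 70% to leave headroom for retries and slow runs; both are assumptions, and the run counts are made-up examples.

```python
import math


def machines_needed(runs_in_window, avg_run_minutes, window_minutes,
                    target_utilisation=0.7):
    """Machines required to finish all runs inside the time window.

    Assumes each machine executes one run at a time.
    """
    busy_minutes = runs_in_window * avg_run_minutes
    usable_minutes_per_machine = window_minutes * target_utilisation
    return math.ceil(busy_minutes / usable_minutes_per_machine)


# An ordinary day: 60 runs of about 12 minutes across an 8-hour window.
average_day = machines_needed(runs_in_window=60, avg_run_minutes=12, window_minutes=480)

# Month-end: 200 runs of the same length squeezed into 4 hours.
month_end = machines_needed(runs_in_window=200, avg_run_minutes=12, window_minutes=240)

print(average_day, month_end)
```

The gap between the two numbers is the idle capacity that sits unused off-peak, which is what hibernation or scheduled shutdown is meant to recover.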

Cost considerations.

VM cost — per-hour Windows VM cost.
Bot licence — per concurrent unattended bot.
OS licensing — Windows licence costs.

For 10+ bot machines, costs are significant; right-size carefully.
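A back-of-the-envelope cost model helps with right-sizing. Every price below is a placeholder for illustration, not a real list price, and the model assumes one bot licence per machine and VM billing only for the hours a machine is running.

```python
def monthly_fleet_cost(machines, vm_hourly_rate, vm_hours, bot_licence,
                       os_licence=0.0):
    """Estimate monthly fleet cost from caller-supplied prices."""
    vm_cost = machines * vm_hourly_rate * vm_hours
    licence_cost = machines * (bot_licence + os_licence)
    return vm_cost + licence_cost


# Placeholder prices: 0.20 per VM hour, 150 per bot licence per month.
always_on = monthly_fleet_cost(machines=10, vm_hourly_rate=0.20,
                               vm_hours=730, bot_licence=150.0)
business_hours = monthly_fleet_cost(machines=10, vm_hourly_rate=0.20,
                                    vm_hours=220, bot_licence=150.0)

print(f"always on: {always_on:.2f}, business hours only: {business_hours:.2f}")
```

With these placeholder numbers the licence term dominates once VM hours are trimmed, so reducing machine count usually saves more than shortening uptime.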

Bot deployment to machines.

Bot (desktop flow) configured to target a machine group.
Cloud service dispatches to available machine.
Machine downloads bot logic; executes.
Results streamed back to cloud.

The execution model is similar to a job queue.
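The job-queue analogy can be illustrated with a toy dispatcher. This is a simplified model for reasoning about queueing, not how the Power Automate service is implemented; the `MachineGroup` class and its method names are hypothetical.

```python
from collections import deque


class MachineGroup:
    """Toy model of a machine group fed by a first-in, first-out run queue."""

    def __init__(self, machine_names):
        self.idle = deque(machine_names)  # machines ready for work
        self.busy = {}                    # machine name -> run id
        self.queue = deque()              # runs waiting for a machine

    def submit(self, run_id):
        """Queue a run and start it immediately if a machine is free."""
        self.queue.append(run_id)
        self._dispatch()

    def complete(self, machine):
        """Mark a machine's run as finished and hand it the next queued run."""
        run_id = self.busy.pop(machine)
        self.idle.append(machine)
        self._dispatch()
        return run_id

    def _dispatch(self):
        # Pair waiting runs with idle machines until one side runs out.
        while self.queue and self.idle:
            self.busy[self.idle.popleft()] = self.queue.popleft()


group = MachineGroup(["vm-a", "vm-b"])
for run in ["r1", "r2", "r3"]:
    group.submit(run)
print(group.busy, list(group.queue))
```

With two machines and three runs, the third run waits until a machine frees up. A queue that keeps growing under this model is the signal that the group needs more machines.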

Bot account passwords.

Stored in Azure Key Vault (modern pattern).
Or in Windows credential manager (legacy).
Rotated per policy.

A compromised bot account potentially compromises every system the bot can access, so rotation matters.
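A rotation policy is easier to enforce when overdue accounts are flagged automatically. The sketch below assumes a 90-day policy and a simple mapping of account name to last rotation date; the account names are made up, and how the dates are obtained (for example from the secret store's metadata) is left out.

```python
from datetime import date, timedelta


def overdue_rotations(last_rotated, today, max_age=timedelta(days=90)):
    """Return bot accounts whose password is older than max_age, sorted by name."""
    return sorted(
        account
        for account, rotated_on in last_rotated.items()
        if today - rotated_on > max_age
    )


last_rotated = {
    "svc-bot-01": date(2026, 9, 1),
    "svc-bot-02": date(2026, 5, 1),
}
print(overdue_rotations(last_rotated, today=date(2026, 10, 4)))
```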

Machine retirement.

Decommission unused machines.
Migrate bots to remaining machines.
Deregister; revoke credentials.
Audit confirms machine no longer accessible.

Disaster recovery.

Backups — bot logic in source control / solutions; machines themselves stateless.
Failover — second region machines if needed.
Recovery time — provision new machines from image; install agent; register.

Multi-tenancy concerns.

Bots from different customers (in ISV scenarios) — typically on dedicated machines.
Shared machines pose data risk.

Common pitfalls.

Single-machine fleet. The first failure stops all bots.
No monitoring. Bots fail and nobody notices for days.
Bot accounts shared with humans. Auditing becomes impossible.
Uncontrolled updates. A PAD update breaks bots and forces an emergency rollback.
Underestimated capacity. The bot queue grows and SLAs are breached.
Mixed dev and prod machines. Test bots affect production data.

Operational rhythm.

Hourly — health dashboard glance.
Daily — failure investigation.
Weekly — capacity vs trend.
Monthly — fleet audit; retirements / new provisions.
Quarterly — disaster recovery drill.

Strategic positioning. RPA machine management is infrastructure work. For organisations with significant bot deployments, dedicated FTEs manage the fleet. For smaller scale, IT can absorb the responsibility. Either way, treating bots as software services rather than spreadsheets is essential — monitoring, alerting, capacity planning, lifecycle management. The infrastructure investment pays back in reliability; the alternative is a tangle of breaking bots and frustrated users.
