Tools Don’t Test. Testers Do.
A Tester-First POC That Turned a Painful Workflow into a System-Level Test
This isn’t about Python, Playwright, or becoming a developer.
It’s about exploring real risks testers face every day — and refusing to ignore them just because they’re inconvenient to automate.
“I’m a tester. I don’t write production code for a living.
But I built a small automation flow that exercised real user behavior end-to-end — because the risk was real, repetitive, and nobody else was covering it.”
This story isn’t about learning a new tool stack or proving technical range.
It’s about what happens when a tester sees recurring risk and decides to explore it properly instead of accepting friction as “just how things are.”
Where the Friction Lived
This didn’t start as a roadmap or an assigned initiative.
It started with repetition.
We had a set of scenarios we had to test again and again:
Provision a Windows machine
Install our agent
Verify services on the machine
Confirm the agent showed up correctly on the dashboard
Repeat this across builds, versions, and environments
These steps weren’t edge cases.
They were preconditions for most of our UI end-to-end testing.
And yet, they were handled manually or semi-manually every time.
The workflow itself was the problem.
Not UI testing.
Not API testing.
A system-level user flow that moved back and forth between machines, services, APIs, and UI.
In reality, the testing looked like this:
Slow to execute.
Easy to get wrong.
Hard to repeat consistently.
That’s when the real testing question surfaced:
What risks does this workflow expose that UI tests alone will never catch?
This Was a POC. And a Risk.
This was not a “let’s build a framework” project.
It was a proof of concept: a checkpoint, not a commitment.
I didn’t know if:
The flow could be automated cleanly
The tooling would hold together
I could stitch system behavior and UI validation without faking things
I’m not an SDET.
I don’t spend my days designing abstractions.
But I do understand how users install and experience this product.
So I approached it the only way I know how: as a tester.
Break the problem down.
Explore one boundary at a time.
Keep asking: what would make this look healthy when it actually isn’t?
From Script to System-Level Test
I started small. No grand design.
One capability at a time:
Provision a Windows VM
Automated setup using Python to remove human inconsistency.

Remote access
Used SSH (via paramiko) to interact with the machine exactly as an admin or user would.

Installer transfer
Copied the real agent binary. No mocks. No shortcuts.

Real installation
Executed PowerShell commands following the same steps users follow.

Service validation
Parsed sc and PowerShell output to confirm:
service existence
running state
restart behavior after failure
This mattered because “installer finished” does not mean “agent is healthy.”

Dashboard verification
Used Playwright (Python bindings) to verify:
agent registration
status propagation
API responses aligned with what the UI claimed
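The remote service check can be sketched roughly like this. The host, credentials, and service name are hypothetical placeholders, and paramiko is assumed to be installed; only the sc-output parsing is product-independent:

```python
import re


def parse_sc_state(sc_output: str) -> str:
    """Extract the service state (e.g. RUNNING, STOPPED) from `sc query` output."""
    match = re.search(r"STATE\s*:\s*\d+\s+([A-Z_]+)", sc_output)
    if not match:
        raise ValueError("no STATE line found in sc output")
    return match.group(1)


def remote_service_state(host: str, user: str, password: str, service: str) -> str:
    """SSH into the Windows machine and query the service, the way an admin would."""
    import paramiko  # third-party; imported here so the parser works without it

    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username=user, password=password)
    try:
        _, stdout, _ = client.exec_command(f"sc query {service}")
        return parse_sc_state(stdout.read().decode())
    finally:
        client.close()
```

Keeping the parsing separate from the SSH plumbing means the "what does healthy look like" logic stays testable without a live VM.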
At this point, something important changed.
This was no longer “a script.”
It became a repeatable, system-level test flow that exercised the product the way users actually experience it.
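The dashboard step can be sketched the same way. The selector, endpoint, and status vocabulary below are invented for illustration; the real product's will differ:

```python
def normalize_status(raw: str) -> str:
    """Map UI and API status strings onto one vocabulary before comparing."""
    return raw.strip().lower().replace("-", "_")


def ui_and_api_agree(ui_status: str, api_status: str) -> bool:
    """The core assertion: the dashboard and the API must tell the same story."""
    return normalize_status(ui_status) == normalize_status(api_status)


def check_agent_on_dashboard(base_url: str, agent_id: str) -> bool:
    from playwright.sync_api import sync_playwright  # third-party

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(f"{base_url}/agents")
        # Hypothetical selector: however your dashboard marks an agent row.
        ui_status = page.locator(f"[data-agent-id='{agent_id}'] .status").inner_text()
        api_status = page.request.get(f"{base_url}/api/agents/{agent_id}").json()["status"]
        browser.close()
    return ui_and_api_agree(ui_status, api_status)
```

The comparison deliberately lives outside the browser code, so "do UI and API agree" can be exercised on its own.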
Why Python, Not TypeScript?
We already use Playwright with TypeScript for browser E2E tests.
So yes, this was a deliberate deviation.
The reason was simple:
This problem wasn’t browser-first.
Python gave me:
Straightforward SSH and file transfer
Cleaner service inspection and error handling
One place to orchestrate machine setup, system validation, and UI checks
Could this have been done in TypeScript? Probably.
Would it have required more glue, dependencies, and orchestration? Definitely.
For a POC, clarity beats purity.
Where This Became Testing (Not Automation)
This is the line most automation misses.
I didn’t:
Stub services
Skip installation steps
Treat “service started” as proof of correctness
Instead, I looked for failure signals:
Installer succeeds but the service never stabilizes
Service runs but stops reporting after a restart
Agent appears in the UI but silently stops sending data
API and UI disagree on agent health
One concrete example:
In one run, the installer completed successfully and the service showed as running, but after a machine restart the agent stopped reporting.
UI-only tests passed.
This flow caught it immediately by correlating service state, restart behavior, and dashboard health.
That issue would have broken downstream E2E tests and likely reached production undetected.
Automation didn’t hide the problem.
It made it visible early and repeatedly.
That’s testing.
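That kind of cross-layer correlation can be expressed as a small, testable check. The threshold and status names here are assumptions, not the product's real values:

```python
from datetime import datetime, timedelta


def health_signals(service_state: str, last_report_at: datetime,
                   ui_status: str, now: datetime,
                   max_silence: timedelta = timedelta(minutes=5)) -> list[str]:
    """Correlate service, reporting, and UI state; return every mismatch found."""
    signals = []
    if service_state == "RUNNING" and now - last_report_at > max_silence:
        signals.append("service running but agent stopped reporting")
    if service_state != "RUNNING" and ui_status == "healthy":
        signals.append("UI shows healthy while service is not running")
    if (service_state == "RUNNING" and ui_status != "healthy"
            and now - last_report_at <= max_silence):
        signals.append("agent reporting but UI disagrees")
    return signals
```

An empty list is the only green result; anything else is a "looks healthy but isn't" state worth investigating.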
What Didn’t Work (And Matters)
Not everything was clean.
Debugging remote failures was slower than local tests
Logs mattered more than assertions
Failures didn’t occur neatly at one layer
Next time:
Structured logging first
Explicit health criteria upfront
Time-boxed exploration before stabilization
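"Structured logging first" can start as small as a JSON formatter on the standard library logger. The field names here are just one possible shape:

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line so remote failures can be filtered by step."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "step": getattr(record, "step", None),  # e.g. "install", "service_check"
            "msg": record.getMessage(),
        })


logger = logging.getLogger("poc")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Tag every message with the flow step it belongs to:
logger.info("agent service entered RUNNING", extra={"step": "service_check"})
```

When a run fails across machine, service, and UI layers, being able to slice logs by step matters more than any single assertion message.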
POCs are supposed to expose limits, not just successes.
What Changed Because of This POC
This wasn’t theoretical value.
Manual setup time dropped from ~20 minutes to ~5 minutes per run
The flow caught multiple “looks healthy but isn’t” service states
Roughly 30–40% of precondition failures that previously surfaced as flaky UI tests were caught before UI execution
More importantly, the team stopped treating this workflow as “too messy to automate” — and started treating it as a risk worth testing properly.
If You Want to Try This Yourself
Start small:
Identify a workflow users repeat and testers dread
Map the system boundaries involved (machine, service, API, UI)
Automate one boundary honestly
Observe what fails before optimizing anything
Only then decide whether it deserves a framework
Avoid:
Automating around the system
Proving tool skill instead of risk coverage
Treating green execution as confidence
Always ask: what risks does this flow expose that my current tests miss?
Final Words
This POC wasn’t about Python, Playwright, or Copilot.
It was about refusing to accept that
“we can’t automate this”
really meant
“we’ve stopped questioning it.”
So the next time you hear that phrase, pause.
Is it a tooling limitation?
Or hesitation disguised as practicality?
Because at the end of the day:
Tools don’t test. Testers do.
If you found this helpful, stay connected with Life of QA for more real-world testing experiences, tips, and lessons from the journey!