The Sardaukar of Vendor Support: How to Survive an Escalation
There is a point in every support ticket where you realize you are no longer talking to a help desk.
You are negotiating with the Padishah Emperor.
The portal says your case is “under investigation.” The engineer says the issue cannot be reproduced. Your account manager says they are “monitoring the situation.” Meanwhile, production is coughing up blood in the desert and your users are asking why everything is slow.
I have been through this ritual enough times to recognize the signs.
You have entered the Sardaukar phase of vendor support.
The first assault
This particular battle started with a firewall cluster that had developed a charming new habit: dropping sessions without logging why.
Not all sessions. That would have been useful.
Just enough of them to break authentication, irritate users, and make the problem look intermittent. The worst kind of failure. The kind that makes you question whether the system is broken or whether you are simply getting old.
I opened a priority case and uploaded everything:
Packet captures
Configuration exports
Software version
Timestamps
Traffic samples
A diagram
Logs from both sides
A written explanation that could have been used to teach a community-college course
The first response arrived more than ninety minutes into the outage.
“Please reboot the appliance.”
Fantastic. The Emperor has spoken.
Know which planet you are fighting on
Vendor support is divided into tiers, plans, severities, entitlements, response targets, and internal rules that customers usually discover during the outage.
Some support plans provide technical case access. Others provide faster responses, named technical contacts, or dedicated incident services. AWS, for example, publishes different target response times and capabilities according to support level and severity. Paying for a support contract does not automatically mean every case receives immediate senior engineering attention.
This is why you need to know the contract before the worms arrive.
Before a serious incident, document:
Your support plan
Covered systems
Severity definitions
Response targets
Escalation contacts
Account manager
Renewal date
Any requirement for logs or diagnostic bundles
Whether after-hours support is included
Do not wait until 2 a.m. to learn that your “premium” agreement means someone will email you tomorrow.
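One way to keep that checklist honest is to store it as a record that can report its own blanks. The following is a minimal sketch, not a standard schema: the class name, the field names, and every example value are my own assumptions, chosen to mirror the list above.

```python
from dataclasses import dataclass, field, fields
from datetime import date


@dataclass
class SupportContract:
    """What the vendor agreement covers, written down before the outage."""
    vendor: str
    plan: str
    covered_systems: list[str]
    severity_definitions: dict[str, str]      # severity label -> vendor's wording
    response_targets_minutes: dict[str, int]  # severity label -> target, in minutes
    escalation_contacts: list[str]
    account_manager: str
    renewal_date: date
    required_diagnostics: list[str] = field(default_factory=list)
    after_hours_included: bool = False

    def gaps(self) -> list[str]:
        """Names of text, list, or mapping fields that are still empty."""
        return [
            f.name
            for f in fields(self)
            if isinstance(getattr(self, f.name), (str, list, dict))
            and not getattr(self, f.name)
        ]


# Hypothetical values for illustration only.
contract = SupportContract(
    vendor="ExampleVendor",
    plan="premium",
    covered_systems=["firewall-cluster"],
    severity_definitions={},
    response_targets_minutes={"sev1": 60},
    escalation_contacts=[],
    account_manager="",
    renewal_date=date(2030, 1, 1),
)
print(contract.gaps())
```

Anything `gaps()` returns is a question you would otherwise be asking the vendor in the middle of the night.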
Do not attack the first soldier you meet
The Tier 1 technician did not design the product. They did not write your contract. They are not personally preventing the senior engineer from joining your call.
They are following the script assigned to them.
I know this because I have made the mistake of getting angry at the first person who answered. It feels good for about thirty seconds. Then you realize you have insulted the only human currently touching your ticket.
The better move is to make escalation easy.
Give them a compact evidence package:
Business impact
Start time
Systems affected
What changed
What has been ruled out
Steps already taken
Exact request
My request eventually became:
Production authentication traffic is being interrupted across two sites. We have reproduced the fault across both cluster members and ruled out the upstream circuit. Please escalate this to the team responsible for session handling in the current firmware branch.
That is much harder to answer with “please reboot.”
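If you want the evidence package to be the same shape every time, a small template helps. This is a sketch under my own assumptions: the field names and the `build_escalation_note` helper are hypothetical, and the example values are paraphrased from this incident, with a placeholder where the story above does not supply a detail.

```python
REQUIRED_FIELDS = (
    "business_impact",
    "start_time",
    "systems_affected",
    "what_changed",
    "ruled_out",
    "steps_taken",
    "exact_request",
)


def build_escalation_note(evidence: dict[str, str]) -> str:
    """Render the evidence package as plain text, or fail if a field is empty."""
    missing = [name for name in REQUIRED_FIELDS
               if not evidence.get(name, "").strip()]
    if missing:
        raise ValueError("evidence package incomplete: " + ", ".join(missing))
    lines = []
    for name in REQUIRED_FIELDS:
        label = name.replace("_", " ").capitalize()
        lines.append(f"{label}: {evidence[name].strip()}")
    return "\n".join(lines)


note = build_escalation_note({
    "business_impact": "Production authentication traffic interrupted across two sites",
    "start_time": "09:12",
    "systems_affected": "Firewall cluster, both members",
    "what_changed": "(placeholder: list recent changes, or state that none are known)",
    "ruled_out": "Upstream circuit",
    "steps_taken": "Fault reproduced on both cluster members",
    "exact_request": "Escalate to the team responsible for session handling "
                     "in the current firmware branch",
})
print(note)
```

The helper raises instead of sending a half-empty note, because a missing field is usually where the next "please reboot" comes from.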
Severity is a weapon, but use it honestly
Every engineer has been tempted to mark a case as the highest possible severity because the vendor is moving too slowly.
Do this carelessly and support learns not to trust you.
A real severity framework should consider user impact, scope, duration, available workarounds, and business criticality. Atlassian’s incident guidance similarly recommends defining escalation according to severity, duration, scope, and the skills required to resolve the problem.
If twenty users are mildly inconvenienced, it is not the fall of House Atreides.
If the entire company cannot authenticate and no workaround exists, draw the crysknife.
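To make that judgment repeatable, the factors can be written down as a small function. This is only a sketch: the `Impact` fields, the `sev1` to `sev4` labels, and the thresholds are assumptions I made up for illustration, not any vendor's definitions. Substitute the wording from your own contract.

```python
from dataclasses import dataclass


@dataclass
class Impact:
    users_affected_fraction: float  # scope: 0.0 (nobody) to 1.0 (everyone)
    blocked: bool                   # user impact: True if users cannot work at all
    minutes_elapsed: int            # duration so far
    workaround_available: bool
    business_critical: bool


def classify(impact: Impact) -> str:
    """Map an impact record to a severity label using made-up thresholds."""
    if impact.blocked and impact.business_critical and not impact.workaround_available:
        return "sev1" if impact.users_affected_fraction >= 0.5 else "sev2"
    if impact.blocked or impact.minutes_elapsed >= 60:
        return "sev3"
    return "sev4"


# Hypothetical cases: the 0.01 assumes twenty users in a company of two thousand.
company_cannot_authenticate = Impact(1.0, True, 15, False, True)
twenty_mildly_annoyed_users = Impact(0.01, False, 15, True, False)

print(classify(company_cannot_authenticate))  # sev1
print(classify(twenty_mildly_annoyed_users))  # sev4
```

The point is not the exact numbers. It is that the severity you claim can be traced back to facts the vendor can check.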
Build your own war room
A vendor ticket is not an incident-management process.
Your team still needs someone coordinating technical work, someone recording decisions, and someone updating the business. Google’s incident-management guidance separates the work of fixing the system from the work of coordinating responders and communication. Both are necessary.
For this incident, we kept our own timeline:
09:12: First confirmed authentication failure
09:23: Upstream carrier cleared
09:41: Issue reproduced on secondary cluster member
10:05: Packet capture uploaded
10:48: Vendor requested reboot
11:02: Escalation requested
12:17: Senior engineer joined
12:36: Known firmware defect identified
Once the senior engineer arrived, the problem was understood in nineteen minutes.
We had spent about three hours trying to reach the person who already knew the answer.
Classic empire stuff.
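The arithmetic is easy to check if the timeline lives somewhere a script can read it. The sketch below copies the entries from the timeline above; the `minutes_between` helper is my own and assumes every timestamp falls on the same day.

```python
from datetime import datetime

TIMELINE = [
    ("09:12", "First confirmed authentication failure"),
    ("09:23", "Upstream carrier cleared"),
    ("09:41", "Issue reproduced on secondary cluster member"),
    ("10:05", "Packet capture uploaded"),
    ("10:48", "Vendor requested reboot"),
    ("11:02", "Escalation requested"),
    ("12:17", "Senior engineer joined"),
    ("12:36", "Known firmware defect identified"),
]


def minutes_between(start: str, end: str) -> int:
    """Whole minutes from start to end, both HH:MM on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


first = TIMELINE[0][0]
previous = first
for stamp, event in TIMELINE:
    # Elapsed time since the first event, then the gap since the previous one.
    print(f"{stamp}  +{minutes_between(first, stamp):>3} min "
          f"(gap {minutes_between(previous, stamp):>3})  {event}")
    previous = stamp
```

The gap column puts the wait in one place: 75 minutes between requesting escalation and a senior engineer joining, then 19 minutes to a diagnosis.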
The ticket is not over when service returns
The workaround was to disable a particular acceleration feature until a patched firmware release became available.
That restored service. It did not finish the work.
Afterward, I asked for:
The defect identifier
Affected versions
Permanent fix
Release target
Written risk of the workaround
Confirmation that our case would remain attached to the defect
A follow-up date
Without those details, a workaround becomes permanent through inertia. Six months later, someone asks why a feature is disabled and nobody remembers the siege of Arrakeen.
Doug’s rules for dealing with the Sardaukar
I have learned five things:
Know the support contract before the outage.
Bring evidence, not a paragraph that says “network broken.”
Ask for functional escalation, not just a more senior title.
Run your own incident process while the vendor runs its ticket.
Get the permanent resolution in writing.
Vendor support can be maddening. It can feel rigid, ceremonial, and designed to exhaust anyone who did not arrive with a royal seal.
But the Sardaukar are not invincible.
You just need the records, the escalation path, and enough coffee to survive the desert.