• Three separate outages hit Sui mainnet on May 28-29, all resolved with no user funds lost or transactions reversed.
  • A gas-charging bug in Sui’s 1.72 release triggered the first two halts via an address balance underflow edge case.
  • The third outage came from a DKG randomness-state bug exposed when validators restarted to apply the earlier fix.

Sui mainnet went down three times in under two days. Three separate halts, two distinct bugs, and a cascade of validator restarts that kept exposing new failure points.

The network is back. But what happened between Thursday morning and Friday evening reveals something deeper than a simple software glitch.

When the First Crash Hit

The first outage started around 7am PT on May 28 and lasted until roughly 1:30pm PT, leaving the network offline for about six and a half hours. The root cause traced back to Sui’s 1.72 release, which introduced address balances as a new way for users to store funds and pay for gas without relying on coin objects.

That new feature carried a flaw nobody caught before launch.

The bug lived inside the gas smashing process. When a hybrid gas transaction was cancelled due to InsufficientFundsForWithdraw, the runtime still attempted to debit the same funds during the subsequent gas smashing step. Two transactions competing to spend from the same address balance could trigger that cancellation. Gas smashing would then try to debit the funds from the cancelled withdrawal anyway. A negative delta hit a zero balance, the balance underflowed, and the node crashed.
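To make the failure mode concrete, here is a toy Python model of that sequence. It is a sketch, not Sui’s code: `AddressBalance`, `BalanceUnderflow`, and `run_tx_buggy` are hypothetical names, amounts are plain integers, and raising an exception stands in for the node crash that the unsigned balance underflow caused in production.

```python
class BalanceUnderflow(Exception):
    """Stands in for the node crash: an unsigned balance must never go negative."""


class AddressBalance:
    """Hypothetical toy model of an address balance (plain integer amounts)."""

    def __init__(self, value: int) -> None:
        self.value = value

    def apply_delta(self, delta: int) -> None:
        new_value = self.value + delta
        if new_value < 0:
            # An unsigned (u64-style) balance cannot represent this, so it is fatal.
            raise BalanceUnderflow(f"{self.value} + ({delta}) < 0")
        self.value = new_value


def run_tx_buggy(balance: AddressBalance, gas_amount: int) -> str:
    """Models the flawed ordering: cancel the transaction, then smash gas anyway."""
    status = "ok"
    if balance.value < gas_amount:
        # Not enough left after a competing transaction: cancel.
        status = "InsufficientFundsForWithdraw"
    # The modelled flaw: the gas smashing debit ignores the cancellation
    # and applies a negative delta to the same balance.
    balance.apply_delta(-gas_amount)
    return status


balance = AddressBalance(100)
first = run_tx_buggy(balance, 100)  # first transaction drains the balance to zero
try:
    run_tx_buggy(balance, 100)  # competing transaction against the same balance
except BalanceUnderflow as err:
    print("node crash:", err)
```

The first call drains the balance to zero; the second, competing call is marked as cancelled but still reaches the debit, which is the ordering the post-mortem describes.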

The fix was straightforward in concept: stop gas smashing when a transaction is cancelled with that specific error. The Sui Core Team proposed it by around noon PT. Enough validators adopted it to bring the network back by 1:30pm.
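The Thursday fix can be pictured as a single guard in front of the debit. In this sketch (hypothetical names `skip_gas_smashing` and `charge_gas`, integer balances), the guard compares the cancellation code against exactly one value, which made it quick to ship and also narrow.

```python
from typing import Optional

INSUFFICIENT_FUNDS = "InsufficientFundsForWithdraw"


def skip_gas_smashing(cancellation_code: Optional[str]) -> bool:
    """Thursday-style guard as described: skip smashing for one specific error."""
    return cancellation_code == INSUFFICIENT_FUNDS


def charge_gas(balance: int, amount: int, cancellation_code: Optional[str]) -> int:
    """Return the balance after gas charging in this toy model."""
    if skip_gas_smashing(cancellation_code):
        # Cancelled for insufficient funds: do not debit the balance again.
        return balance
    if amount > balance:
        raise ArithmeticError("address balance underflow")
    return balance - amount


# The competing transaction from the outage scenario no longer underflows.
print(charge_gas(0, 100, INSUFFICIENT_FUNDS))
```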

That fix, though, had a known weakness. The team accepted that risk to restore the network fast.

The Second Halt Nobody Wanted

Friday morning, 5am PT. The network was down again.

A transaction can have more than one reason for cancellation. When a different error code masked the InsufficientFundsForWithdraw code, the Thursday fix got bypassed entirely. Same underflow. Same crash. Different path to get there.
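The masking problem is easy to reproduce in a toy model. The sketch below assumes, as a simplification, that a cancelled transaction records a list of reasons and only the first one is surfaced; `OtherCancellationReason` is a placeholder, not a real Sui error code. The broader check is one plausible hardening for illustration, not a description of the fix the team actually shipped.

```python
from typing import List, Optional

INSUFFICIENT_FUNDS = "InsufficientFundsForWithdraw"
OTHER_REASON = "OtherCancellationReason"  # hypothetical placeholder, not a real Sui code


def surfaced_code(reasons: List[str]) -> Optional[str]:
    """Assume only the first recorded reason is reported (a simplification)."""
    return reasons[0] if reasons else None


def skip_smashing_narrow(reasons: List[str]) -> bool:
    # Checks only the surfaced code, so a different code can mask the one we need.
    return surfaced_code(reasons) == INSUFFICIENT_FUNDS


def skip_smashing_broad(reasons: List[str]) -> bool:
    # One possible hardening: look at every recorded reason, not just the first.
    return INSUFFICIENT_FUNDS in reasons


reasons = [OTHER_REASON, INSUFFICIENT_FUNDS]
print(skip_smashing_narrow(reasons))  # False: smashing runs and can underflow
print(skip_smashing_broad(reasons))   # True: smashing is skipped
```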

The Sui Foundation’s post-mortem describes it directly: the team was already close to the robust fix when the second outage hit. They finished it in time to propose it to validators by around 8am PT. The network came back by 9:40am PT.

Roughly four hours and forty minutes of downtime, shorter than Thursday’s halt. But the day wasn’t over.

A Third Bug Was Waiting

The network ran normally until around 1:30pm PT on Friday. Then it stopped again.

This one was different. When validators restarted to adopt the Friday morning fix, participation in the next epoch’s distributed key generation (DKG) protocol fell below the required threshold. DKG failed as designed and disabled itself. The problem was that this failure verdict was never written to disk, so each restart wiped the memory of what had happened.

Randomness-dependent transactions sat in a queue waiting for a DKG run that would never finish. End-of-epoch logic requires draining that queue before the epoch can close, so the epoch got stuck.
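The persistence gap can be modelled with a tiny validator that either keeps its DKG verdict in memory or writes it to a file. Everything here is hypothetical (`ToyValidator`, the JSON state file, the status strings), and the sketch assumes that once a node knows DKG has failed it cancels queued randomness transactions rather than waiting; the article does not say exactly how Sui drains the queue.

```python
import json
import os
import tempfile
from typing import List


class ToyValidator:
    """Hypothetical model of one validator's DKG bookkeeping across restarts."""

    def __init__(self, state_path: str, persist: bool) -> None:
        self.state_path = state_path
        self.persist = persist
        self.dkg_status = "pending"
        self.randomness_queue: List[str] = []
        if persist and os.path.exists(state_path):
            with open(state_path) as f:
                self.dkg_status = json.load(f)["dkg_status"]

    def record_dkg_failure(self) -> None:
        self.dkg_status = "failed"
        if self.persist:
            # Persist the verdict so it survives a restart.
            with open(self.state_path, "w") as f:
                json.dump({"dkg_status": self.dkg_status}, f)

    def can_close_epoch(self) -> bool:
        # Assumption: once DKG is known to have failed, queued randomness
        # transactions are cancelled instead of waiting indefinitely.
        if self.dkg_status == "failed":
            self.randomness_queue.clear()
        # End-of-epoch logic requires an empty queue.
        return not self.randomness_queue


def epoch_can_close_after_restart(persist: bool) -> bool:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "dkg_state.json")
        validator = ToyValidator(path, persist)
        validator.record_dkg_failure()
        validator = ToyValidator(path, persist)  # restart: in-memory state is gone
        validator.randomness_queue.append("tx_needing_randomness")
        return validator.can_close_epoch()


print(epoch_can_close_after_restart(persist=False))  # False: epoch stuck
print(epoch_can_close_after_restart(persist=True))   # True
```

Without persistence, the restarted node sees DKG as still pending, the queued transaction never leaves, and the end-of-epoch check keeps returning `False`.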

The fix required two parts, per the Foundation’s official account: patching the persistence bug so DKG status survives restarts, and building a mechanism to force-close a stuck epoch at a coordinated point. The network moved into a new epoch. Randomness was restored. The third outage ended at approximately 7:20pm PT on Friday.
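The second part of the fix, a coordinated force-close, can be sketched as an override on the end-of-epoch check. The round numbers and the `force_close_round` parameter are hypothetical; the point the sketch illustrates is that the override only stays safe if every validator applies it at the same agreed point.

```python
from typing import List, Optional


def should_close_epoch(
    randomness_queue: List[str],
    current_round: int,
    force_close_round: Optional[int],
) -> bool:
    """Hypothetical end-of-epoch rule with a coordinated override."""
    if not randomness_queue:
        return True  # normal path: the queue drained
    # Override path: every validator is configured with the same agreed
    # round, so all of them force-close at the same point and stay in agreement.
    return force_close_round is not None and current_round >= force_close_round


stuck = ["tx_needing_randomness"]
print(should_close_epoch(stuck, current_round=41, force_close_round=None))  # False
print(should_close_epoch(stuck, current_round=41, force_close_round=40))    # True
```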

No user funds were lost. No committed transactions were rolled back across any of the three incidents.

What the Sui Foundation Says Needs to Change

The Sui Foundation identified three areas requiring deeper investment coming out of these incidents. End-of-epoch resilience needs to extend beyond the current safe-mode fallback. Gas charging logic, which sits at the intersection of address balances, conservation checks, and the scheduler, needs the same code-quality bar as the Move VM. And failure containment needs a defense-in-depth layer so a future crash-inducing input gets dropped rather than taking the whole network down.
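The failure-containment goal can be sketched as a wrapper that stages each transaction’s writes and drops the transaction if it trips an invariant check, instead of letting the exception take the node down. `InvariantViolation`, `execute_batch`, and `toy_execute` are hypothetical. A production version would also need every validator to reach the same drop decision deterministically, which this single-process sketch does not address.

```python
from typing import Callable, Dict, List, Tuple


class InvariantViolation(Exception):
    """Hypothetical error for conditions that would otherwise crash the node."""


def execute_batch(
    txs: List[str],
    execute: Callable[[str, Dict[str, int]], None],
    state: Dict[str, int],
) -> Tuple[List[str], List[str]]:
    """Run each transaction on a copy of state; drop any that violate an invariant."""
    executed: List[str] = []
    dropped: List[str] = []
    for tx in txs:
        scratch = dict(state)  # stage writes so a failed tx leaves no partial effects
        try:
            execute(tx, scratch)
        except InvariantViolation:
            dropped.append(tx)  # contain the failure instead of halting
            continue
        state.clear()
        state.update(scratch)
        executed.append(tx)
    return executed, dropped


def toy_execute(tx: str, state: Dict[str, int]) -> None:
    # Each toy transaction spends 100; going negative is an invariant violation.
    new_value = state["balance"] - 100
    if new_value < 0:
        raise InvariantViolation(f"{tx} would underflow the balance")
    state["balance"] = new_value


state = {"balance": 100}
print(execute_batch(["tx_a", "tx_b"], toy_execute, state))
print(state)
```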

One detail worth noting from the Foundation’s account: AI agents with access to validator logs and cluster state were used during the incident and, according to the Foundation, materially accelerated diagnosis. That’s a quiet but notable data point for where production blockchain operations are heading.

For developers building on Sui and retail holders watching whether the network is production-ready, the two-day sequence is both a stress test and a public audit. The bugs were real. The fixes were fast. What the network does next with the gas charging architecture and epoch resilience will matter more than the downtime itself.