Optus explains big fail
Last week's high-profile Optus outage was caused by a software upgrade, raising questions about the network's resilience.
In a statement, Optus attributed the outage to changes in routing information that its network received from an international peering network during a routine software upgrade at 4:05 am on Wednesday.
The changes exceeded preset safety levels on key routers, which then disconnected themselves from the Optus IP Core network to protect themselves.
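The self-protection behaviour Optus describes can be pictured with a small toy model. This is only a sketch under stated assumptions: the statement does not name the protocol, the threshold, or the router software, so the `Router` class, the `max_routes` limit and the numbers below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Router:
    """Toy model of a router with a safety limit on received routes."""
    name: str
    max_routes: int          # hypothetical safety threshold
    route_count: int = 0
    connected: bool = True

    def receive_routes(self, new_routes: int) -> None:
        """Accept a routing update, disconnecting if the limit is exceeded."""
        if not self.connected:
            return  # an isolated router ignores further updates
        self.route_count += new_routes
        if self.route_count > self.max_routes:
            # Self-protection: leave the core rather than risk overload.
            self.connected = False

    def manual_restore(self) -> None:
        """Stand-in for a manual reconnection or reboot."""
        self.route_count = 0
        self.connected = True


router = Router(name="core-1", max_routes=1000)
router.receive_routes(900)    # within the limit, stays connected
router.receive_routes(500)    # pushes the total past the limit
print(router.connected)       # prints False
```

In this model the router stays offline until `manual_restore()` is called, which is why a trip of this kind on many routers at once can take hours to undo.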
The restoration process involved a significant team effort, requiring physical reconnection or rebooting of routers across multiple Australian sites.
Experts have drawn parallels with the 2016 Australian Population Census outage caused by a similar software upgrade issue, emphasising the importance of implementing a two-man rule to ensure accountability in system changes.
Under such a rule, no one can change the system on their own: one person enters the change and a second person checks it before it takes effect.
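A two-person check of this kind might be enforced in software along the following lines. This is a minimal sketch, not a description of any process used by Optus or the census operators; the `ChangeRequest` class and the names in the usage lines are hypothetical.

```python
class ChangeRequest:
    """A proposed system change that needs a second person's sign-off."""

    def __init__(self, author: str, description: str) -> None:
        self.author = author
        self.description = description
        self.approver = None  # set once someone other than the author approves

    def approve(self, reviewer: str) -> None:
        """Record approval, refusing self-approval by the author."""
        if reviewer == self.author:
            raise PermissionError("author cannot approve their own change")
        self.approver = reviewer

    def apply(self) -> str:
        """Apply the change only after an independent approval exists."""
        if self.approver is None:
            raise PermissionError("change needs a second person's approval")
        return f"applied: {self.description}"


change = ChangeRequest(author="alice", description="upgrade router software")
change.approve("bob")          # a second person checks the change
print(change.apply())          # prints "applied: upgrade router software"
```

The point of the design is that `apply()` cannot succeed on the author's word alone, so every change has two names attached to it.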
Surprisingly, mobile phones proved more reliable than landlines during the outage, as mobile standards allow for automatic switching between networks for emergency calls.
The Australian Government is exploring mobile roaming between carriers during natural disasters, a concept that could potentially extend to cover other network outages.
Experts say there are key lessons to be learned from the Optus outage, advising users not to rely solely on one service provider for phones and internet.
For those offering safety-critical services, establishing connections to multiple networks is recommended.
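For a safety-critical service, connecting to multiple networks usually means trying a backup path when the primary one fails. The sketch below shows that idea in its simplest form; the `send_with_failover` function, the network names and the stand-in send functions are illustrative assumptions, not a real carrier API.

```python
from typing import Callable, Sequence


def send_with_failover(
    message: str,
    networks: Sequence[tuple[str, Callable[[str], bool]]],
) -> str:
    """Try each network in order and return the name of the one that worked."""
    for name, send in networks:
        try:
            if send(message):
                return name
        except ConnectionError:
            continue  # this network is down, move on to the next one
    raise ConnectionError("all networks unavailable")


def primary_down(message: str) -> bool:
    """Hypothetical primary carrier that is suffering an outage."""
    raise ConnectionError("primary network outage")


def backup_up(message: str) -> bool:
    """Hypothetical backup carrier that is working normally."""
    return True


used = send_with_failover(
    "alert", [("primary", primary_down), ("backup", backup_up)]
)
print(used)  # prints "backup"
```

A real deployment would also need independent physical links and monitoring, since two services that share the same underlying network would fail together.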