RC rolls are regularly scheduled for 7AM Pacific (SLT) every Wednesday. While conducting one of these deployments, we received initial reports of issues relating to script behavior at 8:30AM. We immediately started investigating, and by 9AM it was clear that we needed to stop the roll, because the issue was pervasive and clearly related to the new changes. As we do in these situations, we declared an “Incident” and an Incident Commander took charge.
By 9:30 we started a rollback, reverting the affected simulators to their previous version. We also evaluated what additional actions we would need to take, as it was unclear whether a rollback alone would restart the scripts which had stopped. It quickly became clear that it would not, and we came up with a new plan - one that would ensure that scripts would perform as expected going forward, but possibly undo the changes our residents had been making since the time we had introduced the bug. Although this meant more downtime, it prevented further content loss, and proved to be the best way to put the grid back in order. The team came up with a quick, clean, and efficient way to achieve this and get everyone back on track.
By 12PM the decision was made to take this direction. By 1:15PM the code was complete, and by 1:40PM we had confirmed that it worked. By 2:25PM all regions were brought back up.
The report in full, signed by Grumpity Linden "and the Second Life team," can be found (here).
"I’m grateful to know that while we may make mistakes, we will not sweep them under the rug, nor look for someone to blame - we will come together and make it right."
However, almost all my friends (on various viewers) and I are STILL experiencing numerous lag-related problems to this day, two weeks after Feb 1. Is Linden Lab unaware of all this? Help!!