
Amazon explains database glitch that impacted big customers

It all started with a minor network disruption, but the problem spiraled from there, according to an Amazon Web Services post-mortem.

Amazon Web Services has posted an explanation of what happened at its big U.S. East data center that caused customers like Netflix to experience issues on Sunday.

According to the post-mortem of the “service event” (you have to love that term), a brief network disruption at 2:19 a.m. PDT affected a subset of the servers running Amazon’s DynamoDB database service, which stores and maintains data tables for customers. Each table is divvied up into partitions, each containing a portion of the table data, and those partitions, in turn, are parceled out to many servers to provide fast access and to allow data replication.
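To make the partitioning idea concrete, here is a minimal Python sketch of splitting a table's keys into partitions and spreading copies of each partition across several servers. It is an illustration only, not Amazon's implementation: the function names, the MD5-based hashing, the round-robin placement, and the server names are all assumptions made for this example.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map an item key to a partition index using a stable hash (assumed scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def assign_partitions(num_partitions: int, servers: list[str], replicas: int) -> dict[int, list[str]]:
    """Place each partition on `replicas` distinct servers, round-robin (hypothetical policy)."""
    if replicas > len(servers):
        raise ValueError("need at least as many servers as replicas")
    return {
        p: [servers[(p + r) % len(servers)] for r in range(replicas)]
        for p in range(num_partitions)
    }


def partitions_by_server(assignment: dict[int, list[str]]) -> dict[str, list[int]]:
    """Invert the placement to list which partitions each server holds."""
    held: dict[str, list[int]] = {}
    for partition, hosts in assignment.items():
        for host in hosts:
            held.setdefault(host, []).append(partition)
    return held


if __name__ == "__main__":
    # Hypothetical server names; four partitions, two copies of each.
    placement = assign_partitions(4, ["s1", "s2", "s3"], replicas=2)
    print(placement)
    print(partitions_by_server(placement))
    print(partition_for("user#42", 4))
```

Keeping each partition on more than one server is what allows both fast access and replication: a read can go to any server holding the partition, and the data survives the loss of a single server.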

Per the post:

The specific assignment of a group of partitions to a given server is called a “membership.” The membership of a set of table/partitions within a server is managed by DynamoDB’s internal metadata service. The metadata service is internally replicated and runs across multiple data centers. Storage servers hold the actual table data within a partition and need to periodically confirm that they have the...