<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Altentee &#187; Performance &#38; Test Automation Experts &#187; ec2</title>
	<atom:link href="http://altentee.com/tag/ec2/feed/" rel="self" type="application/rss+xml" />
	<link>http://altentee.com</link>
	<description>Performance and Test Automation Experts</description>
	<lastBuildDate>Sat, 12 Jun 2010 00:35:08 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Improved EBS snapshots on Amazon EC2</title>
		<link>http://altentee.com/2010/improved-ebs-snapshots-on-amazon-ec2/</link>
		<comments>http://altentee.com/2010/improved-ebs-snapshots-on-amazon-ec2/#comments</comments>
		<pubDate>Mon, 04 Jan 2010 04:33:12 +0000</pubDate>
		<dc:creator>Tim Koopmans</dc:creator>
				<category><![CDATA[90kts]]></category>
		<category><![CDATA[ec2]]></category>

		<guid isPermaLink="false">http://altentee.com/?p=686</guid>
		<description><![CDATA[<p>So I&#8217;ve been working on a system where the MySQL instance on EC2 sporadically locks up (mysqld is zombied) during a snapshot process =)</p>
<p>Essentially the EC2 snapshot is triggered like this:</p>

system(&#34;xfs_freeze -f /vol&#34;)                        and die;
system(&#34;ec2-create-snapshot $vol -K $key -C $crt &#34;) and die;
system(&#34;xfs_freeze -u /vol&#34;)                        and die;

<p>This method is based on advice from here</p>
<p>Notice I [...]]]></description>
			<content:encoded><![CDATA[<p>So I&#8217;ve been working on a system where the <strong>MySQL</strong> instance on <strong>EC2</strong> sporadically locks up (mysqld is zombied) during a snapshot process =)</p>
<p>Essentially the <strong>EC2</strong> snapshot is triggered like this:</p>

<div class="wp_syntax"><div class="code"><pre class="shell" style="font-family:monospace;"># freeze the XFS filesystem so no new writes reach the volume
system(&quot;xfs_freeze -f /vol&quot;)                        and die;
# start the EBS snapshot
system(&quot;ec2-create-snapshot $vol -K $key -C $crt &quot;) and die;
# unfreeze the filesystem again
system(&quot;xfs_freeze -u /vol&quot;)                        and die;</pre></div></div>

<p>This method is based on advice from <a href="http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1663">here</a></p>
<p>Notice I am doing an <strong>xfs_freeze</strong> of the entire volume on which the MySQL data sits. xfs_freeze is intended to be used with volume managers and hardware RAID devices that support the creation of snapshots, such as EBS.<br />
<span id="more-686"></span></p>
<blockquote><p>The -f flag requests the specified XFS filesystem to be frozen from new modifications. When this is selected, all ongoing transactions in the filesystem are allowed to complete, new write system calls are halted, other calls which modify the filesystem are halted, and all dirty data, metadata, and log information are written to disk. Any process attempting to write to the frozen filesystem will block waiting for the filesystem to be unfrozen.</p>
<p>Note that even after freezing, the on-disk filesystem can contain information on files that are still in the process of unlinking. These files will not be unlinked until the filesystem is unfrozen or a clean mount of the snapshot is complete.</p>
<p>The -u flag is used to un-freeze the filesystem and allow operations to continue. Any filesystem modifications that were blocked by the freeze are unblocked and allowed to complete.</p></blockquote>
<p>EC2 recommends this for <strong>ec2-create-snapshot</strong>:</p>
<blockquote><p>When taking a snapshot of a file system, we recommend unmounting it first. This ensures the file system metadata is in a consistent state, that the &#8216;mounted indicator&#8217; is cleared, and that all applications using that file system are stopped and in a consistent state. Some file systems, such as xfs, can freeze and unfreeze activity so a snapshot can be made without unmounting.</p></blockquote>
<p>Clearly we don&#8217;t want to unmount the whole drive, so freezing the XFS volume is our best bet.</p>
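<p>One caveat with chaining the three calls using <strong>and die</strong>: if the snapshot command fails, the script exits before <strong>xfs_freeze -u</strong> runs and the volume stays frozen. Here is a minimal sketch in plain shell of one way around that, assuming bash. <strong>snapshot_with_freeze</strong> is a hypothetical helper name, not part of any of the tools above:</p>

<div class="wp_syntax"><div class="code"><pre class="shell" style="font-family:monospace;"># Hypothetical helper: freeze MOUNT_POINT, run the given command, then always unfreeze.
snapshot_with_freeze() {
    local mount_point="$1"
    shift
    xfs_freeze -f "$mount_point" || return 1
    local rc=0
    "$@" || rc=$?    # remember a failure instead of exiting while frozen
    xfs_freeze -u "$mount_point" || return 1
    return "$rc"
}</pre></div></div>

<p>It would be called as <strong>snapshot_with_freeze /vol ec2-create-snapshot "$vol" -K "$key" -C "$crt"</strong>. This only covers a failing command. An interrupted script would still need a signal trap to unfreeze.</p>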
<p>On a couple of occasions, we&#8217;ve seen the whole MySQL server crash while a backup was taking place. The first symptom displayed in the logs is a long semaphore wait. This message is repeated for different threads:</p>
<pre>mysqld[26852]: InnoDB: Warning: a long semaphore wait:</pre>
<pre>mysqld[26852]: --Thread 46912644213072 has waited at ./../include/trx0sys.ic line 101 for 241.00 seconds the semaphore:</pre>
<pre>mysqld[26852]: X-lock on RW-latch at 0x2aaaaf01cc08 created in file buf0buf.c line 497</pre>
<pre>mysqld[26852]: a writer (thread id 46912644213072) has reserved it in mode  wait exclusive</pre>
<p>After about 10 minutes of this behaviour the server crashes. This is because 101 threads have made 101 connections where max_connections=100. It only occurs while we are doing a snapshot in the background, which makes me think it&#8217;s a deadlock condition on the underlying filesystem.</p>
<pre>mysqld[26852]: InnoDB: We intentionally crash the server, because it appears to be hung.</pre>
<pre>mysqld[26852]: 091207 20:16:41InnoDB: Assertion failure in thread 46912625965392 in file srv0srv.c line 2097</pre>
<pre>mysqld[26852]: InnoDB: We intentionally generate a memory trap.</pre>
<pre>mysqld[26852]: InnoDB: Submit a detailed bug report to http://bugs.mysql.com.</pre>
<pre>mysqld[26852]: InnoDB: If you get repeated assertion failures or crashes, even</pre>
<pre>mysqld[26852]: InnoDB: immediately after the mysqld startup, there may be</pre>
<pre>mysqld[26852]: InnoDB: corruption in the InnoDB tablespace. Please refer to</pre>
<pre>mysqld[26852]: InnoDB: http://dev.mysql.com/doc/refman/5.0/en/forcing-recovery.html</pre>
<pre>mysqld[26852]: InnoDB: about forcing recovery.</pre>
<pre>mysqld[26852]: 091207 20:16:41 - mysqld got signal 11 ;</pre>
<pre>mysqld[26852]: This could be because you hit a bug. It is also possible that this binary</pre>
<pre>mysqld[26852]: or one of the libraries it was linked against is corrupt, improperly built,</pre>
<pre>mysqld[26852]: or misconfigured. This error can also be caused by malfunctioning hardware.</pre>
<pre>mysqld[26852]: We will try our best to scrape up some info that will hopefully help diagnose</pre>
<pre>mysqld[26852]: the problem, but since we have already crashed, something is definitely wrong</pre>
<pre>mysqld[26852]: and this may fail.</pre>
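<p>Since the long semaphore wait shows up in the logs some minutes before the crash, it could serve as an early warning. Here is a rough sketch in plain shell, assuming the log lines keep the format shown above. <strong>max_semaphore_wait</strong> is a hypothetical helper name:</p>

<div class="wp_syntax"><div class="code"><pre class="shell" style="font-family:monospace;"># Hypothetical helper: print the longest InnoDB semaphore wait (in seconds)
# found in a log read from stdin; prints nothing if no wait is logged.
max_semaphore_wait() {
    sed -n 's/.* for \([0-9.][0-9.]*\) seconds the semaphore.*/\1/p' |
        LC_ALL=C sort -n |
        tail -n 1
}</pre></div></div>

<p>Fed the log on stdin, from wherever syslog writes the mysqld messages on your system, it prints the longest wait seen so far.</p>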
<p>In panic mode, we restarted the whole server, which was then able to recover from this error. But is there a more graceful recovery open to us?</p>
<p><strong>Alestic</strong> have released a revised version of ec2-create-snapshot called <strong><a href="http://alestic.com/2009/09/ec2-consistent-snapshot">ec2-consistent-snapshot</a></strong>.</p>
<blockquote><p>Here are some of the ways in which the ec2-consistent-snapshot program has improved over the original:</p>
<ul>
<li>Command line options for passing in AWS keys, MySQL access information, and more.</li>
<li>Can be run with or without a MySQL database on the file system. This lets you use the command to initiate snapshots for any EBS volume.</li>
<li>Can be used with or without XFS file systems, though if you don’t use XFS, you run the risk of not having a consistent file system on EBS volume restore.</li>
<li>Instead of using the painfully slow ec2-create-snapshot command written in Java, this Perl program accesses the EC2 API directly with orders of magnitude speed improvement.</li>
<li>A preliminary FLUSH is performed on the MySQL database before the FLUSH WITH READ LOCK. This preparation reduces the total time the tables are locked.</li>
<li>A preliminary sync is performed on the XFS file system before the xfs_freeze. This preparation reduces the total time the file system is locked.</li>
<li>The MySQL LOCK now has timeouts and retries around it. This prevents horrible blocking interactions between the database lock, long running queries, and normal transactions. The length of the timeout and the number of retries are configurable with command line options.</li>
<li>The MySQL FLUSH is done in such a way that the statement does not propagate through to slave databases, negatively impacting their performance and/or causing negative blocking interactions with long running queries.</li>
<li>Cleans up MySQL and XFS locks if it is interrupted, if a timeout happens, or if other errors occur. This prevents a number of serious locking issues when things go wrong with the environment or EC2 API.</li>
<li>Can snapshot EBS volumes in a region other than the default (e.g., eu-west-1).</li>
<li>Can initiate snapshots of multiple EBS volumes at the same time while everything is consistently locked. This has been used to create consistent snapshots of RAIDed EBS volumes.</li>
</ul>
</blockquote>
<p>Here&#8217;s hoping that this release fixes the problems with locking of the filesystem! Notice we don&#8217;t attempt to flush and lock since we are using the <strong>InnoDB</strong> engine.</p>
<p>Revised snapshot code looks like this:</p>

<div class="wp_syntax"><div class="code"><pre class="shell" style="font-family:monospace;"># create snapshot
system(&quot;xfs_freeze -f /vol&quot;)                        and die;
system(&quot;ec2-consistent-snapshot --aws-access-key-id $key --aws-secret-access-key $secret --xfs-filesystem /vol $vol&quot;) and die;
system(&quot;xfs_freeze -u /vol&quot;)                        and die;</pre></div></div>
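<p>Going by the feature list quoted above, ec2-consistent-snapshot does its own sync and xfs_freeze and cleans up the freeze on errors. If that holds for the installed version when <strong>--xfs-filesystem</strong> is passed, the explicit xfs_freeze calls around it may be redundant. Here is a sketch of the bare call in plain shell, using only the options from the code above. <strong>snapshot_volume</strong> is a hypothetical wrapper name:</p>

<div class="wp_syntax"><div class="code"><pre class="shell" style="font-family:monospace;"># Hypothetical wrapper. Arguments: access key id, secret key, mount point, volume id.
# Assumes ec2-consistent-snapshot freezes and unfreezes the mount point itself.
snapshot_volume() {
    ec2-consistent-snapshot \
        --aws-access-key-id "$1" \
        --aws-secret-access-key "$2" \
        --xfs-filesystem "$3" \
        "$4"
}</pre></div></div>

<p>It would be called as <strong>snapshot_volume "$key" "$secret" /vol "$vol"</strong>. Try it against a test volume first to confirm the filesystem really is unfrozen afterwards.</p>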

]]></content:encoded>
			<wfw:commentRss>http://altentee.com/2010/improved-ebs-snapshots-on-amazon-ec2/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
