<?xml version="1.0" encoding="utf-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Table statistics draft 2, the slow query log</title>
	<atom:link href="http://ebergen.net/wordpress/2010/01/19/table-statistics-draft-2-the-slow-query-log/feed/" rel="self" type="application/rss+xml" />
	<link>http://ebergen.net/wordpress/2010/01/19/table-statistics-draft-2-the-slow-query-log/</link>
	<description>You will probably want some waders, a pickaxe, and one of those hats with a light on it before you go in here.</description>
	<lastBuildDate>Fri, 11 May 2012 07:02:34 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
	<item>
		<title>By: Rick James</title>
		<link>http://ebergen.net/wordpress/2010/01/19/table-statistics-draft-2-the-slow-query-log/comment-page-1/#comment-345784</link>
		<dc:creator>Rick James</dc:creator>
		<pubDate>Fri, 29 Jul 2011 21:45:49 +0000</pubDate>
		<guid isPermaLink="false">http://ebergen.net/wordpress/?p=374#comment-345784</guid>
		<description>Rows_read (and a bunch of other things) are probably in the Percona Xtradb extensions of 5.1.</description>
		<content:encoded><![CDATA[<p>Rows_read (and a bunch of other things) are probably in the Percona Xtradb extensions of 5.1.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Eric Bergen</title>
		<link>http://ebergen.net/wordpress/2010/01/19/table-statistics-draft-2-the-slow-query-log/comment-page-1/#comment-345138</link>
		<dc:creator>Eric Bergen</dc:creator>
		<pubDate>Mon, 25 Jul 2011 23:00:31 +0000</pubDate>
		<guid isPermaLink="false">http://ebergen.net/wordpress/?p=374#comment-345138</guid>
		<description>JSON isn&#039;t a bad idea. I left the values with zeros in them to make parsing easier. You&#039;re right that this is harder for humans. Which version of MySQL are you using that has Rows_read? I know Rows_examined gets reset in a bunch of different places, including every subquery execution. See https://bugs.launchpad.net/maria/+bug/807198</description>
		<content:encoded><![CDATA[<p>JSON isn&#8217;t a bad idea. I left the values with zeros in them to make parsing easier. You&#8217;re right that this is harder for humans. Which version of MySQL are you using that has Rows_read? I know Rows_examined gets reset in a bunch of different places, including every subquery execution. See <a href="https://bugs.launchpad.net/maria/+bug/807198" rel="nofollow">https://bugs.launchpad.net/maria/+bug/807198</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Rick James</title>
		<link>http://ebergen.net/wordpress/2010/01/19/table-statistics-draft-2-the-slow-query-log/comment-page-1/#comment-345131</link>
		<dc:creator>Rick James</dc:creator>
		<pubDate>Mon, 25 Jul 2011 21:33:21 +0000</pubDate>
		<guid isPermaLink="false">http://ebergen.net/wordpress/?p=374#comment-345131</guid>
		<description>How can Rows_read be smaller than Rows_examined?

# Time: 110725  6:35:04
# User@Host: ...[...] @ ....yahoo.com [...]
# Thread_id: 34392211  Schema: ...
# Query_time: 2.002966  Lock_time: 0.000137  Rows_sent: 100  Rows_examined: 200  Rows_affected: 0  Rows_read: 1
# Bytes_sent: 161592  Tmp_tables: 0  Tmp_disk_tables: 0  Tmp_table_sizes: 0
SET timestamp=1311600904;
SELECT * FROM ... w INNER JOIN ... c ON w.docid = c.docid WHERE w.docid IN ( ... );

(There were 100 items in the IN clause.)</description>
		<content:encoded><![CDATA[<p>How can Rows_read be smaller than Rows_examined?</p>
<p># Time: 110725  6:35:04<br />
# User@Host: &#8230;[...] @ &#8230;.yahoo.com [...]<br />
# Thread_id: 34392211  Schema: &#8230;<br />
# Query_time: 2.002966  Lock_time: 0.000137  Rows_sent: 100  Rows_examined: 200  Rows_affected: 0  Rows_read: 1<br />
# Bytes_sent: 161592  Tmp_tables: 0  Tmp_disk_tables: 0  Tmp_table_sizes: 0<br />
SET timestamp=1311600904;<br />
SELECT * FROM &#8230; w INNER JOIN &#8230; c ON w.docid = c.docid WHERE w.docid IN ( &#8230; );</p>
<p>(There were 100 items in the IN clause.)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Rick James</title>
		<link>http://ebergen.net/wordpress/2010/01/19/table-statistics-draft-2-the-slow-query-log/comment-page-1/#comment-345130</link>
		<dc:creator>Rick James</dc:creator>
		<pubDate>Mon, 25 Jul 2011 21:29:03 +0000</pubDate>
		<guid isPermaLink="false">http://ebergen.net/wordpress/?p=374#comment-345130</guid>
		<description>Well, why not make it JSON, with some extra whitespace?  Lump all the key:value stuff in the JSON, perhaps as a single line.

Further, skip any values that are &quot;0&quot; or &quot;no&quot;, as wasting space.  This would assist in UPDATEs versus SELECTs -- different Rows_* are applicable to the two query types.</description>
		<content:encoded><![CDATA[<p>Well, why not make it JSON, with some extra whitespace?  Lump all the key:value stuff in the JSON, perhaps as a single line.</p>
<p>Further, skip any values that are &#8220;0&#8221; or &#8220;no&#8221;, as wasting space.  This would assist in UPDATEs versus SELECTs &#8212; different Rows_* are applicable to the two query types.</p>
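<p>A minimal sketch of that idea, assuming the stats for one log entry are already available as a dictionary. The helper name stats_to_json and the sample values are hypothetical, chosen only for illustration; this is not what the patch does.</p>

```python
import json

# Hypothetical helper: renders the stats of one log entry as single-line JSON,
# dropping any value that is 0, "0" or "no" to save space.
def stats_to_json(stats):
    kept = {k: v for k, v in stats.items() if v not in (0, "0", "no")}
    return json.dumps(kept)

# Sample values are made up for illustration.
print(stats_to_json({"Query_time": 6, "Lock_time": 0,
                     "Rows_sent": 6, "Rows_examined": 14}))
```

<p>A parser reading this back would have to treat a missing key as zero, which is the trade-off for the shorter line.</p>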
]]></content:encoded>
	</item>
	<item>
		<title>By: Baron Schwartz</title>
		<link>http://ebergen.net/wordpress/2010/01/19/table-statistics-draft-2-the-slow-query-log/comment-page-1/#comment-266346</link>
		<dc:creator>Baron Schwartz</dc:creator>
		<pubDate>Fri, 19 Feb 2010 02:20:39 +0000</pubDate>
		<guid isPermaLink="false">http://ebergen.net/wordpress/?p=374#comment-266346</guid>
		<description>Splunk isn&#039;t really a tool I&#039;d aim at the slow query log; I&#039;d point it at the error log.  The slow query log format is pretty regular except for the insane exceptions it already has, and I&#039;d suggest trying to stick to its basic format if at all possible.  Again, it&#039;s already far harder to parse than it seems, and building a FAST parser is really hard.

As a more verbose alternative I could suggest this:

# Time: 100119 20:27:16
# User@Host: [ebergen] @ localhost []
# Query_time: 6 Lock_time: 0 Rows_sent: 6 Rows_examined: 14
# Row_stats: sbtest.foo:rows_read=18,rows_changed=0...;sbtest.bar:rows_read=...
# Index_stats: sbtest.bar.u:rows_read=6
select * from foo a, bar b where b.u=4 order by sleep(1);</description>
		<content:encoded><![CDATA[<p>Splunk isn&#8217;t really a tool I&#8217;d aim at the slow query log; I&#8217;d point it at the error log.  The slow query log format is pretty regular except for the insane exceptions it already has, and I&#8217;d suggest trying to stick to its basic format if at all possible.  Again, it&#8217;s already far harder to parse than it seems, and building a FAST parser is really hard.</p>
<p>As a more verbose alternative I could suggest this:</p>
<p># Time: 100119 20:27:16<br />
# User@Host: [ebergen] @ localhost []<br />
# Query_time: 6 Lock_time: 0 Rows_sent: 6 Rows_examined: 14<br />
# Row_stats: sbtest.foo:rows_read=18,rows_changed=0&#8230;;sbtest.bar:rows_read=&#8230;<br />
# Index_stats: sbtest.bar.u:rows_read=6<br />
select * from foo a, bar b where b.u=4 order by sleep(1);</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Eric Bergen</title>
		<link>http://ebergen.net/wordpress/2010/01/19/table-statistics-draft-2-the-slow-query-log/comment-page-1/#comment-266081</link>
		<dc:creator>Eric Bergen</dc:creator>
		<pubDate>Sun, 14 Feb 2010 02:57:04 +0000</pubDate>
		<guid isPermaLink="false">http://ebergen.net/wordpress/?p=374#comment-266081</guid>
		<description>The database.table notation is common enough that I think people will recognize it without having to specify separate identifiers for each. That said, I realize it&#039;s easier to parse something with tools like Splunk if the format is database=foo table=bar. I&#039;m leaning towards something like:
Row Stats: sbtest.foo rows_read=18 rows_changed=0 rows_changed_x_index=0</description>
		<content:encoded><![CDATA[<p>The database.table notation is common enough that I think people will recognize it without having to specify separate identifiers for each. That said, I realize it&#8217;s easier to parse something with tools like Splunk if the format is database=foo table=bar. I&#8217;m leaning towards something like:<br />
Row Stats: sbtest.foo rows_read=18 rows_changed=0 rows_changed_x_index=0</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ben Smith</title>
		<link>http://ebergen.net/wordpress/2010/01/19/table-statistics-draft-2-the-slow-query-log/comment-page-1/#comment-265394</link>
		<dc:creator>Ben Smith</dc:creator>
		<pubDate>Tue, 02 Feb 2010 06:32:17 +0000</pubDate>
		<guid isPermaLink="false">http://ebergen.net/wordpress/?p=374#comment-265394</guid>
		<description>Reducing the log output to &quot;Rows Stats: sbtest.foo 18 0 0&quot; or &quot;# Row_stats: sbtest.foo=18,0,0&quot; makes it hard for someone who doesn&#039;t know what those stats are to pick up and parse them.  It also makes it harder for someone looking at the logs manually to understand them.  The cost of storing some extra text is minimal, especially with the increasing size of storage space these days.  I would suggest being more verbose; any extra text can be stripped out by whatever script or tool you use to parse it.  Also, if you are using a tool that is not familiar with the logs, being more verbose can help it understand them without any configuration (e.g. Splunk).  To that end, doing something like this instead would be more effective:

# Time: 100119 20:27:16
# User@Host: [ebergen] @ localhost []
# Query_time: 6 Lock_time: 0 Rows_sent: 6 Rows_examined: 14
# Rows Stats: database=sbtest table=foo rows_read=18 rows_changed=0 rows_changed_x_index=0
# Rows Stats: database=sbtest table=bar rows_read=15 rows_changed=3 rows_changed_x_index=3
# Index Stats: database=sbtest table=bar.u index_rows_read=6
select * from foo a, bar b where b.u=4 order by sleep(1);</description>
		<content:encoded><![CDATA[<p>Reducing the log output to &#8220;Rows Stats: sbtest.foo 18 0 0&#8221; or &#8220;# Row_stats: sbtest.foo=18,0,0&#8221; makes it hard for someone who doesn&#8217;t know what those stats are to pick up and parse them.  It also makes it harder for someone looking at the logs manually to understand them.  The cost of storing some extra text is minimal, especially with the increasing size of storage space these days.  I would suggest being more verbose; any extra text can be stripped out by whatever script or tool you use to parse it.  Also, if you are using a tool that is not familiar with the logs, being more verbose can help it understand them without any configuration (e.g. Splunk).  To that end, doing something like this instead would be more effective:</p>
<p># Time: 100119 20:27:16<br />
# User@Host: [ebergen] @ localhost []<br />
# Query_time: 6 Lock_time: 0 Rows_sent: 6 Rows_examined: 14<br />
# Rows Stats: database=sbtest table=foo rows_read=18 rows_changed=0 rows_changed_x_index=0<br />
# Rows Stats: database=sbtest table=bar rows_read=15 rows_changed=3 rows_changed_x_index=3<br />
# Index Stats: database=sbtest table=bar.u index_rows_read=6<br />
select * from foo a, bar b where b.u=4 order by sleep(1);</p>
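<p>As a rough sketch of how little a script needs to know to read the verbose form, assuming each stats line is a label followed by space-separated key=value pairs. The helper name parse_verbose_line is hypothetical and not part of any existing tool.</p>

```python
# Hypothetical helper: parses one line of the verbose key=value format
# into its label and a dictionary of fields.
def parse_verbose_line(line):
    # Drop the leading "# " and split the label from the key=value pairs.
    label, _, rest = line.lstrip("# ").partition(": ")
    fields = dict(pair.split("=", 1) for pair in rest.split())
    # Counters become integers; database and table names stay as strings.
    return label, {k: int(v) if v.isdigit() else v for k, v in fields.items()}

print(parse_verbose_line(
    "# Rows Stats: database=sbtest table=bar "
    "rows_read=15 rows_changed=3 rows_changed_x_index=3"))
```

<p>Because every value carries its own name, the parser needs no table of field positions, which is the point of the more verbose layout.</p>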
]]></content:encoded>
	</item>
	<item>
		<title>By: Eric Bergen</title>
		<link>http://ebergen.net/wordpress/2010/01/19/table-statistics-draft-2-the-slow-query-log/comment-page-1/#comment-264852</link>
		<dc:creator>Eric Bergen</dc:creator>
		<pubDate>Sat, 23 Jan 2010 23:39:38 +0000</pubDate>
		<guid isPermaLink="false">http://ebergen.net/wordpress/?p=374#comment-264852</guid>
		<description>Thanks Baron, I&#039;ll update the patch to use the second format and repost it.</description>
		<content:encoded><![CDATA[<p>Thanks Baron, I&#8217;ll update the patch to use the second format and repost it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Baron Schwartz</title>
		<link>http://ebergen.net/wordpress/2010/01/19/table-statistics-draft-2-the-slow-query-log/comment-page-1/#comment-264624</link>
		<dc:creator>Baron Schwartz</dc:creator>
		<pubDate>Wed, 20 Jan 2010 23:42:55 +0000</pubDate>
		<guid isPermaLink="false">http://ebergen.net/wordpress/?p=374#comment-264624</guid>
		<description>Actually now that I think about it, the second format I showed (where the name: part of the name:value pair does NOT vary) is going to be far more efficient for Maatkit, which self-generates code for efficiency after it&#039;s seen enough samples, and would fail to recognize arbitrarily varying things like Row_stats/variable.variable.  So consider this a vote for Row_stats: as a prefix, and something parse-able as a value :-)</description>
		<content:encoded><![CDATA[<p>Actually now that I think about it, the second format I showed (where the name: part of the name:value pair does NOT vary) is going to be far more efficient for Maatkit, which self-generates code for efficiency after it&#8217;s seen enough samples, and would fail to recognize arbitrarily varying things like Row_stats/variable.variable.  So consider this a vote for Row_stats: as a prefix, and something parse-able as a value <img src='http://ebergen.net/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Baron Schwartz</title>
		<link>http://ebergen.net/wordpress/2010/01/19/table-statistics-draft-2-the-slow-query-log/comment-page-1/#comment-264623</link>
		<dc:creator>Baron Schwartz</dc:creator>
		<pubDate>Wed, 20 Jan 2010 23:39:26 +0000</pubDate>
		<guid isPermaLink="false">http://ebergen.net/wordpress/?p=374#comment-264623</guid>
		<description>Eric,

Great stuff.  Can I suggest a different format for the slow query log?  Parsing slow query logs is already far too complex.  (See the test suite for Maatkit...)  If we keep the format parseable by Maatkit&#039;s existing parser it&#039;ll be great, although I appreciate that you&#039;re putting a lot of stuff into the log here and it&#039;s a little complex.

The slow query log format is key:value entries separated by spaces.  This is both the easiest and the most efficient to parse.  So I would suggest something like this:

# Time: 100119 20:27:16
# User@Host: [ebergen] @ localhost []
# Query_time: 6 Lock_time: 0 Rows_sent: 6 Rows_examined: 14
# Row_stats/sbtest.foo: 18,0,0 Row_stats/sbtest.bar: 15,3,3
# Index_stats/sbtest.bar.u: 6
select * from foo a, bar b where b.u=4 order by sleep(1);

The :value part of the key:value pair is not going to be an atomic value and will need to be decomposed for analysis, but I think that&#039;s OK.  At least the general line format remains the same here.

Alternatively, I could suggest this:

# Time: 100119 20:27:16
# User@Host: [ebergen] @ localhost []
# Query_time: 6 Lock_time: 0 Rows_sent: 6 Rows_examined: 14
# Row_stats: sbtest.foo=18,0,0;sbtest.bar=15,3,3
# Index_stats: sbtest.bar.u=6
select * from foo a, bar b where b.u=4 order by sleep(1);

If you want, before you bake the format fully we can write test cases for the Maatkit parser.  I would be very interested in adding analytical support for this to mk-query-digest.  The code that does the aggregation is pretty complex (for speed) and it would be nice if it&#039;s as easy as possible to support.</description>
		<content:encoded><![CDATA[<p>Eric,</p>
<p>Great stuff.  Can I suggest a different format for the slow query log?  Parsing slow query logs is already far too complex.  (See the test suite for Maatkit&#8230;)  If we keep the format parseable by Maatkit&#8217;s existing parser it&#8217;ll be great, although I appreciate that you&#8217;re putting a lot of stuff into the log here and it&#8217;s a little complex.</p>
<p>The slow query log format is key:value entries separated by spaces.  This is both the easiest and the most efficient to parse.  So I would suggest something like this:</p>
<p># Time: 100119 20:27:16<br />
# User@Host: [ebergen] @ localhost []<br />
# Query_time: 6 Lock_time: 0 Rows_sent: 6 Rows_examined: 14<br />
# Row_stats/sbtest.foo: 18,0,0 Row_stats/sbtest.bar: 15,3,3<br />
# Index_stats/sbtest.bar.u: 6<br />
select * from foo a, bar b where b.u=4 order by sleep(1);</p>
<p>The :value part of the key:value pair is not going to be an atomic value and will need to be decomposed for analysis, but I think that&#8217;s OK.  At least the general line format remains the same here.</p>
<p>Alternatively, I could suggest this:</p>
<p># Time: 100119 20:27:16<br />
# User@Host: [ebergen] @ localhost []<br />
# Query_time: 6 Lock_time: 0 Rows_sent: 6 Rows_examined: 14<br />
# Row_stats: sbtest.foo=18,0,0;sbtest.bar=15,3,3<br />
# Index_stats: sbtest.bar.u=6<br />
select * from foo a, bar b where b.u=4 order by sleep(1);</p>
<p>If you want, before you bake the format fully we can write test cases for the Maatkit parser.  I would be very interested in adding analytical support for this to mk-query-digest.  The code that does the aggregation is pretty complex (for speed) and it would be nice if it&#8217;s as easy as possible to support.</p>
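<p>To illustrate the decomposition step for the second format, here is a minimal sketch. It is not Maatkit&#8217;s parser; the helper name parse_stats_line and the choice of returning tuples of integers are assumptions made for the example.</p>

```python
import re

# Hypothetical helper, not Maatkit's parser: splits one "# Key: value" header
# line of the second proposed format into per-object counter tuples.
LINE_RE = re.compile(r"^# (Row_stats|Index_stats): (.+)$")

def parse_stats_line(line):
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # not a stats line, e.g. "# Query_time: ..."
    kind, value = match.groups()
    stats = {}
    # Entries are separated by ";", each one is name=counter,counter,...
    for entry in value.split(";"):
        name, _, counters = entry.partition("=")
        stats[name] = tuple(int(c) for c in counters.split(","))
    return kind, stats

print(parse_stats_line("# Row_stats: sbtest.foo=18,0,0;sbtest.bar=15,3,3"))
print(parse_stats_line("# Index_stats: sbtest.bar.u=6"))
```

<p>The key stays fixed (Row_stats or Index_stats), so the line-level parsing is unchanged and only the value needs a second pass.</p>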
]]></content:encoded>
	</item>
</channel>
</rss>

