Archive for the ‘MariaDB’ Category.

Second update of modifying table statistics in MariaDB

Since my last post I’ve changed how the table statistics work quite a bit in MariaDB. I ran into a few problems with my original changes. In the TiVo 5.0 patch the show table_statistics command chose from one of three hash tables to read from depending on the flags. There is a global hash table for global stats and two in the thd object for session and query stats. Each time a non show query is executed the query statistics are reset. In 5.1 the implementation of show command changed from reading arbitrary data structures to constructing queries to run against information_schema tables. The information_schema tables are constructed on the fly, placed into a temporary table and have the select resulting from the show command executed on them. This works ok for the show commands but broke my new information schema tables.

As part of porting my changes into mariadb 5.2 I added a few new information schema tables called QUERY_TABLE_STATISTICS, QUERY_INDEX_STATISTICS, SESSION_TABLE_STATISTICS, and SESSION_INDEX_STATISTICS. For the part these tables query the hash tables from the 5.0 implementation. This worked fine for the show commands in my previous update. It broke when querying the QUERY_TABLE_STATISTICS directly. In 5.0 I reset the per query table statistics on every query that isn’t a SHOW query. This works fine except that SELECT * FROM QUERY_TABLE_STATISTICS reset the query statistics that the select query was supposed to retrieve.

After a conversation with serg in #mariadb on freenode we came up with a plan to fix the QUERY_TABLE_STATISTICS and QUERY_INDEX_STATISTICS tables. Instead of tracking the most recent query the plan is to track the most recent N queries tracked with a QUERY_ID. I realized this is very similar to how SHOW PROFILES worked. I decided to try to integrate the query statistics in with the profile. This seemed like a good idea until I realized that I want to enable the slow query log statistics without the overhead of leaving profiling on in production.

All of the profiling methods are controlled by the profiling variable. All of the table, index, and user statistics are controlled by the userstat variable. The profiling and my changes to the table statistics both use a QUERY_ID to show a list of queries. The problem is that depending on when the profiling and userstat variables are enabled the query_ids can be inconsistent. For example if a user enables userstat, executes a query, then enables profiling and executes another query the table statistics query_id 1 and the profiling query_id 1 will be different.

Every show command that uses a information schema table puts rows into a temporary table. I’ve also added temporary table tracking into the table statistics. This has an interesting side effect when combined with per query statistics tracking. Queries are only put into the queue of queries for query statistics when they read/write rows. The query_statistics_history_size controls how many queries to keep stats for. Each show command now uses a temporary table so the show commands are now tracked using row statistics. Here is an example using show variables.

mysql> set query_statistics_history_size=5;
Query OK, 0 rows affected (0.00 sec)

mysql> show variables;
+——————————————-+—————————————————————————————————————-+
| Variable_name | Value |
+——————————————-+—————————————————————————————————————-+
| aria_block_size | 8192
………
| warning_count | 0 |
+——————————————-+—————————————————————————————————————-+
395 rows in set (0.00 sec)

mysql> show query table_statistics;
+———-+————–+————+———–+————–+————————-+
| Query_id | Table_schema | Table_name | Rows_read | Rows_changed | Rows_changed_x_#indexes |
+———-+————–+————+———–+————–+————————-+
| 8 | #temp# | #temp# | 395 | 395 | 395 |
+———-+————–+————+———–+————–+————————-+
1 row in set (0.00 sec)

mysql> show query table_statistics;
+———-+————–+————+———–+————–+————————-+
| Query_id | Table_schema | Table_name | Rows_read | Rows_changed | Rows_changed_x_#indexes |
+———-+————–+————+———–+————–+————————-+
| 8 | #temp# | #temp# | 395 | 395 | 395 |
| 9 | #temp# | #temp# | 1 | 1 | 1 |
+———-+————–+————+———–+————–+————————-+
2 rows in set (0.00 sec)

mysql> show query table_statistics;
+———-+————–+————+———–+————–+————————-+
| Query_id | Table_schema | Table_name | Rows_read | Rows_changed | Rows_changed_x_#indexes |
+———-+————–+————+———–+————–+————————-+
| 8 | #temp# | #temp# | 395 | 395 | 395 |
| 9 | #temp# | #temp# | 1 | 1 | 1 |
| 10 | #temp# | #temp# | 2 | 2 | 2 |
+———-+————–+————+———–+————–+————————-+
3 rows in set (0.00 sec)

See how each new show command adds in another query into the queue of tracked queries? While technically correct I don’t think this is ideal. I’m open to suggestions on how to keep the temp table tracking useful while not polluting the list of queries with unnecessary results.

I went back to #mariadb again and we came up with a single unified query_id per thread that both show profiles and the table statistics can use. This unifies the query ids with the down side that a lot of ids can be wasted. Doing set profiling=1, executing a query, and running show profiles isn’t guaranteed to use query_id 1 like it did before. This is where I’m looking to the community to decide how to handle the two different commands and query ids.

The main reason for tracking the per query statistics is to dump them to the slow query log. If userstat is enabled and a query is written to the slow query log it will include two new comments that look like:

# Time: 110712 13:12:16
# User@Host: [ebergen] @ localhost []
# Thread_id: 1 Schema: test QC_hit: No
# Query_time: 64.666276 Lock_time: 0.000113 Rows_sent: 10 Rows_examined: 719488
# Row_Stats: test:rows_read=677984,rows_changed=0,rows_changed_x_indexes=0;#temp#:rows_read=41504,rows_changed=41494,rows_changed_x_indexes=41494;
# Index_Stats:
SET timestamp=1310501536;
select * from t2 group by u order by u desc limit 10;

Things that are missing or wrong so far. The first thing is that I don’t have a query similar to show profiles for queries with their ids. I don’t want to duplicate show profiles for statistics. I’m open to suggestion on how to unify the profiling and table statistics. Some of the structures for profiling and query stats are similar. I think they can be unified but this is more work than I want to put into the tivo branch. If mariadb is willing to accept this kind of feature I can work on unifying them.

Here is an example of where I’m stuck with show profiles vs show query statistics.

mysql> use test;
Database changed
mysql> set query_statistics_history_size=5;
Query OK, 0 rows affected (0.00 sec)

mysql> set profiling=1;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from t limit 5;
+——-+——–+
| t | u |
+——-+——–+
| 14515 | 282874 |
| 14521 | 258653 |
| 14573 | 113276 |
| 14577 | 826475 |
| 14585 | 444645 |
+——-+——–+
5 rows in set (0.00 sec)

mysql> show profiles;
+———-+————+————————-+
| Query_ID | Duration | Query |
+———-+————+————————-+
| 7 | 0.00052300 | select * from t limit 5 |
+———-+————+————————-+
1 row in set (0.00 sec)

mysql> set profiling=0;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from t limit 10;
+——-+——–+
| t | u |
+——-+——–+
| 14515 | 282874 |
| 14521 | 258653 |
| 14573 | 113276 |
| 14577 | 826475 |
| 14585 | 444645 |
| 14612 | 792545 |
| 14626 | 483300 |
| 14842 | 447267 |
| 15325 | 38865 |
| 15340 | 744424 |
+——-+——–+
10 rows in set (0.00 sec)

mysql> show query table_statistics;
+———-+————–+————+———–+————–+————————-+
| Query_id | Table_schema | Table_name | Rows_read | Rows_changed | Rows_changed_x_#indexes |
+———-+————–+————+———–+————–+————————-+
| 2 | #temp# | #temp# | 15 | 15 | 15 |
| 7 | test | t | 5 | 0 | 0 |
| 10 | test | t | 10 | 0 | 0 |
+———-+————–+————+———–+————–+————————-+
3 rows in set (0.00 sec)

mysql> show profiles;
+———-+————+————————-+
| Query_ID | Duration | Query |
+———-+————+————————-+
| 7 | 0.00052300 | select * from t limit 5 |
+———-+————+————————-+
1 row in set (0.00 sec)

Notice how show query table_statistics has three queries. Query id 2 is actually the first show command. With profiling enabled or disabled show profiles doesn’t have as many queries as show table_statistics does.

Update on porting table_statistics to MariaDB

Over the past few weeks I’ve been working on porting the per table and per session row stats from the tivo patch into mariadb. This is the show table_statistics functionality. I’ve been pushing the code to a branch on lanchpad as I get features working. Up to revision 2953 the row stats are working but the index stats aren’t. I think I have a reasonable implementation of the row stats code given my level of C++ skill. I’m changing the syntax a bit from how it worked in the tivo patch to make it be more compatible with how the original row stats code worked that was ported into mariadb.

In MariaDB 5.2 there are show table_statistics show index_statistics, and show user_statistics functions that return global stats collected by the server when the userstats variable is enabled. There are also associated information schema tables that feed these commands. So far I’ve changed the syntax of the show command to be

SHOW [ QUERY | SESSION | GLOBAL ] TABLE_STATISTICS

Where the default show table_statistics returns the global stats. This is different from the tivo patch against 5.0.72 which changed the default to be the last query. The new syntax keeps things backwards compatible with the original show command. The query keyword will return the previous query_statistics_history_size queries of statistics. This is a bit different from the 5.0 version which only tracked one query.

For example if I set query_statistics_history_size=5 then mariadb will track the row stats from the previous 5 queries:

mysql> set query_statistics_history_size=5;
Query OK, 0 rows affected (0.00 sec)

mysql> show tables;
+—————-+
| Tables_in_test |
+—————-+
| t |
| t2 |
+—————-+
2 rows in set (0.00 sec)

mysql> select * from t;
+—+
| t |
+—+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
+—+
5 rows in set (0.00 sec)

mysql> select * from t2;
+—-+
| t2 |
+—-+
| 1 |
| 2 |
| 3 |
+—-+
3 rows in set (0.00 sec)

mysql> show query table_statistics;
+———-+————–+————+———–+————–+————————-+
| Query_id | Table_schema | Table_name | Rows_read | Rows_changed | Rows_changed_x_#indexes |
+———-+————–+————+———–+————–+————————-+
| 81 | test | t | 5 | 0 | 0 |
| 82 | test | t2 | 3 | 0 | 0 |
+———-+————–+————+———–+————–+————————-+
2 rows in set (0.00 sec)

mysql> select count(*) from t,t2;
+———-+
| count(*) |
+———-+
| 15 |
+———-+
1 row in set (0.00 sec)

mysql> show query table_statistics;
+———-+————–+————+———–+————–+————————-+
| Query_id | Table_schema | Table_name | Rows_read | Rows_changed | Rows_changed_x_#indexes |
+———-+————–+————+———–+————–+————————-+
| 81 | test | t | 5 | 0 | 0 |
| 82 | test | t2 | 3 | 0 | 0 |
| 84 | test | t | 5 | 0 | 0 |
| 84 | test | t2 | 3 | 0 | 0 |
+———-+————–+————+———–+————–+————————-+
4 rows in set (0.00 sec)

Note that show query table_statistics shows the stats only for the current connection. There isn’t a way to get lists from other threads. My next task is to add a separate table with the query_id and query text similar to show profiles that can be used to correlate a query back to the different statistics kept for it. After that I plan on adding index statistics.

The older version of this in the tivo patch has been extremely useful for optimizing queries. I hope I can make this even better and get it into mariadb for other people to use. If you have any suggestions for how I should change this please post in the comments.