Second update of modifying table statistics in MariaDB

Since my last post I’ve changed how the table statistics work quite a bit in MariaDB. I ran into a few problems with my original changes. In the TiVo 5.0 patch the show table_statistics command chose from one of three hash tables to read from depending on the flags. There is a global hash table for global stats and two in the thd object for session and query stats. Each time a non show query is executed the query statistics are reset. In 5.1 the implementation of show command changed from reading arbitrary data structures to constructing queries to run against information_schema tables. The information_schema tables are constructed on the fly, placed into a temporary table and have the select resulting from the show command executed on them. This works ok for the show commands but broke my new information schema tables.

As part of porting my changes into mariadb 5.2 I added a few new information schema tables called QUERY_TABLE_STATISTICS, QUERY_INDEX_STATISTICS, SESSION_TABLE_STATISTICS, and SESSION_INDEX_STATISTICS. For the part these tables query the hash tables from the 5.0 implementation. This worked fine for the show commands in my previous update. It broke when querying the QUERY_TABLE_STATISTICS directly. In 5.0 I reset the per query table statistics on every query that isn’t a SHOW query. This works fine except that SELECT * FROM QUERY_TABLE_STATISTICS reset the query statistics that the select query was supposed to retrieve.

After a conversation with serg in #mariadb on freenode we came up with a plan to fix the QUERY_TABLE_STATISTICS and QUERY_INDEX_STATISTICS tables. Instead of tracking the most recent query the plan is to track the most recent N queries tracked with a QUERY_ID. I realized this is very similar to how SHOW PROFILES worked. I decided to try to integrate the query statistics in with the profile. This seemed like a good idea until I realized that I want to enable the slow query log statistics without the overhead of leaving profiling on in production.

All of the profiling methods are controlled by the profiling variable. All of the table, index, and user statistics are controlled by the userstat variable. The profiling and my changes to the table statistics both use a QUERY_ID to show a list of queries. The problem is that depending on when the profiling and userstat variables are enabled the query_ids can be inconsistent. For example if a user enables userstat, executes a query, then enables profiling and executes another query the table statistics query_id 1 and the profiling query_id 1 will be different.

Every show command that uses a information schema table puts rows into a temporary table. I’ve also added temporary table tracking into the table statistics. This has an interesting side effect when combined with per query statistics tracking. Queries are only put into the queue of queries for query statistics when they read/write rows. The query_statistics_history_size controls how many queries to keep stats for. Each show command now uses a temporary table so the show commands are now tracked using row statistics. Here is an example using show variables.

mysql> set query_statistics_history_size=5;
Query OK, 0 rows affected (0.00 sec)

mysql> show variables;
+——————————————-+—————————————————————————————————————-+
| Variable_name | Value |
+——————————————-+—————————————————————————————————————-+
| aria_block_size | 8192
………
| warning_count | 0 |
+——————————————-+—————————————————————————————————————-+
395 rows in set (0.00 sec)

mysql> show query table_statistics;
+———-+————–+————+———–+————–+————————-+
| Query_id | Table_schema | Table_name | Rows_read | Rows_changed | Rows_changed_x_#indexes |
+———-+————–+————+———–+————–+————————-+
| 8 | #temp# | #temp# | 395 | 395 | 395 |
+———-+————–+————+———–+————–+————————-+
1 row in set (0.00 sec)

mysql> show query table_statistics;
+———-+————–+————+———–+————–+————————-+
| Query_id | Table_schema | Table_name | Rows_read | Rows_changed | Rows_changed_x_#indexes |
+———-+————–+————+———–+————–+————————-+
| 8 | #temp# | #temp# | 395 | 395 | 395 |
| 9 | #temp# | #temp# | 1 | 1 | 1 |
+———-+————–+————+———–+————–+————————-+
2 rows in set (0.00 sec)

mysql> show query table_statistics;
+———-+————–+————+———–+————–+————————-+
| Query_id | Table_schema | Table_name | Rows_read | Rows_changed | Rows_changed_x_#indexes |
+———-+————–+————+———–+————–+————————-+
| 8 | #temp# | #temp# | 395 | 395 | 395 |
| 9 | #temp# | #temp# | 1 | 1 | 1 |
| 10 | #temp# | #temp# | 2 | 2 | 2 |
+———-+————–+————+———–+————–+————————-+
3 rows in set (0.00 sec)

See how each new show command adds in another query into the queue of tracked queries? While technically correct I don’t think this is ideal. I’m open to suggestions on how to keep the temp table tracking useful while not polluting the list of queries with unnecessary results.

I went back to #mariadb again and we came up with a single unified query_id per thread that both show profiles and the table statistics can use. This unifies the query ids with the down side that a lot of ids can be wasted. Doing set profiling=1, executing a query, and running show profiles isn’t guaranteed to use query_id 1 like it did before. This is where I’m looking to the community to decide how to handle the two different commands and query ids.

The main reason for tracking the per query statistics is to dump them to the slow query log. If userstat is enabled and a query is written to the slow query log it will include two new comments that look like:

# Time: 110712 13:12:16
# User@Host: [ebergen] @ localhost []
# Thread_id: 1 Schema: test QC_hit: No
# Query_time: 64.666276 Lock_time: 0.000113 Rows_sent: 10 Rows_examined: 719488
# Row_Stats: test:rows_read=677984,rows_changed=0,rows_changed_x_indexes=0;#temp#:rows_read=41504,rows_changed=41494,rows_changed_x_indexes=41494;
# Index_Stats:
SET timestamp=1310501536;
select * from t2 group by u order by u desc limit 10;

Things that are missing or wrong so far. The first thing is that I don’t have a query similar to show profiles for queries with their ids. I don’t want to duplicate show profiles for statistics. I’m open to suggestion on how to unify the profiling and table statistics. Some of the structures for profiling and query stats are similar. I think they can be unified but this is more work than I want to put into the tivo branch. If mariadb is willing to accept this kind of feature I can work on unifying them.

Here is an example of where I’m stuck with show profiles vs show query statistics.

mysql> use test;
Database changed
mysql> set query_statistics_history_size=5;
Query OK, 0 rows affected (0.00 sec)

mysql> set profiling=1;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from t limit 5;
+——-+——–+
| t | u |
+——-+——–+
| 14515 | 282874 |
| 14521 | 258653 |
| 14573 | 113276 |
| 14577 | 826475 |
| 14585 | 444645 |
+——-+——–+
5 rows in set (0.00 sec)

mysql> show profiles;
+———-+————+————————-+
| Query_ID | Duration | Query |
+———-+————+————————-+
| 7 | 0.00052300 | select * from t limit 5 |
+———-+————+————————-+
1 row in set (0.00 sec)

mysql> set profiling=0;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from t limit 10;
+——-+——–+
| t | u |
+——-+——–+
| 14515 | 282874 |
| 14521 | 258653 |
| 14573 | 113276 |
| 14577 | 826475 |
| 14585 | 444645 |
| 14612 | 792545 |
| 14626 | 483300 |
| 14842 | 447267 |
| 15325 | 38865 |
| 15340 | 744424 |
+——-+——–+
10 rows in set (0.00 sec)

mysql> show query table_statistics;
+———-+————–+————+———–+————–+————————-+
| Query_id | Table_schema | Table_name | Rows_read | Rows_changed | Rows_changed_x_#indexes |
+———-+————–+————+———–+————–+————————-+
| 2 | #temp# | #temp# | 15 | 15 | 15 |
| 7 | test | t | 5 | 0 | 0 |
| 10 | test | t | 10 | 0 | 0 |
+———-+————–+————+———–+————–+————————-+
3 rows in set (0.00 sec)

mysql> show profiles;
+———-+————+————————-+
| Query_ID | Duration | Query |
+———-+————+————————-+
| 7 | 0.00052300 | select * from t limit 5 |
+———-+————+————————-+
1 row in set (0.00 sec)

Notice how show query table_statistics has three queries. Query id 2 is actually the first show command. With profiling enabled or disabled show profiles doesn’t have as many queries as show table_statistics does.

Leave a Reply