Archive for September 2011

Is group_concat_max_len in bytes or characters?

The manual says bytes but sometimes it is measured in characters. It seems like group_concat_max_len is in bytes when being passed through a temporary table and in characters otherwise. This works fine when using latin1 but when converting to utf8 mysql must reserve 3 bytes per character when setting types in a temporary table. This is yet another reason to dislike group_concat..

mysql> create table group_concat_bug (str1 varchar(255), str2 varchar(255), str3 varchar(255)) charset=utf8 engine=innodb;
Query OK, 0 rows affected (0.02 sec)

mysql> show create table group_concat_bug;
+——————+———————————————————————————————————————————————————————————–+
| Table | Create Table |
+——————+———————————————————————————————————————————————————————————–+
| group_concat_bug | CREATE TABLE `group_concat_bug` (
`str1` varchar(255) default NULL,
`str2` varchar(255) default NULL,
`str3` varchar(255) default NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+——————+———————————————————————————————————————————————————————————–+
1 row in set (0.00 sec)

mysql> insert into group_concat_bug set str1=’a';
Query OK, 1 row affected (0.01 sec)

mysql> insert into group_concat_bug set str1=’a';
Query OK, 1 row affected (0.00 sec)

mysql> insert into group_concat_bug set str1=’a';
Query OK, 1 row affected (0.00 sec)

mysql> insert into group_concat_bug set str1=’a';
Query OK, 1 row affected (0.00 sec)

mysql> insert into group_concat_bug set str1=’a';
Query OK, 1 row affected (0.00 sec)

mysql> insert into group_concat_bug set str1=’a';
Query OK, 1 row affected (0.00 sec)

mysql> select group_concat(str1) from group_concat_bug group by str2;
+——————–+
| group_concat(str1) |
+——————–+
| a,a,a,a,a,a |
+——————–+
1 row in set (0.00 sec)

mysql> set group_concat_max_len=5;
Query OK, 0 rows affected (0.00 sec)

mysql> explain select group_concat(str1) from group_concat_bug group by str2;
+—-+————-+——————+——+—————+——+———+——+——+—————-+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+—-+————-+——————+——+—————+——+———+——+——+—————-+
| 1 | SIMPLE | group_concat_bug | ALL | NULL | NULL | NULL | NULL | 6 | Using filesort |
+—-+————-+——————+——+—————+——+———+——+——+—————-+
1 row in set (0.00 sec)

mysql> select group_concat(str1) from group_concat_bug group by str2;
+——————–+
| group_concat(str1) |
+——————–+
| a,a,a |
+——————–+
1 row in set, 1 warning (0.00 sec)

mysql> explain select group_concat(str1) from group_concat_bug group by str2 order by str3;
+—-+————-+——————+——+—————+——+———+——+——+———————————+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+—-+————-+——————+——+—————+——+———+——+——+———————————+
| 1 | SIMPLE | group_concat_bug | ALL | NULL | NULL | NULL | NULL | 6 | Using temporary; Using filesort |
+—-+————-+——————+——+—————+——+———+——+——+———————————+
1 row in set (0.00 sec)

mysql> select group_concat(str1) from group_concat_bug group by str2 order by str3;
+——————–+
| group_concat(str1) |
+——————–+
| a |
+——————–+
1 row in set, 1 warning (0.00 sec)

mysql> show variables like ‘group_concat_max_len’;
+———————-+——-+
| Variable_name | Value |
+———————-+——-+
| group_concat_max_len | 5 |
+———————-+——-+
1 row in set (0.00 sec)

I tried to patch item_sum.cc dump_leaf_key() to include the max character length in bytes in the string max length calculation with:

- if (result->length() > item->max_length)
+ if ((result->length() * item->collation.collation->mbmaxlen) > item->max_length)

This didn’t work out as I planned. Repeat runs of queries return different results.

mysql> set group_concat_max_len=5; select group_concat(str1 separator ”) from group_concat_bug group by str2;
Query OK, 0 rows affected (0.00 sec)

+———————————+
| group_concat(str1 separator ”) |
+———————————+
| aa |
+———————————+
1 row in set, 1 warning (0.00 sec)

mysql> set group_concat_max_len=5; select group_concat(str1 separator ”) from group_concat_bug group by str2;
Query OK, 0 rows affected (0.00 sec)

+———————————+
| group_concat(str1 separator ”) |
+———————————+
| aau |
+———————————+
1 row in set, 1 warning (0.00 sec)

At this point I stopped investigating the problem because the application that was using group_concat was just parsing the result back into an array anyway. The developers agreed it was best to just get rid of the group_concat and issue two queries. This is what I recommend for fixing group_concat. The risk of having truncated values or running out of memory is too much to justify saving the cost of issuing a second query.