How to get from MySQL SQL to C

Occasionally it is useful to know what a MySQL command is doing internally. Just looking into the MySQL source directory can be overwhelming. Knowing the basics of the handler interface and the sql parser can be a great start for reading the source code to understand what MySQL does under the hood. Here I will cover a little bit about how the SQL syntax is defined.

Everything starts with lex.h and sql_yacc.yy in the sql/ dir. lex.h contains all the functions and symbols used to make up the SQL syntax. The sql_yacc.yy file describes the relationships between these symbols and the C functions responsible for executing them. I’m not sure why some symbol definitions end in _SYM and others don’t. Looking in lex.h “FLUSH” is defined as FLUSH_SYM. To see all the places where flush is allowed in the SQL go back to sql_yacc.yy and grep for it.

The first important section looks like this:

/* flush things */                                                                      

flush:                                                                                  
          FLUSH_SYM opt_no_write_to_binlog                                              
          {                                                                             
            LEX *lex=Lex;                                                               
            lex->sql_command= SQLCOM_FLUSH;                                             
            lex->type= 0;                                                               
            lex->no_write_to_binlog= $2;                                                
          }                                                                             
          flush_options                                                                 
          {}                                                                            
        ;

This is where things can get a bit nested and weird. The flush: section is saying that flush can have opt_no_write_to_binlog optionally after it. The first section in curly braces defines the sql command and also sets the flag no_write_to_binlog. SQLCOM_FLUSH is important in the next phase where we get into actual C code.

flush_options used to define all of the possible options for a flush command. Going one step further down flush_options_list basically says that a flush command can contain more than one option.

flush_options_list:
          flush_options_list ',' flush_option
        | flush_option
          {}
        ;

Notice that flush_options_list can contain a flush_options_list. I don’t know the specifics of this but it is the yacc way of saying things can be repeated. In this case the | is saying that there can be multiple flush_option separated by a comma or just one option.

With flush_option: things start to make more sense. This is all of the different types of flush commands. Looking at the first part

flush_option:
          ERROR_SYM LOGS_SYM
          { Lex->type|= REFRESH_ERROR_LOG; }
        | ENGINE_SYM LOGS_SYM
          { Lex->type|= REFRESH_ENGINE_LOG; }
        | GENERAL LOGS_SYM
          { Lex->type|= REFRESH_GENERAL_LOG; }
        | SLOW LOGS_SYM

Reading it in english this is basically saying  ”Error logs OR engine logs OR general logs OR slow logs” Combining this with the previous section allowing multiple flush options this is a valid query:

MariaDB [test]> flush error logs, slow logs;
Query OK, 0 rows affected (0.00 sec)

The flush command is quite a bit improved in MariaDB 10. Comparing this to part of the flush_option: section from MariaDB 5.2 shows how much it has improved:

flush_option:
          table_or_tables
          { Lex->type|= REFRESH_TABLES; }
          opt_table_list {}
        | TABLES WITH READ_SYM LOCK_SYM
          { Lex->type|= REFRESH_TABLES | REFRESH_READ_LOCK; }
        | QUERY_SYM CACHE_SYM
          { Lex->type|= REFRESH_QUERY_CACHE_FREE; }
        | HOSTS_SYM
          { Lex->type|= REFRESH_HOSTS; }
        | PRIVILEGES
          { Lex->type|= REFRESH_GRANT; }
        | LOGS_SYM
          { Lex->type|= REFRESH_LOG; }
        | STATUS_SYM
          { Lex->type|= REFRESH_STATUS; }

In 5.2 the LOGS_SYM is alone which makes flush error logs an invalid query. By scanning the grammar in sql_yacc.yy it is easy to see which syntax is and isn’t supported between versions.

Now that a command of SQLCOM_FLUSH has been specified. The flush_option is passed in via Lex->type. Each option is bitwise ORed into type. It is time to switch over to C++ code and see how these are executed.

sql_parse.cc has a huge switch case statement that contains every possible command type that MySQL can process. For this example look for case SQLCOM_FLUSH: The SQLCOM_FLUSH is the same option from the grammar file. There is basically one important function under this section reload_acl_and_cache().

In newer versions reload_acl_and_cache() is in sql_reload.cc. In older versions it is in sql_parse.cc This function basically checks each type of thing that can be flushed one by one to see if their flag has been ORed into Lex-type which is options here. It then calls the C++ code responsible for carrying out the flushing of that type of object.

Other types of calls can be traced the same way. Most of the token names can be easily grepped from sql_yacc.yy. Most of the show commands are handled similar to flush except when it gets into C++ code they are mapped into a special pair of arrays that tell MySQL how to generate a INFORMATION_SCHEMA table.

For the show commands the C++ code is in the sql_yacc.yy file as a call to prepare_schema_table(). There are two arrays that are important when adding a new table. In MariaDB 10 the first is enum_schema_tables in handler.cc. In older versions this array is in table.h. The other array is ST_SCHEMA_TABLE schema_tables in sql_show.cc.

schema_tables has the important information. Among other things it holds the text name of the table, the fields used to make it, and the function called to generate the data in the table. The fields list usually ends with _fields_info and the function used to create the result set used for the table starts with fill_.

One Comment

  1. Baron says:

    Very helpful, thanks.

Leave a Reply