Don’t reference auto increment ids outside of mysql.

It’s a common practice to use MySQL’s auto increment feature for things like product ids or invoice numbers. While it always works when the system is first setup it can cause major headaches over time. The problems usually fall into one of three categories, dual master, test/staging setups, and reconciliation of ids.

MySQL in a dual master configuration requires that each master not be able to assign the other masters id. Replication events that use auto increment are tagged with the id that was allocated for that transaction. If you run mysqlbinlog on a bin log it will show something like SET INSERT_ID=123; This ensures that auto increment events are replicated to slaves correctly.

In a dual master configuration each master will try to allocate the same id, replicate the event to the other master where that id is already taken. This causes replication to stop on a duplicate key error. In MySQL 5 there are the auto_increment_increment and auto_increment_offset options that attempt to alleviate this situation by letting masters allocate unique ids. More details on how they work can be found in the manual. Using these options wastes key space so you will end up using larger data types for increment ids making MySQL less efficient. if you really need dual master (talk to me if you _really_ think you do, chances are you don’t) it works ok. A nasty problem rears it’s head when you start referencing these ids outside of MySQL. My favorite example is using auto_increment for invoice ids. Since two masters are allocating ids at offsets they aren’t contiguous like most accountants like. Instead of invoice numbers like 1001, 1002, 1003 you end up with 1011, 1021, 1022. depending on which master allocated which id when. Some might argue that this is a handy way to artificially inflate invoice ids but I think it’s a waste.

The testing/staging scenario usually causes reconciliation of ids. It’s more painful than dual master because fixing it often involves contacting partners with new ids, throwing away existing data or writing reconciliation scripts. What happens is that during staging data entry occurs. The ids generated from auto increment are associated with specific data entries and handed out. When the data is ready, it’s deployed to production. If production has also allocated new ids from previous data entry a collision occurs. It’s true that this can be avoided by good communication with product teams about how to use the ids but it’s better to just not reference auto increment ids outside of mysql in the first place. It also works the other way. Records from production need to be imported into a staging environment for testing purposes but staging has allocated those ids from previous tasks.

Reconciliation of ids, not counting the scenario above usually involves someone somewhere in the company that doesn’t like that the ids start with 1 instead of 0 or that when an item is deleted it’s id isn’t used again. Maybe ids should have started with 1000 instead of 100. Most of these can be fixed with simple updates, others require costly migrations away from auto_increment still others require dumping all the data, fixing the ids, and importing it again.

I’ve found it very difficult to convince people not to reference auto increment ids outside of their application. It seems so simple and clean cut on the surface. This entry was spawned from two different issues I helped people with today involving reference auto increment ids outside of MySQL and how to recover from the situation it caused. I’ve ran into this problem many many times in the past. Not referencing auto increment values outside of MySQL is a good best practices bullet point for any general MySQL presentation.

[Updated: 2007-03-21 I just talked to another person that is setting a default value of 1000 for auto increment so they can use it for ID card numbers.]

6 Comments

  1. HR says:

    Maybe I should read the article again but how do you generate invoices ids over a dual master setup then?

  2. Hi Eric, interesting entry :)

    I must say, I believe the waste of key space won’t be that big of a problem for many applications.

    Regarding the usage outside of MySQL: to me, auto_increment (but also sequences like in Oracle) is an implementation of a artificial key. As such it should indeed only be used inside the application – it’s there only as a technical means to implement identification and references.

    Having said that, I think that it need not always be a problem to use the id’s outside the database. For example, a website for a product catalog could very well use the id value (generated from an auto_increment) to refer to a particular product.

    There is one technical problem with this: when you delete rows from the end of your table, and then restart the server, auto_increment will not pick up where it left off last time. Instead, it will pick up at max(id) + 1

    This means that if you recorded the id somewhere outside the db, it is now invalid. It now points to another item instead of pointing to a deleted item. So, if you really need to rely on cross database identification you need to think of somethings else than auto_increment (UUID(), or your own sequence implementation)

    kind regards,

    Roland Bouman

  3. Eric Bergen says:

    Having product ids outside the database that are supposed to be valid but are now invalid is a problem. It’s yet another reason not to reference auto increment ids outside of MySQL.

    MyISAM tables store the auto increment id in the MYI file. Even if you delete rows and restart the server MySQL will not reuse the id. The same is true for innodb.

  4. Susan says:

    I too faced the same problem once on a project… But I remember I thensimply followed the database structure of oscommerce…

  5. Paul says:

    > MyISAM tables store the auto increment id in the MYI file. Even if you delete rows and restart the server MySQL will not reuse the id. The same is true for innodb.

    I know I’m over a year late on this discussion, but I was bitten recently by the fact that this *isn’t* true for InnoDB. Since this is one article I came across during an “autoincrement best practices” search, I thought it’d be worthwhile to comment.

    From the docs:

    “InnoDB uses the in-memory auto-increment counter as long as the server runs. When the server is stopped and restarted, InnoDB reinitializes the counter for each table for the first INSERT to the table, as described earlier. ”

    Good article, by the way. I’ve seen situations where autoincrement ids are used extensively outside of the database for millions of items. The customer ran into a huge problem when they wanted to rebuild their database from scratch & there was no guarantee that their items would be loaded in exactly the same order.

  6. Hampus says:

    I realise that this was posted quite a while ago, but I find the subject interesting. I do see the problems that referencing auto generated ids outside the db can cause. If you need some kind of product or invoice numbers, then I do not, however, see how that problem could be solved by not using the auto-increment feature of MySQL. Isn’t this a potential problem no matter how you generate the numbers? (even more so if you do it manually!)

    Would it be better if I generated similar numbers myself and inserted in the db. That can create problems too and would at the very least need to use transactions to be safe. With autoincrement you get the same, but with less effort. Or?

Leave a Reply