wiki:PrimaryKeyTransplant

Primary Key Transplant in PostgreSQL

This is an experimental recipe for changing a postgres table's primary key without having to create a new index right at that point. We used a simplified version of this procedure on Launchpad's BranchRevision table, but kids, please don't try this at home!

If this weren't experimental and therefore highly dangerous, you would use it when:

  • you have a table with a primary key that you want to get rid of, and
  • you want to choose a different primary key for it, but
  • you can't afford to take applications offline or have them block while the new primary-key constraint creates a new index.

In some RDBMS a trick like this would be unthinkable. In PostgreSQL however a primary key is just a constraint like any other. At its core it's just a unique constraint with a marker to say that you also like to think of it as "primary." Some add-on software such as Slony-I may pay attention to that marker as the default choice of key, but that's about all it is.

In Launchpad, the operation was simpler because we already had a unique constraint (not just a unique index!) on the new primary key. The two kinds of constraints are much alike, and so there's less work patching up dependencies at the end.

Why?

We developed this trick to deal with a large linking table in Launchpad's postgres database. A linking table is the standard way of representing an m:n relationship in a relational database:

a b
1 9
1 113
3 6
12 113

Here, a and b are foreign keys referring to relations A and B respectively. So this says that the A with key 1 is associated with B numbers 9 and 113; A number 3 with B number 6; and so on. In the case of our table, A and B are both non-null. We have a unique constraint on (a, b); the combination is our primary key.

Unfortunately our old object-relational mapper forced us to add a surrogate key so that it could conveniently treat these linking records as objects:

id A B
1 1 9
2 1 113
3 3 6
4 12 113

The id existed solely for the convenience of the ORM. It was made the table's primary key, even though our application had no need for it whatsoever.

Then we upgraded to another ORM called Storm, which doesn't need the id column or its index. Great! We'd like to drop both and save gigabytes of disk space.

There's one fly in the ointment: our replication manager, Slony-I, makes use of tables' primary keys by default. And our management infrastructure is not prepared to pass it another choice than the default. Therefore we'd like to make (a, b) the primary key.

But creating a new primary-key constraint will create a new unique index on (a, b)—even though we already have one. And exactly because the table is so large, we can't afford to take the table out of action for that long. So we looked for a cheap way to transfer the primary key constraint from id to (a, b). What we came up with was a bit of a hack.

What you need

Your new primary key already has to be a key. That is, its constituent columns must be NOT NULL and there must already be a unique index matching the key.

If you don't have an index on your new primary key yet, you'll be able to create one using CREATE INDEX CONCURRENTLY so that you don't need to stop your application in the meantime. This is a big step up from having the index created synchronously as you new a new primary-key constraint.

Once you have the index, you'll probably have to take your application offline for the transplant. But it's going to be a lot quicker than re-indexing a large table.

There is an unrelated postgres patch underway to support creation of primary-key constraints using an existing index. Once that option is implemented, probably around 9.1, use it instead! Don't muck about with nasty tricks like the one I outline here if you have the choice.

The transplant

In the "primary key transplant" we take an existing primary-key constraint and make it refer to different columns and a different index than before.

Some points to check before you begin the transplant:

  • All columns in the new key are NOT NULL.
  • You have a unique index on your new key.
  • The index is complete and valid. (This is a concern with concurrent index creation. Use \d on the table to check).
  • There are no other sessions open on the database. (We're not sure this is needed, but it's not certain that other backends will be notified to invalidate their internal caches.)

There are some pieces of information you'll need to figure out as well. I'll refer to the variables in Python style:

  • pg_constraint is the oid of the pg_constraint class: SELECT oid FROM pg_class WHERE relname = 'pg_constraint'
  • pg_class is the oid of the pg_class class: SELECT oid FROM pg_class WHERE relname = 'pg_class'
  • relid is the id of your table: SELECT oid FROM pg_class WHERE relname = 'my_table'
  • old_indexid is the id of the index supporting your existing primary-key constraint: SELECT oid FROM pg_class WHERE relname = 'my_table_pkey'
  • new_indexid is the id of the index for your new primary key: SELECT oid FROM pg_class WHERE relname = 'idx_for_new_key'
  • constraintid is the id of your primary-key constraint: SELECT oid FROM pg_constraint WHERE conrelid = %(relid)d AND contype = 'p'
  • new_key_columns is a postgres array of the column numbers that make up your new primary key. The column numbers are 1-based, so the first column would be '{1}' and a compound key consisting of the two columns after that would be '{2,3}'.

Verify that you have the right new_indexid:

SELECT indisprimary, indisvalid, indisready, indkey
FROM pg_index
WHERE
    indrelid = %(relid)d AND
    indexrelid = %(new_indexid)d;

You should get exactly one row, with values false, true, true, and the list of column numbers. The list of column numbers should match the ones in new_key_columns (though in a different syntax).

The operation

Here we go! After each statement is a comment giving the expected output. If it doesn't match the answer you get from the database, back out immediately.

BEGIN;
-- BEGIN

- Old key's index no longer guards a primary key.
UPDATE pg_index SET indisprimary = FALSE WHERE indrelid = %(relid)d AND indisprimary;
-- UPDATE 1

-- New key's index now guards a primary key.
UPDATE pg_index SET indisprimary = TRUE WHERE indexrelid = %(new_indexid)d;
-- UPDATE 1

-- The primary-key constraint is on different columns now.
UPDATE pg_constraint SET conkey = '%(new_key_columns)s' WHERE oid = %(constraintid)d;
-- UPDATE 1

-- The constraint now depends on the new key's index, not the old one's.
UPDATE pg_depend
SET objid = %(new_indexid)d
WHERE
    refobjid = %(constraintid)d AND
    objid = %(old_indexid)d;
-- UPDATE 1

-- In PostgreSQL 9.x, but not earlier versions (I haven't tested this):
UPDATE pg_constraint SET conindid = %(new_indexid)d WHERE oid = %(constraintid)d; 
-- UPDATE 1

-- Delete old dependency between the constraint and the table columns.
DELETE FROM pg_depend
WHERE
    classid = %(pg_constraint)d AND
    objid = %(constraintid)d AND
    refclassid = %(pg_class)d AND
    refobjid = %(relid)d AND
    deptype = 'a';
-- DELETE <len(old_key_columns)>

-- Transfer dependencies of the new key's index on its table
-- columns to the constraint.  It depends on the columns now.
UPDATE pg_depend
SET
    classid = %(pg_constraint)d,
    objid = %(constraintid)d
WHERE
    classid = %(pg_class)d AND
    objid = %(new_indexid)d AND
    refclassid = %(pg_class)d AND
    refobjid = %(relid)d AND
    deptype = 'a';
-- UPDATE <len(new_key_columns)>

ABORT; -- Unless you're really sure (and also slightly insane)

At this point, close your session (it's still the only session open on the database, right?) and start a new one to check up. In psql, try doing \d my_table. You should see the change in primary key, though the index on the old one is still there. You're free to drop the old primary-key index now if appropriate, and even the old primary key itself.

Evaluation

To verify the transplant, I created two entirely identical test databases with just one table. In one, the "transplant database," I created a new index and performed a primary-key transplant. In the other, I dropped the table's primary-key constraint and added a new one.

Data/schema dumps created with pg_dump were identical, apart from naming.

The system catalogs were also compared, by dumping each system table to file using COPY pg_… WITH OIDS TO 'pg_….sql' and comparing those outputs for the two databases. Tables that did not have oids were dumped without the WITH OIDS option. All changes in the system catalogs were accounted for.

People who have helped to get this to work:

  • Stuart Bishop, who had the idea.
  • RhodiumToad, whose expertise and patience made it all possible.
  • BlueAidan, who was first to respond to my IRC questions.
Last modified 8 years ago Last modified on Aug 8, 2011, 4:56:19 AM