Fun Python question

What does this do?
def func():
  print('foo')
  yield 'bar'

print(func())
Take a guess, then give it a try.


Google MySQL team is hiring

Do you know MySQL inside and out? Do you have experience running it in production? Have you dared to look at the code? Maybe even patched it? Google's MySQL team would love to speak to you. We're hiring several software engineers in Mountain View, CA, USA to work primarily on MySQL server improvements, contributing to Google's open source MySQL branch. In addition to feature work, Google MySQL developers work closely with operations-focused engineers, tracking down bugs and solving tricky performance problems at truly enormous scale.

Sound interesting? Email flamingcow@google.com.


The dangers of max_connections

We have a database that's drastically scaling up its query rate from many different clients. This typically means more client jobs and more parallelization across connections, so the first limit we hit is max_user_connections (in the mysql.user table), which we use to provide isolation between runaway users/jobs. We increase that, and we eventually start hitting the server-level max_connections limit. We increased that in an emergency push last night, and several hours later performance fell off a cliff.

SHOW PROCESSLIST showed tons of processes in "Opening tables" and "closing tables" state. Our InnoDB IO stats had dropped dramatically; we clearly weren't making significant progress actually answering queries. A quick search turned up a Percona post about this, which pointed to table_cache. A quick look at SHOW VARIABLES LIKE 'table_cache' showed 91. Our config has this set to 2048; something had clearly gone wrong.

One hint was the error log message:
111209 11:08:21 [Warning] Changed limits: max_open_files: 8192  max_connections: 8000  table_cache: 91
It turns out that MySQL tries to be smart about table_cache, based on the open-files rlimit. The formula is (max_open_files - max_connections - 10) / 2, which in this case is (8192 - 8000 - 10) / 2 = 182 / 2 = 91. 91 active tables is smaller than the hot set in this database, and when the query rate crossed a line, the global open/close table lock and underlying syscalls couldn't keep up, and performance death-spiraled from there. We raised the limit at runtime with:
SET GLOBAL table_cache = 2048;
This overrides the automatic setting but opens the possibility of hitting the rlimit when opening tables. The real fix is to increase the rlimit to accommodate both pools of file descriptors.
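Inverting that formula, keeping table_cache at 2048 alongside max_connections at 8000 requires max_open_files ≥ 2*2048 + 8000 + 10 = 12106. A my.cnf sketch along those lines (values from this incident; the limit is rounded up for headroom):

```ini
[mysqld]
max_connections  = 8000
table_cache      = 2048
# >= 2*table_cache + max_connections + 10 = 12106, rounded up
open_files_limit = 16384
```

At startup MySQL raises its own open-files rlimit toward open_files_limit, subject to what the OS allows.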


Converting subselects to joins, part 2

I previously discussed this in depth. However, today I saw a case that I didn't cover:

You have a table of Users and a table of Logins, with a row for each user login event. You're looking for users that have logged in within the last 31 days. The initial version of this I saw used a derived table:
SELECT UserId
  FROM Users
    JOIN (
      SELECT UserId,
          DATEDIFF(NOW(), MAX(TimeStamp)) AS LastLogin
        FROM Logins
        GROUP BY UserId
    ) AS Temp USING (UserId)
  WHERE LastLogin <= 31;
We can convert this to a simple JOIN with the magic of HAVING. HAVING is like WHERE, but applies after aggregation:
SELECT UserId,
    DATEDIFF(NOW(), MAX(TimeStamp)) AS LastLogin
  FROM Users
    JOIN Logins USING (UserId)
  GROUP BY UserId
  HAVING LastLogin <= 31;

Safe(r) data changes

It's a great goal to avoid making manual changes to your database. It never works 100%, though; there are always software bugs, unexpected interactions and other events that muck up your data, and you have to do one-off corrections. These are inherently hazardous: hard to test for unexpected data corruption, hard to apply consistently and hard to model the application behavior that results from them. Here are some strategies for the first issue, avoiding unexpected data corruption.
  1. Don't run one-off executables against your database. Instead, have the executable print out SQL that it would have run to update the database. If something goes wrong, you don't have to model the behavior of the program; you can just look at the SQL.
  2. Check the SQL files into source control somewhere. Manual changes tend to breed more manual changes to fix the fixes, so you never know when you'll want a record of what you twiddled in the past.
  3. Include all fields from the primary key in the WHERE clause. This ensures that each statement only modifies one row. Even if this results in a huge list of changes, at least you know exactly what changed.
  4. Include as many additional gating clauses as possible, linked with AND. For example, if you have a table of products and you want to set the price to 0.99 for everything that is currently set to 1.00, do:
    UPDATE Products SET Price=0.99 WHERE ProductId=2762 AND Price=1.00;
    This ensures that if something else changes Price just before you run your change, you don't destroy that update.
  5. Record the number of rows affected by each statement, in case something unexpected happens.
  6. Use transactions sensibly. Overly huge grouping of statements can block replication, but consider whether your changes will be toxic if partially applied.
  7. Stop running changes on errors or warnings and let a human examine the output. Warnings like string truncation can be a sign of a broken change.
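As a sketch of points 1 and 3-5 — hypothetical glue code, not a real tool — a one-off fixer can emit gated SQL for review instead of touching the database itself:

```python
# Hypothetical sketch: emit the SQL a one-off fix *would* run (point 1),
# gating each statement on the primary key (point 3) and the current value
# (point 4). Table and column names are illustrative.

def price_fix_statements(rows):
    """rows: iterable of (product_id, current_price) pairs needing the fix."""
    for product_id, current_price in rows:
        yield (
            "UPDATE Products SET Price=0.99 "
            f"WHERE ProductId={product_id} AND Price={current_price:.2f};"
        )

stmts = list(price_fix_statements([(2762, 1.00), (2763, 1.00)]))
for s in stmts:
    print(s)  # review the output, check it into source control, then apply
```

Each printed statement can then be run by hand, recording the rows affected (point 5) and stopping on any error or warning (point 7).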
#4 can be a challenge if your verification statements are complex. Consider, for example, if you want to update rows in table A for which there is exactly one row for a particular CustomerId. It's relatively easy to do a SELECT statement to verify this by hand:
SELECT CustomerId, COUNT(CustomerId) AS Customers
  FROM A
  WHERE CustomerId IN (15, 16)
  GROUP BY CustomerId;
To verify this at UPDATE time, however, you either need a subselect or an intermediate table. We'll use the latter:
CREATE TABLE ScratchTable
    SELECT CustomerId,
        COUNT(CustomerId) AS Customers
      FROM A
      WHERE CustomerId IN (15, 16)
      GROUP BY CustomerId;

UPDATE A
  JOIN ScratchTable USING (CustomerId)
  SET Updated=1
  WHERE CustomerId IN (15, 16)
    AND Customers=1;
The same trick works if your data change inserts new rows:
INSERT INTO A (CustomerId)
  SELECT CustomerId
    FROM ScratchTable
    WHERE CustomerId IN (15, 16)
      AND Customers=1;


InnoDB as the default table type

We recently switched from MyISAM to InnoDB as the default table type. This affects CREATE TABLE without an explicit ENGINE=, as well as implicitly-created tables for sorting and such. Mark had previously discussed issues with this here, but we thought it was worth giving another try.

We've found a far more basic problem: KILL takes a long time. Consider the following:
CREATE TABLE scratch.MyShortLivedTable
  SELECT ...;

There's no ENGINE= there, so it uses the default table type. It seems to run at about the same speed as it did with MyISAM, but then you kill it. If it's MyISAM, the KILL takes effect immediately. If it's InnoDB, the CREATE is non-transactional and stays, but the implicit INSERTs have to be rolled back. If that query took 10 minutes, it may take 10 minutes to roll back, during which time the query is unkillable and MySQL won't shut down cleanly.

Our failovers rely on being able to kill all connections to a slave that they're promoting to master. Our backups rely on being able to shut down MySQL. We don't have a good answer for this, short of hardwiring ENGINE=MyISAM onto writes we know are going to be large.


Database best practices for future scalability

There’s a perpetual debate about how much effort to put into scalability when first designing and building a modern web application. The opposing arguments are roughly: “Scalability doesn’t matter if your app isn’t successful; features should come first” vs. “When you know that you need to scale, you won’t have time to do it.” Below are some suggestions for scalable MySQL schema design that should get you on the right path, without being onerous enough to slow you down. Let’s start with things that are almost free:

  1. Use InnoDB. You’re eventually likely to want transactions, foreign keys, row-level locking or relative crash safety. You can Google and find lots of InnoDB vs. MyISAM comparisons, but if you don’t want to dig too deeply, “just use InnoDB” is a safe place to start.

  2. Don’t store data that you don’t intend to use relationally. Using MySQL as a key/value store or to store encoded data (XML, protocol buffers, etc.) in BLOB/TEXT may work, but dedicated key/value stores are likely to be more efficient.

  3. Try to design your schema with as few hierarchies as possible. For example, take the following table layout, with arrows representing many-to-one relationships:

       D → B → A ← C        F → E

     Perhaps A=”Users”, B=”DVDs Owned”, C=”Logins”, D=”Times Watched”, while E=”Administrators” and F=”Changes”. These are two hierarchies, since A and E have no links to each other. Minimizing the number of hierarchies (keeping it to just one is awesome!) makes it easier to shard later. Schemas with cross-links (say F also links to A, or a table records transfers between two different users, linking to A twice) are very difficult to shard.

  4. Use BIGINT UNSIGNED NOT NULL (a required, unsigned 64-bit integer) primary keys on every table. AUTO_INCREMENT is fine, at least to start. You can skip this for many-to-many link tables; just put both link columns in the primary key. Having single-column, numeric primary keys makes it easier to do things like drift checking and traversing between tables.

  5. Use BIGINT instead of INT for all keys. The cost in space (4 vs. 8 bytes) and compute time is so small that you’ll never notice it, but the cost of a schema change later to increase the field size, or an accidental wraparound, can be enormous. As you expand shards, your key space becomes sparse and grows rapidly, so wide keys are critical.
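Putting these key rules together, a table skeleton might look like the following (table and column names are illustrative, not prescriptive):

```sql
CREATE TABLE Logins (
  LoginId   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  UserId    BIGINT UNSIGNED NOT NULL,
  TimeStamp DATETIME NOT NULL,
  PRIMARY KEY (LoginId)
) ENGINE=InnoDB;
```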

  6. Split off your data access code into an ORM layer, even if it’s very simple to begin with. This will be where your read/write split, shard lookups and change tracking will live later.

  7. Don't use triggers or stored routines. Keep this logic in your ORM layer instead, to give yourself a single point of truth and versioning system.

  8. Avoid subselects; use joins instead.

  9. Don’t use views, unless you’re using a third-party ORM (rails, django) that mandates a schema structure that isn’t ideal.

  10. Avoid network round-trips whenever possible. Use the multi-row insert syntax where possible. Enable the CLIENT_MULTI_STATEMENTS flag at connect time, then send groups of statements separated by ";".
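For example, the multi-row syntax makes one round-trip where naive code makes three (table names illustrative):

```sql
INSERT INTO Logins (UserId, TimeStamp) VALUES
  (15, NOW()),
  (16, NOW()),
  (17, NOW());
```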

Then, things that cost development time, in increasing order of complexity:
  1. Use foreign keys. Don’t make references nullable; use a flag field to mark whole (sub-)hierarchies deleted instead. Combined with the hierarchy rule above, this means that you guarantee yourself that you’ll never end up with orphaned rows.

  2. Write to masters; read from slaves. This can be quite complex, since you have to worry about replication delay. For example, you can’t have one web page hit cause a write, then the next hit render the results by reading from the database, because the result might not have replicated. However, this enables significant scaling, because hooking up more slaves is much easier than sharding.

  3. Don’t store event-based data as one row per event. If you record page views or clicks in the database, aggregate that data into one row per hour, or per day. You can keep logs of events outside of the database in case you need to change aggregation and re-generate historical data, but don’t keep every event in a hot table.
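As a sketch, assuming a hypothetical PageViewsHourly table with a unique key on (Url, HourStart), each view becomes an increment rather than a new row:

```sql
INSERT INTO PageViewsHourly (Url, HourStart, Views)
  VALUES ('/home', '2011-12-09 11:00:00', 1)
  ON DUPLICATE KEY UPDATE Views = Views + 1;
```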

  4. Stop using AUTO_INCREMENT. Instead, keep a table IdSequences, and do something like:
    UPDATE IdSequences SET LastId=LAST_INSERT_ID(LastId+Increment)
      WHERE TableName='A' AND ColumnName='b';
    INSERT INTO A (b, c) VALUES (LAST_INSERT_ID(), 'foo');
    This lets you change IdSequences later to modify your sharding scheme.

  5. Create an empty shard (new database, same schema, no data) and add test rows. Teach your application to choose which shard to talk to. This will require some method to look up a shard for the root of each hierarchy; keep all data linked to a particular root on the same shard, so you can JOIN it. At its simplest, the lookup can be (ID mod NumShards). If you have uneven shard growth, you may need an indirection table to map from virtual shard (determined by modular division with a large divisor) to physical database.
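The virtual-shard indirection at the end can be sketched as follows; the divisor, mapping, and names here are all illustrative, and in practice the mapping would live in a small lookup table or config service:

```python
# Hypothetical sketch of virtual-shard indirection: a stable virtual shard
# from modulo with a large, fixed divisor, then a remappable table from
# virtual shard to physical database.

NUM_VIRTUAL_SHARDS = 4096  # large divisor; never changes once chosen

# virtual shard -> physical database; start with 4 physical shards and
# remap individual entries later as shards grow unevenly.
virtual_to_physical = {v: f"db{v % 4}" for v in range(NUM_VIRTUAL_SHARDS)}

def shard_for(root_id: int) -> str:
    """All rows linked to this hierarchy root live on the same shard."""
    return virtual_to_physical[root_id % NUM_VIRTUAL_SHARDS]

print(shard_for(15))  # same virtual shard as 15 + 4096, so same database
```

Rebalancing then means moving one virtual shard's data and updating its mapping entry, with no change to the hash itself.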