tail -f findings.out

« Creating better MySQL indexes: The basics

Running MySQL queries in Python »

Finding the smallest or largest value across multiple columns in MySQL

A long title for a simple problem: You have a table containing several columns of, say, dates. You want to find out the first or last date among the values of these columns per row. Not too bad. But, you can’t just use greatest() or least(). Well, you could before MySQL 5.0.13. After that, the presence of any NULL values in the columns compared would result in NULL being returned. This is the proper implementation of the SQL standard, but it also means everyone depending on the old way was out of luck.

And so too are we who still need to find the extreme value among columns that can contain NULL! The problem might have been solved by MySQL releasing two new functions at the same time as the greatest()/least() changes that would perform the same operation as their old behavior, ignoring NULL values. It’s a common enough use case. But complaining won’t write queries.

The first attempt I made used a combination of least() with coalesce(). This function returns the first non-NULL value among those passed. The problem with this approach is that any of the column values per row can be NULL along with any of the others. So I would have needed to cover all possible permutations, namely the number of columns to compare across squared! This gets immediately tedious. Consider the case for just four columns:

1
2
3
4
5
6
7
8
9
10
SELECT id, date1, date2, date3, date4,
least(
coalesce(date1, date2, date3, date4),
coalesce(date2, date1, date3, date4),
coalesce(date3, date2, date4, date1),
coalesce(date4, date1, date2, date3),
# 12 more permutations...
) AS first_date
FROM my_db.my_table
GROUP BY id;

Yuck. I nearly wrote a script that would take the columns to be compared and generate the needed query.

Luckily, fellow information wrangler Bill Karpovich suggested a much more elegant (and blissfully shorter) version:

1
2
3
4
5
6
7
8
9
SELECT id, date1, date2, date3, date4,
least(
ifnull(date1, '9999-99-99'),
ifnull(date2, '9999-99-99'),
ifnull(date3, '9999-99-99'),
ifnull(date4, '9999-99-99')
) AS first_date
FROM my_db.my_table
GROUP BY id;

This checks each column to be examined, and considers NULL values to be equal to ‘9999-99-99′, which I feel safe in assuming is greater than any valid date I will have. This method fulfills this common need nicely, and in syntax only requires a number of statements linearly increasing with the number of columns.

Share and Enjoy:
  • email
  • LinkedIn
  • Slashdot
  • StumbleUpon
  • Technorati
  • Netvibes

Post to Twitter Post to Delicious Post to Digg Post to Reddit

Possibly Related (no promises):

  1. Creating better MySQL indexes: The basics
  2. Advanced ordering of MySQL results
  3. Running MySQL queries in Python
  4. Rapidly set up a MySQL database for testing
  5. Better access to MySQL create view statements

Related posts brought to you by Yet Another Related Posts Plugin.

Tags: , , ,
March 20, 2009 - 10:50 PM
1 comment »
Leave a reply

Subscribe without commenting

Twitter links powered by Tweet This v1.6.1, a WordPress plugin for Twitter.