Simulating First/Last aggregate functions in MySQL

I came across an interesting post at Xaprb.com on selecting the first and last row of a group in MySQL. This solution builds on a comment in that post about using GROUP_CONCAT to achieve this.

As a practical example of the usage of the first and last aggregate functions, consider an application converting financial data between different timeframes.

The table structure and data might look like:

CREATE TABLE EURUSD_3600 (
    datetime datetime NOT NULL,
    open float NOT NULL,
    low float NOT NULL,
    high float NOT NULL,
    close float NOT NULL,
    PRIMARY KEY (datetime)
);
INSERT INTO EURUSD_3600 SELECT '2001-01-03 00:00:00',0.9507,0.9505,0.9509,0.9506;
INSERT INTO EURUSD_3600 SELECT '2001-01-03 01:00:00',0.9506,0.9492,0.951,0.9496;
INSERT INTO EURUSD_3600 SELECT '2001-01-03 02:00:00',0.9496,0.9495,0.9509,0.9505;
INSERT INTO EURUSD_3600 SELECT '2001-01-03 03:00:00',0.9504,0.9498,0.9508,0.9499;
INSERT INTO EURUSD_3600 SELECT '2001-01-03 04:00:00',0.9499,0.9499,0.9507,0.9503;
INSERT INTO EURUSD_3600 SELECT '2001-01-03 05:00:00',0.9503,0.9503,0.9506,0.9506;
INSERT INTO EURUSD_3600 SELECT '2001-01-03 06:00:00',0.9506,0.9505,0.9507,0.9505;
INSERT INTO EURUSD_3600 SELECT '2001-01-03 07:00:00',0.9505,0.9503,0.9513,0.9509;
INSERT INTO EURUSD_3600 SELECT '2001-01-03 08:00:00',0.951,0.9468,0.951,0.9489;
INSERT INTO EURUSD_3600 SELECT '2001-01-03 09:00:00',0.949,0.9489,0.9543,0.9539;

SQL is a suitable tool to group this data into a more compact timeframe, for instance, the 2-hour timeframe. By using a formatting function on the datetime field, a standard GROUP BY SQL statement can be used to achieve this:

CONCAT(
    year(datetime), '-', 
    month(datetime), '-',
    day(datetime), ' ',  
    floor(hour(datetime) / 2) * 2, ':00:00'
)

Calculating the low and high of the period is easy, using the standard min/max aggregate functions:

SELECT
    CAST(
        CONCAT(
            year(datetime), '-',
            month(datetime), '-',
            day(datetime), ' ',
            floor(hour(datetime) / 2) * 2, ':00:00'
        ) AS DATETIME) AS datetime,
        MIN(low) AS low,
        MAX(high) AS high
FROM EURUSD_3600
GROUP BY CAST(
        CONCAT(
            year(datetime), '-',
            month(datetime), '-',
            day(datetime), ' ',
            floor(hour(datetime) / 2) * 2, ':00:00'
        ) AS DATETIME)

GROUP_CONCAT can be used for the opening(first) and close(last), the trick is to be sure to use the ORDER BY clause in the group concatenation, otherwise the returned first and last records might not be as expected:

SUBSTRING_INDEX(
    GROUP_CONCAT(CAST(open AS CHAR) ORDER BY datetime),
    ',',
    1
) AS open

SUBSTRING_INDEX(
    GROUP_CONCAT(CAST(close AS CHAR) ORDER BY datetime),
    ',',
    -1
) AS close

GROUP_CONCAT returns a comma separated string of all groups values, then SUBSTRING_INDEX is used to return the first and last values of the group. This works as long as data is properly sorted and the resulting string fits in the GROUP_CONCAT buffer.

Note however that there is a problem with the preceding code when calculating the close(last) value. SUBSTRING_INDEX looks from the end of the string, but the GROUP_CONCAT buffer has a limited size, and it will be truncated if the concatenated rows exceed the buffer size, in which case the returned value will not be the last in the group as desired.

An easy solution is to reverse the sort order and fetch the first element instead:

SUBSTRING_INDEX(
    GROUP_CONCAT(CAST(close AS CHAR) ORDER BY datetime DESC),
    ',',
    1
) AS close

MySQL will still throw a warning about truncated group_concat when there are too many rows in the source table, but this can be safely ignored for the purpose of this example.

The final query turns out as:

SELECT
    CAST(
        CONCAT(
            year(datetime), '-',
            month(datetime), '-',
            day(datetime), ' ',
            floor(hour(datetime) / 2) * 2, ':00:00'
        ) AS DATETIME) AS datetime,
    SUBSTRING_INDEX(
        GROUP_CONCAT(CAST(open AS CHAR) ORDER BY datetime),
        ',',
        1
    ) AS open,
    MIN(low) AS low,
    MAX(high) AS high,
    SUBSTRING_INDEX(
        GROUP_CONCAT(CAST(close AS CHAR) ORDER BY datetime DESC),
        ',',
        1
    ) AS close
FROM EURUSD_3600
GROUP BY CAST(
        CONCAT(
            year(datetime), '-',
            month(datetime), '-',
            day(datetime), ' ',
            floor(hour(datetime) / 2) * 2, ':00:00'
        ) AS DATETIME);
Simulating First/Last aggregate functions in MySQL