SQL for two-way frequency table using a string valued variable

Question

Consider the following event data in PostgreSQL 9.4:

      eventTime     | eventName
2015-09-25 18:00:00 | 'AAA'
2015-09-25 17:00:00 | 'BBB'
2015-09-25 16:00:00 | 'BBB'
2015-09-25 15:00:00 | 'BBB'
2015-09-25 14:00:00 | 'AAA'
2015-09-26 13:00:00 | 'CCC'
2015-09-26 12:00:00 | 'AAA'
2015-09-26 11:00:00 | 'BBB'
2015-09-26 10:00:00 | 'CCC'
2015-09-26 09:00:00 | 'BBB'
2015-09-27 08:00:00 | 'AAA'
2015-09-27 07:00:00 | 'CCC'
2015-09-27 05:00:00 | 'CCC'
2015-09-27 04:00:00 | 'CCC'
2015-09-27 03:00:00 | 'CCC'
2015-09-27 02:00:00 | 'AAA'

While single count() based tables are straightforward, for example:

SELECT eventTime, count(1)
  from (SELECT data->>'eventName' as eventName,
        date_trunc('day', to_timestamp(data->>'timestamp','YYYY-MM-DDZHH24:MI:SS.MS')::timestamp without time zone) AS eventTime
        FROM sidetrack where (data->>'eventName' = 'AAA') IS TRUE) AS tmptab
GROUP BY eventTime
ORDER BY eventTime ASC

It's only possible to count the occurrence of a single value of eventName. I'm not very experienced with SQL and am struggling to find a way to create a two-way frequency table. In this example the result would be:

     day    | 'AAA' | 'BBB' | 'CCC'
------------+-------+-------+-------
 2015-09-25 |    2  |    3  |    0
 2015-09-26 |    1  |    2  |    2
 2015-09-27 |    2  |    0  |    4

There are examples where variables with numeric values are counted using with_bucket(), but that doesn't generalize to string valued fields.

I've tried nested selects under WITH, such as:

WITH
    foo AS (
        SELECT ...
    ),
    bar AS (
        SELECT ...
    FROM foo
    ),
SELECT *
FROM bar;

And with outer JOINS, but I can't crack this.

Your query extracts from JSON values, but your question doesn't even mention JSON. Please clarify and provide the actual table definition and, as always, your version of Postgres. — Erwin Brandstetter, Dec 20 '15 at 01:36
Hello Erwin, you are correct that the fact that text is being extracted from JSON keys is independent of the frequency table challenge. I'm prototyping with a test table that doesn't have JSON. I used the snippt with the JSON in it as a working example. The version of Postgres is 9.4. Thanks. — JohnH, Dec 21 '15 at 19:20

score 1 · Answer 1 · answered Dec 19 '15 at 23:13

You could just use a CASE statement for each column to generate a 1 for a match, then SUM all the rows up, something like;

SELECT date_trunc('day', timestamp) AS time,
       SUM(CASE WHEN "eventName" = 'AAA' THEN 1 ELSE 0 END) AAA,
       SUM(CASE WHEN "eventName" = 'BBB' THEN 1 ELSE 0 END) BBB,
       SUM(CASE WHEN "eventName" = 'CCC' THEN 1 ELSE 0 END) CCC
FROM sidetrack
GROUP BY date_trunc('day', timestamp)
ORDER BY date_trunc('day', timestamp) ASC

An SQLfiddle to test with.

Thank you; this works well when the number of values is known in advance. This is often the case. — JohnH, Dec 21 '15 at 19:23

score 1 · Answer 2 · answered Dec 19 '15 at 23:24

This query selects count of events on each day:

select day, event, count(*)
from (
    select 
        left(date_trunc('day', (data->>'timestamp')::timestamp)::text, 10) as day, 
        data->>'eventName' as event
    from sidetrack
    ) s
group by 1, 2
order by 1 asc, 2;

    day     | event | count 
------------+-------+-------
 2015-09-25 | AAA   |     2
 2015-09-25 | BBB   |     3
 2015-09-26 | AAA   |     1
 2015-09-26 | BBB   |     2
 2015-09-26 | CCC   |     2
 2015-09-27 | AAA   |     2
 2015-09-27 | CCC   |     4
(7 rows)

You can use the query in crosstab() function:

create extension if not exists tablefunc;

select * from crosstab (
    $q$
        select day, event, count(*)
        from (
            select 
                left(date_trunc('day', (data->>'timestamp')::timestamp)::text, 10) as day, 
                data->>'eventName' as event
            from sidetrack
            ) s
        group by 1, 2
        order by 1 asc
    $q$,
    $q$
        values ('AAA'), ('BBB'), ('CCC')
    $q$)
as ct (day text, "AAA" int, "BBB" int, "CCC" int);

    day     | AAA | BBB | CCC 
------------+-----+-----+-----
 2015-09-25 |   2 |   3 |    
 2015-09-26 |   1 |   2 |   2
 2015-09-27 |   2 |     |   4
(3 rows)

If you do not know the values like AAA,BBB,CCC in advance then how do you code it? — Geoffrey Anderson, May 25 '18 at 13:14
@GeoffreyAnderson - You can try this: [Automatically creating pivot table column names in PostgreSQL.](http://www.cureffi.org/2013/03/19/automatically-creating-pivot-table-column-names-in-postgresql/) — klin, May 25 '18 at 13:31

SQL for two-way frequency table using a string valued variable

2 Answers2