Oracle Analytics - I wish there was one more aggregate function...
I'm a huge fan of Oracle's analytic functions and try to use them in my SQL whenever I can. I do however wish there was just one more tool in this very neat toolbox, MOST_COMMON().
The toolbox currently consists of the following functions, I've created a test table (DDL and inserts below) for those of us who are visual learners and like to see examples. Please note, these are very simple examples and I'd recommend you doing some Google-Kung-Fu to learn more about them if you find them interesting:
LAG and LEAD: LAG helps you get the value of the field from the previous row and LEAD allows access to the value of the next row. Here's an example but
FIRST_VALUE & LAST_VALUE: These functions allow you to get the First and Last values in a set.
Rank & Dense Rank: Using the Oracle Rank function allows assign a number to the row illustrating the sequence or "rank" of that record. Used in aggregate functions, the most common value would have a rank of 1 and the second 2, etc. Dense Rank ensures the values are consecutive, so if you had 2 values with a rank of #2, the next dense rank value would have a value of 3 (where rank would have 4). Take a look at the example:
Other Oracle functions which should be in your SQL Toolbox include: MIN, MAX, SUM, AVG, COUNT, RATIO_TO_REPORT, VARIANCE, and STDDEV.
The one that's missing? MOST_COMMON. I wish while using the other aggregate functions there was a function which would give you the most common value, for example:
I'm not sure off hand what the function should do if there are more than one most common values, obviously this would take a little bit deeper thinking than what I've done here. Yesterday I had to do something like this and I cobbled together a big-old SQL statement which used sub-selects and the RANK function, it was really ugly but it worked.
We'll see if Oracle enhances it's analytic functions in the future, perhaps Larry E. will read this and the team there will implement my MOST_COMMON function, maybe I'll get a sailboat ride out of the deal.
Until next time...Rich
The toolbox currently consists of the following functions, I've created a test table (DDL and inserts below) for those of us who are visual learners and like to see examples. Please note, these are very simple examples and I'd recommend you doing some Google-Kung-Fu to learn more about them if you find them interesting:
SELECT id,
str,
LAG(str) OVER (ORDER BY id) lag_str,
LEAD(str) OVER (ORDER BY id) lead_str
FROM test_stuff
ORDER BY id;
ID STR LAG_STR LEAD_STR
---- ------------ ------------ ------------
1 Rich Richard
2 Richard Rich Richie
3 Richie Richard Rich
4 Rich Richie Richard
5 Richard Rich Richie
6 Richie Richard Rick
7 Rick Richie Ricker
8 Ricker Rick Rich
9 Rich Ricker Richie Rich
10 Richie Rich Rich
10 rows selected.
SELECT id,
str,
FIRST_VALUE(str) over () first_str,
LAST_VALUE(str) over () last_str
FROM test_stuff
ORDER BY id;
ID STR FIRST_STR LAST_STR
---- ------------ ------------ -------------
1 Rich Rich Richie Rich
2 Richard Rich Richie Rich
3 Richie Rich Richie Rich
4 Rich Rich Richie Rich
5 Richard Rich Richie Rich
6 Richie Rich Richie Rich
7 Rick Rich Richie Rich
8 Ricker Rich Richie Rich
9 Rich Rich Richie Rich
10 Richie Rich Rich Richie Rich
10 rows selected.
SELECT str,
COUNT(*) records,
RANK() OVER (order by COUNT(*) DESC) ranking,
DENSE_RANK() OVER (order by COUNT(*) DESC) DENSE_ranking
FROM test_stuff
GROUP BY str
ORDER BY COUNT(*) DESC;
STR RECORDS RANKING DENSE_RANKING
------------ -------------- -------------- --------------
Rich 3 1 1
Richie 2 2 2
Richard 2 2 2
Richie Rich 1 4 3
Rick 1 4 3
Ricker 1 4 3
--------------
sum 10
6 rows selected.
Other Oracle functions which should be in your SQL Toolbox include: MIN, MAX, SUM, AVG, COUNT, RATIO_TO_REPORT, VARIANCE, and STDDEV.
The one that's missing? MOST_COMMON. I wish while using the other aggregate functions there was a function which would give you the most common value, for example:
SELECT MOST_COMMON(str) STR
FROM test_stuff;
STR
------------
Rich
I'm not sure off hand what the function should do if there are more than one most common values, obviously this would take a little bit deeper thinking than what I've done here. Yesterday I had to do something like this and I cobbled together a big-old SQL statement which used sub-selects and the RANK function, it was really ugly but it worked.
We'll see if Oracle enhances it's analytic functions in the future, perhaps Larry E. will read this and the team there will implement my MOST_COMMON function, maybe I'll get a sailboat ride out of the deal.
Until next time...Rich
CREATE TABLE test_stuff (
id NUMBER PRIMARY KEY,
str VARCHAR2(30));
INSERT INTO test_stuff (id, str) VALUES (1, 'Rich');
INSERT INTO test_stuff (id, str) VALUES (2, 'Richard');
INSERT INTO test_stuff (id, str) VALUES (3, 'Richie');
INSERT INTO test_stuff (id, str) VALUES (4, 'Rich');
INSERT INTO test_stuff (id, str) VALUES (5, 'Richard');
INSERT INTO test_stuff (id, str) VALUES (6, 'Richie');
INSERT INTO test_stuff (id, str) VALUES (7, 'Rick');
INSERT INTO test_stuff (id, str) VALUES (8, 'Ricker');
INSERT INTO test_stuff (id, str) VALUES (9, 'Rich');
INSERT INTO test_stuff (id, str) VALUES (10, 'Richie Rich');
COMMIT;
Comments
SQL> select id, str, max( case when cnt_ = max_cnt then str end ) over() most_frequent
2 from (
3 select id, str, cnt_, max(cnt_) over() max_cnt
4 from (
5 select id, str, count(*) over(partition by str) cnt_
6 from test_stuff
7 )
8 )
9 /
ID STR MOST_FREQUENT
---------- ------------------------------ ------------------------------
4 Rich Rich
1 Rich Rich
9 Rich Rich
2 Richard Rich
5 Richard Rich
6 Richie Rich
3 Richie Rich
10 Richie Rich Rich
7 Rick Rich
8 Ricker Rich
10 rows selected
STATS_MODE()