MySQL-View-querying-with-CASE-statement-and-nulls

Hackerrank – SQL – Advanced Select – Occupations

Publish Date - December 28th, 2022

|

Last Modified - December 29th, 2022

Another interesting query, with very unique criteria that you need to pay attention to. There’s multiple was you can attack this, but the most common way seems to be creation of a CTE, VIEW or function and using ROW_NUMBER(), PARTITION() to help number and organize the rows. If you are using MSSQL, then you can just use the PIVOT function :). Personally, I found this one much harder than “The pads” solution due to the way the output needed to be arranged.

The Problem

Pivot the Occupation column in OCCUPATIONS so that each Name is sorted alphabetically and displayed underneath its corresponding Occupation. The output column headers should be DoctorProfessorSinger, and Actor, respectively.

Note: Print NULL when there are no more names corresponding to an occupation.

Input Format

The OCCUPATIONS table is described as follows:

Sample Input

Sample-output-occupation-table

Sample Output

Jenny    Ashley     Meera  Jane
Samantha Christeen  Priya  Julia
NULL     Ketty      NULL   Maria

Explanation

The first column is an alphabetically ordered list of Doctor names.
The second column is an alphabetically ordered list of Professor names.
The third column is an alphabetically ordered list of Singer names.
The fourth column is an alphabetically ordered list of Actor names.
The empty cell data for columns with less than the maximum number of names per occupation (in this case, the Professor and Actor columns) are filled with NULL values.

The Solution

Firstly, this is the solution output that you want:

Aamina Ashley Christeen Eve
Julia Belvet Jane Jennifer
Priya Britney Jenny Ketty
NULL Maria Kristeen Samantha
NULL Meera NULL NULL
NULL Naomi NULL NULL
NULL Priyanka NULL NULL

This is extremely important to note, because you can only have NULLs for the maximum amount of the highest occupation category (in this case it’s Professor).

My Query

CREATE VIEW pq AS (
    SELECT 
        CASE WHEN occupation = 'Doctor' THEN name END AS 'Doctor',
        CASE WHEN occupation = 'Professor' THEN name END AS 'Professor',
        CASE WHEN occupation = 'Singer' THEN name END AS  'Singer',
        CASE WHEN occupation = 'Actor' THEN name END AS  'Actor',
        ROW_NUMBER() OVER (PARTITION BY occupation ORDER BY name) as cr
    FROM occupations
);

SELECT MAX(Doctor),MAX(Professor),MAX(Singer),MAX(Actor) FROM pq 
GROUP BY cr

I created a VIEW (because I prefer views), but you could have made a procedure or function all the same. The reason you do this is so that you don’t need to nest the MAX functions within your case statement (it gets really messy) and you’ll need a GROUP BY in your nested statement which messes everything up.

CASE WHEN occupation = 'Doctor' THEN name END AS 'Doctor',
CASE WHEN occupation = 'Professor' THEN name END AS 'Professor',
CASE WHEN occupation = 'Singer' THEN name END AS  'Singer',
CASE WHEN occupation = 'Actor' THEN name END AS  'Actor',

Here’s where the pivoting is actually done. Basically, when a row value equals a particular occupation – then the name should show up. CASE allows you to basically build new columns based on a condition. You can THEN, pull in the relevant values based on the aforementioned criteria set.

This may seem like it’s the end, if you run just your case statement alone you’ll get the following:

   SELECT 
        CASE WHEN occupation = 'Doctor' THEN name END AS 'Doctor',
        CASE WHEN occupation = 'Professor' THEN name END AS 'Professor',
        CASE WHEN occupation = 'Singer' THEN name END AS  'Singer',
        CASE WHEN occupation = 'Actor' THEN name END AS  'Actor'
    FROM occupations

Output is:

NULL Ashley NULL NULL
NULL NULL NULL Samantha
Julia NULL NULL NULL
NULL Britney NULL NULL
NULL Maria NULL NULL
NULL Meera NULL NULL
Priya NULL NULL NULL
NULL Priyanka NULL NULL
NULL NULL NULL Jennifer
NULL NULL NULL Ketty
NULL Belvet NULL NULL
NULL Naomi NULL NULL
NULL NULL Jane NULL
NULL NULL Jenny NULL
NULL NULL Kristeen NULL
NULL NULL Christeen NULL
NULL NULL NULL Eve
Aamina NULL NULL NULL

As you can see, there’s:

  1. Too many rows (a lot of NULLs)
  2. It’s not ordered alphabetically (as requested)

Try as you might, it’s impossible to order these rows into something similar to the desired outcome above. On-top of that, you have way too many rows (recall you should only have as many rows as the maximum amount of values per an occupation).

This is where the most important portion of the query comes in:

ROW_NUMBER() OVER (PARTITION BY occupation ORDER BY name) as cr

If you were just to run this by itself you would get the following:

CREATE VIEW pq AS ( 
ROW_NUMBER() OVER (PARTITION BY occupation ORDER BY name) as cr
    FROM occupations );

SELECT * FROM pq 
ORDER BY cr

Returns this:

1
1
1
1
2
2
2
2
3
3
3
3
4
4
4
5
6
7

Which is essentially what you want, an ordered list from 1 – 7 (A – Z) and you’ll have 7 values.

If I GROUP BY cr instead of ORDER BY, I’ll get

1
2
3
4
5
6
7

An alphabetically set of rows with the maximum amount of values.

Now, if you put my entire view together and just query everything, you’ll get the following:

CREATE VIEW pq AS (
    SELECT 
        CASE WHEN occupation = 'Doctor' THEN name END AS 'Doctor',
        CASE WHEN occupation = 'Professor' THEN name END AS 'Professor',
        CASE WHEN occupation = 'Singer' THEN name END AS  'Singer',
        CASE WHEN occupation = 'Actor' THEN name END AS  'Actor',
        ROW_NUMBER() OVER (PARTITION BY occupation ORDER BY name) as cr
    FROM occupations
);

SELECT * FROM pq

Output:

NULL NULL NULL Eve 1
NULL NULL NULL Jennifer 2
NULL NULL NULL Ketty 3
NULL NULL NULL Samantha 4
Aamina NULL NULL NULL 1
Julia NULL NULL NULL 2
Priya NULL NULL NULL 3
NULL Ashley NULL NULL 1
NULL Belvet NULL NULL 2
NULL Britney NULL NULL 3
NULL Maria NULL NULL 4
NULL Meera NULL NULL 5
NULL Naomi NULL NULL 6
NULL Priyanka NULL NULL 7
NULL NULL Christeen NULL 1
NULL NULL Jane NULL 2
NULL NULL Jenny NULL 3
NULL NULL Kristeen NULL 4

Now, if I “ORDER BY cr”, I will see this:

Aamina NULL NULL NULL 1
NULL NULL Christeen NULL 1
NULL Ashley NULL NULL 1
NULL NULL NULL Eve 1
NULL NULL NULL Jennifer 2
NULL NULL Jane NULL 2
Julia NULL NULL NULL 2
NULL Belvet NULL NULL 2
NULL Britney NULL NULL 3
NULL NULL Jenny NULL 3
Priya NULL NULL NULL 3
NULL NULL NULL Ketty 3
NULL Maria NULL NULL 4
NULL NULL NULL Samantha 4
NULL NULL Kristeen NULL 4
NULL Meera NULL NULL 5
NULL Naomi NULL NULL 6
NULL Priyanka NULL NULL 7

Not exactly what I want. A quick reminder that we want is this:

Aamina Ashley Christeen Eve
Julia Belvet Jane Jennifer
Priya Britney Jenny Ketty
NULL Maria Kristeen Samantha
NULL Meera NULL NULL
NULL Naomi NULL NULL
NULL Priyanka NULL NULL

You obviously want to use a GROUP BY, so that you can cluster the names by occupation with respect to your row_number() partition (which will be ordered by as well). However, you can’t do that unless you have some sort of function. This is why you use the MAX function for each column.

SELECT MAX(Doctor),MAX(Professor),MAX(Singer),MAX(Actor) FROM pq 
GROUP BY cr

This does two things:

  1. It provides you with a way to group the relevant occupations by name (since you already have an order by within your partition)
  2. Enables you to group by.

Conclusion

While this may have seemed simple, it definitely was not! This was one of the harder solutions I’ve come across to solve. This is the end of this little tutorial, I hope you learned how to do some advanced SQL queries! Keep on learning!

Feel free to check out my articles on related technologies:

Leave a Comment

Your email address will not be published. Required fields are marked *