본문 바로가기
프로그래밍/SQL, Hive, SAS 관련 정보

[SQL문제풀이-Union활용] Count Salary Categories

by TrillionNT 2025. 4. 9.

2025.02.24 - [프로그래밍/SQL, Hive, SAS 관련 정보] - [SQL] Table of Contents


https://leetcode.com/problems/count-salary-categories/solutions/4784326/easy-and-all-in-1-code-mysql-ms-sql-server-oracle-postgresql/?envType=study-plan-v2&envId=top-sql-50

 

항상 열을 기준으로 연산을 하는 데 익숙해져 있었더니 행 단위로 조작을 가해줘야 하는 상황에서 (어렵지 않은 문제임에도) 버벅이는 경우였습니다. 

with temp as 
    (select account_id, case when income<20000 then 'Low Salary'
                            when income<=50000 then 'Average Salary'
                            when income>50000 then 'High Salary' end category
    from Accounts)
select category, count(*) accounts_count from temp where category='Low Salary'
union
select category, count(*) accouns_count from temp where category='Average Salary'
union
select category, count(*) accouns_count from temp where category='High Salary'

 

이렇게 코드를 짜면 카테고리에 해당되는 값이 0개일 때 category 명이 제대로 출력되지 않습니다. 그래서 카테고리를 필터링한 후에 카테고리를 수동지정해주는 방법을 써야합니다. 

 

SELECT 'Low Salary' AS category, COUNT(*) AS accounts_count 
FROM Accounts
WHERE income < 20000

UNION

SELECT 'Average Salary' AS category, COUNT(*) AS accounts_count 
FROM Accounts
WHERE income BETWEEN 20000 AND 50000

UNION

SELECT 'High Salary' AS category, COUNT(*) AS accounts_count 
FROM Accounts
WHERE income > 50000

 

 


Pandas로 풀면 어떻게 될까요? 

저는 일단 모든 카테고리가 들어가있는 기준 데이터프레임을 만들고 left join 시켰습니다. 

def count_salary_categories(accounts: pd.DataFrame) -> pd.DataFrame:
    def classify(salary):
        if salary<20000:
            return 'Low Salary'
        if 20000<=salary<=50000:
            return 'Average Salary'
        else:
            return 'High Salary'
    accounts['category'] = accounts['income'].apply(classify)
    backbone = pd.DataFrame({'category':['High Salary', 'Low Salary', 'Average Salary']})
    return backbone.merge(accounts.groupby(['category'], as_index=False)['account_id'].count().rename({'account_id':'accounts_count'},axis=1), how='left', on='category').fillna(0)

 

그런데, 답안에는 굉장히 간단한 방법으로 쉽게 해결하신 분들도 보이더라구요

def count_salary_categories(accounts: pd.DataFrame) -> pd.DataFrame:
    # Calculate the counts for each salary category
    low_salary_count = accounts[accounts['income'] < 20000].shape[0]
    average_salary_count = accounts[(accounts['income'] >= 20000) & (accounts['income'] <= 50000)].shape[0]
    high_salary_count = accounts[accounts['income'] > 50000].shape[0]
    
    # Create a DataFrame to store the result
    result_df = pd.DataFrame({
        'category': ['High Salary', 'Low Salary', 'Average Salary'],
        'accounts_count': [high_salary_count, low_salary_count, average_salary_count]
    })
    
    return result_df

 

- Leetcode Sri Ganesh Rachapudi님 답안 중