/*
Data table prep work
*/

SELECT *
FROM job_postings_fact AS jpf
LIMIT 10;

SELECT *
FROM skills_dim AS sd
LIMIT 10;

SELECT *
FROM skills_job_dim AS sjd
LIMIT 10;

-- List every column in the data_jobs database
SELECT *
FROM information_schema.columns
WHERE table_catalog = 'data_jobs';

-- Find the id (join key) columns in the three tables used below
SELECT *
FROM information_schema.columns
WHERE table_catalog = 'data_jobs'
    AND column_name LIKE '%id%'
    AND table_name IN ('skills_dim', 'job_postings_fact', 'skills_job_dim');
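
/*
Join-key sanity check: a sketch, assuming job_id should be unique in
job_postings_fact, skill_id unique in skills_dim, and each (job_id, skill_id)
pair should appear only once in skills_job_dim. Where total_rows is larger
than distinct_keys, the inner joins below would count some skill mentions more
than once. The key_column labels and the pairs alias are illustrative names.
*/
SELECT
    -- Assumption: job_id is the primary key of the fact table
    'job_postings_fact.job_id' AS key_column,
    COUNT(*) AS total_rows,
    COUNT(DISTINCT job_id) AS distinct_keys
FROM job_postings_fact
UNION ALL
SELECT
    -- Assumption: skill_id is the primary key of the skills dimension
    'skills_dim.skill_id',
    COUNT(*),
    COUNT(DISTINCT skill_id)
FROM skills_dim
UNION ALL
SELECT
    -- Assumption: the bridge table holds each (job_id, skill_id) pair once
    'skills_job_dim.(job_id, skill_id)',
    (SELECT COUNT(*) FROM skills_job_dim),
    -- Count distinct pairs through a subquery, which works across engines
    (SELECT COUNT(*) FROM (SELECT DISTINCT job_id, skill_id FROM skills_job_dim) AS pairs);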

/*
Question: What are the most in-demand skills for data engineers?
- Inner join job postings to the skills_job_dim bridge table, then to skills_dim, similar to query 2
- Identify the top 10 in-demand skills for data engineers
- Focus on remote job postings
- Why? Retrieves the top 10 skills with the highest demand in the remote job market,
  providing insights into the most valuable skills for data engineers seeking remote work
*/

SELECT *
FROM job_postings_fact
LIMIT 10;

SELECT *
FROM skills_job_dim
LIMIT 10;

SELECT *
FROM skills_dim
LIMIT 10;

-- Which values does job_work_from_home take for data roles?
SELECT DISTINCT job_work_from_home
FROM job_postings_fact
WHERE job_title_short LIKE '%Data%'
LIMIT 10;
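
/*
Companion check (a sketch): list the job_title_short values that match
'%Data%' together with their posting counts, to confirm that the exact label
'Data Engineer' used in the main query exists with that spelling and case.
The postings alias is an illustrative name.
*/
SELECT
    job_title_short,
    COUNT(*) AS postings   -- number of postings per title label
FROM job_postings_fact
WHERE job_title_short LIKE '%Data%'
GROUP BY job_title_short
ORDER BY postings DESC;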

-- Top 10 skills in remote Data Engineer postings.
-- Each joined row is one (posting, skill) pair, so COUNT(*) counts skill mentions.
SELECT
    sd.skills,
    COUNT(*) AS demand_skills
FROM job_postings_fact AS jpf
INNER JOIN skills_job_dim AS sjd
    ON jpf.job_id = sjd.job_id
INNER JOIN skills_dim AS sd
    ON sjd.skill_id = sd.skill_id
WHERE
    jpf.job_title_short = 'Data Engineer'   -- LIKE without wildcards is an exact match
    AND jpf.job_work_from_home = TRUE
GROUP BY sd.skills
ORDER BY demand_skills DESC
LIMIT 10;
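
/*
A sketch that extends the query above with each skill's share of all skill
mentions in remote Data Engineer postings. Window functions are evaluated
after GROUP BY but before ORDER BY and LIMIT, so SUM(COUNT(*)) OVER () totals
every skill, not only the 10 rows returned; that total is the denominator any
percentage claim about overall demand needs. The pct_of_all_mentions alias is
an illustrative name.
*/
SELECT
    sd.skills,
    COUNT(*) AS demand_skills,
    -- Window total spans all skill groups because it is computed before LIMIT
    ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 1) AS pct_of_all_mentions
FROM job_postings_fact AS jpf
INNER JOIN skills_job_dim AS sjd
    ON jpf.job_id = sjd.job_id
INNER JOIN skills_dim AS sd
    ON sjd.skill_id = sd.skill_id
WHERE
    jpf.job_title_short = 'Data Engineer'
    AND jpf.job_work_from_home = TRUE
GROUP BY sd.skills
ORDER BY demand_skills DESC
LIMIT 10;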

/*
Data Engineering Skills: Market Summary
Work-From-Home Demand Analysis

Summary

The top 10 skills in remote (work-from-home) Data Engineer postings account for 143,293 skill mentions and show a clear hierarchy: foundational languages (SQL, Python) lead demand, followed by cloud platforms and big data tooling. Cloud-native skills rank highly in these remote roles, plausibly because their tooling removes any dependency on physical infrastructure and supports fully remote workflows. Because the query covers remote postings only, it does not by itself compare remote with on-site demand.

Key Findings

SQL (29,221) and Python (28,776) are the top two skills, together making up about 40% of the top-10 mentions (57,997 of 143,293); both are essential for any data engineering role.

Cloud platforms (AWS, Azure, GCP) together account for about 27% of the top-10 mentions (38,412 of 143,293). Their tooling is browser- and API-accessible with no on-site infrastructure, which suits remote work.

Big data and orchestration tools (Spark, Airflow, Snowflake, and Databricks) fill the middle of the ranking, suggesting that remote roles expect engineers to manage pipelines autonomously.

Java remains relevant at #9 (7,267 mentions), likely because of JVM-based systems such as Kafka and Spark.

┌────────────┬───────────────┐
│   skills   │ demand_skills │
│  varchar   │     int64     │
├────────────┼───────────────┤
│ sql        │         29221 │
│ python     │         28776 │
│ aws        │         17823 │
│ azure      │         14143 │
│ spark      │         12799 │
│ airflow    │          9996 │
│ snowflake  │          8639 │
│ databricks │          8183 │
│ java       │          7267 │
│ gcp        │          6446 │
├────────────┴───────────────┤
│ 10 rows          2 columns │
└────────────────────────────┘

*/
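
/*
Sketch for checking the work-from-home point in the summary: compare each
cloud platform's share of skill mentions in remote versus non-remote Data
Engineer postings. Shares are computed over all skills within each group
before the cloud filter is applied, so the percentages are comparable across
groups. Assumes job_work_from_home is a boolean, as the comparison with TRUE
above implies; NULL values, if any, form their own group. The CTE names and
the pct_of_group_mentions alias are illustrative. A higher share in the TRUE
group would be consistent with the claim but would not explain it.
*/
WITH skill_counts AS (
    -- Skill mentions per (remote flag, skill) for Data Engineer postings
    SELECT
        jpf.job_work_from_home,
        sd.skills,
        COUNT(*) AS mentions
    FROM job_postings_fact AS jpf
    INNER JOIN skills_job_dim AS sjd
        ON jpf.job_id = sjd.job_id
    INNER JOIN skills_dim AS sd
        ON sjd.skill_id = sd.skill_id
    WHERE jpf.job_title_short = 'Data Engineer'
    GROUP BY jpf.job_work_from_home, sd.skills
),
skill_shares AS (
    -- Share of each skill within its remote / non-remote group (all skills)
    SELECT
        job_work_from_home,
        skills,
        mentions,
        ROUND(100.0 * mentions / SUM(mentions) OVER (PARTITION BY job_work_from_home), 1)
            AS pct_of_group_mentions
    FROM skill_counts
)
-- Keep only the cloud platforms, after the shares have been computed
SELECT *
FROM skill_shares
WHERE skills IN ('aws', 'azure', 'gcp')
ORDER BY skills, job_work_from_home;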