/* Data table prep work */

SELECT *
FROM job_postings_fact AS jpf
LIMIT 10;

SELECT *
FROM skills_dim AS sd
LIMIT 10;

SELECT *
FROM skills_job_dim AS sjd
LIMIT 10;

-- List every column in the database
SELECT *
FROM information_schema.columns
WHERE table_catalog = 'data_jobs'
;

-- Narrow to the id columns used as join keys
SELECT *
FROM information_schema.columns
WHERE table_catalog = 'data_jobs'
    AND column_name LIKE '%id%'
    AND table_name IN ('skills_dim', 'job_postings_fact', 'skills_job_dim')
;

/*
Question: What are the most in-demand skills for data engineers?
- Join job postings to the skills tables with inner joins, similar to query 2
- Identify the top 10 in-demand skills for data engineers
- Focus on remote job postings
- Why? Retrieves the top 10 skills with the highest demand in the remote job market,
    providing insights into the most valuable skills for data engineers seeking remote work
*/

SELECT *
FROM job_postings_fact
LIMIT 10;

SELECT *
FROM skills_job_dim
LIMIT 10;

SELECT *
FROM skills_dim
LIMIT 10;

-- Check which values the remote-work flag takes
SELECT DISTINCT job_work_from_home
FROM job_postings_fact
WHERE job_title_short LIKE '%Data%'
LIMIT 10
;

-- Top 10 skills by number of remote Data Engineer postings that mention them
SELECT
    sd.skills,
    COUNT(*) AS demand_skills
FROM job_postings_fact AS jpf
INNER JOIN skills_job_dim AS sjd ON jpf.job_id = sjd.job_id
INNER JOIN skills_dim AS sd ON sjd.skill_id = sd.skill_id
WHERE jpf.job_title_short = 'Data Engineer'  -- exact match; LIKE without wildcards behaves the same
    AND jpf.job_work_from_home = TRUE
GROUP BY sd.skills
ORDER BY demand_skills DESC
LIMIT 10
;

/*
Data Engineering Skills: Market Summary
Work-From-Home Demand Analysis

Summary
The top 10 skills account for 143,293 skill mentions across remote data engineering
job postings and show a clear hierarchy: foundational languages (SQL, Python) lead
demand, followed by cloud platforms and big data tooling. Cloud-native skills feature
prominently in these work-from-home roles, plausibly because they remove any
dependency on physical infrastructure and enable fully remote workflows.

Key Findings
- SQL (29,221) and Python (28,776) are the top two skills, together about 40% of
  top-10 mentions (57,997 of 143,293). Both are essential for any data engineering role.
- Cloud platforms (AWS, Azure, GCP) collectively account for about 27% of top-10
  mentions (38,412 of 143,293). This fits remote work well, since the tooling is
  browser/API-accessible and needs no on-site infrastructure. The query covers
  remote postings only, so it does not measure a correlation with remote eligibility.
- Big data and orchestration tools (Spark, Airflow, Snowflake, and Databricks)
  dominate the mid-tier, suggesting that remote roles expect autonomous pipeline
  management.
- Java remains relevant at #9 (7,267 mentions), likely for JVM-based systems such as
  Kafka and legacy Spark environments.

┌────────────┬───────────────┐
│   skills   │ demand_skills │
│  varchar   │     int64     │
├────────────┼───────────────┤
│ sql        │         29221 │
│ python     │         28776 │
│ aws        │         17823 │
│ azure      │         14143 │
│ spark      │         12799 │
│ airflow    │          9996 │
│ snowflake  │          8639 │
│ databricks │          8183 │
│ java       │          7267 │
│ gcp        │          6446 │
├────────────┴───────────────┤
│ 10 rows          2 columns │
└────────────────────────────┘
*/
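
/*
Follow-up sketch: checking the shares quoted in the summary

The percentages in the summary can be computed in SQL rather than by hand.
This is a minimal sketch, assuming the same tables and columns as the main
query. The names top_skills, pct_of_top_10, remote_mentions and
onsite_mentions are illustrative and not part of the original analysis.
The share is relative to the top 10 skills only, not to all skill mentions.
*/

WITH top_skills AS (
    -- Same top-10 query as above, wrapped in a CTE
    SELECT
        sd.skills,
        COUNT(*) AS demand_skills
    FROM job_postings_fact AS jpf
    INNER JOIN skills_job_dim AS sjd ON jpf.job_id = sjd.job_id
    INNER JOIN skills_dim AS sd ON sjd.skill_id = sd.skill_id
    WHERE jpf.job_title_short = 'Data Engineer'
        AND jpf.job_work_from_home = TRUE
    GROUP BY sd.skills
    ORDER BY demand_skills DESC
    LIMIT 10
)
SELECT
    skills,
    demand_skills,
    -- SUM(...) OVER () is the total across all ten rows
    ROUND(100.0 * demand_skills / SUM(demand_skills) OVER (), 1) AS pct_of_top_10
FROM top_skills
ORDER BY demand_skills DESC
;

/*
The main query looks at remote postings only, so it cannot say whether a skill
is more common in remote roles than in on-site ones. One possible way to put
the two groups side by side is sketched below. Raw counts are not directly
comparable when the groups differ in size, so treat this as a starting point.
*/

SELECT
    sd.skills,
    COUNT(*) FILTER (WHERE jpf.job_work_from_home = TRUE) AS remote_mentions,
    COUNT(*) FILTER (WHERE jpf.job_work_from_home = FALSE) AS onsite_mentions
FROM job_postings_fact AS jpf
INNER JOIN skills_job_dim AS sjd ON jpf.job_id = sjd.job_id
INNER JOIN skills_dim AS sd ON sjd.skill_id = sd.skill_id
WHERE jpf.job_title_short = 'Data Engineer'
GROUP BY sd.skills
ORDER BY remote_mentions DESC
LIMIT 10
;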