SELECT sd.skills, ROUND(MEDIAN(jpf.salary_year_avg)) as median_salary, COUNT(jpf.*) as demand_skills, COUNT(jpf.salary_year_avg) as corrected_demand_count FROM job_postings_fact as jpf INNER JOIN skills_job_dim as sjd ON jpf.job_id = sjd.job_id INNER JOIN skills_dim as sd ON sjd.skill_id = sd.skill_id WHERE jpf.job_title_short LIKE 'Data Engineer' AND jpf.job_work_from_home = False GROUP BY sd.skills HAVING COUNT(jpf.*) > 100 ORDER BY median_salary DESC LIMIT 25 ; /* ┌────────────┬───────────────┬───────────────┬────────────────────────┐ │ skills │ median_salary │ demand_skills │ corrected_demand_count │ │ varchar │ double │ int64 │ int64 │ ├────────────┼───────────────┼───────────────┼────────────────────────┤ │ rust │ 210000.0 │ 232 │ 23 │ │ golang │ 184000.0 │ 912 │ 39 │ │ terraform │ 184000.0 │ 3248 │ 193 │ │ spring │ 175500.0 │ 364 │ 33 │ │ neo4j │ 170000.0 │ 277 │ 11 │ │ gdpr │ 169616.0 │ 582 │ 22 │ │ zoom │ 168438.0 │ 127 │ 12 │ │ graphql │ 167500.0 │ 445 │ 28 │ │ mongo │ 162250.0 │ 265 │ 14 │ │ fastapi │ 157500.0 │ 204 │ 3 │ │ bitbucket │ 155000.0 │ 478 │ 9 │ │ django │ 155000.0 │ 265 │ 5 │ │ crystal │ 154224.0 │ 129 │ 3 │ │ c │ 151500.0 │ 444 │ 23 │ │ atlassian │ 151500.0 │ 249 │ 9 │ │ typescript │ 151000.0 │ 388 │ 39 │ │ kubernetes │ 150500.0 │ 4202 │ 147 │ │ node │ 150000.0 │ 179 │ 22 │ │ ruby │ 150000.0 │ 736 │ 48 │ │ css │ 150000.0 │ 262 │ 13 │ │ airflow │ 150000.0 │ 9996 │ 386 │ │ redis │ 149000.0 │ 605 │ 17 │ │ vmware │ 148798.0 │ 136 │ 2 │ │ ansible │ 148798.0 │ 475 │ 14 │ │ jupyter │ 147500.0 │ 400 │ 15 │ ├────────────┴───────────────┴───────────────┴────────────────────────┤ │ 25 rows 4 columns │ └─────────────────────────────────────────────────────────────────────┘ */ SELECT sd.skills, ROUND(MEDIAN(jpf.salary_year_avg)) as median_salary, COUNT(jpf.*) as demand_count, ROUND(LN(COUNT(jpf.*)),2) as demand_count, ROUND((LN(COUNT(jpf.*)) * MEDIAN(jpf.salary_year_avg))/1_000_000,2) as optimal_score FROM job_postings_fact as jpf INNER JOIN skills_job_dim as sjd ON jpf.job_id = sjd.job_id INNER JOIN skills_dim as sd ON sjd.skill_id = sd.skill_id WHERE jpf.job_title_short LIKE 'Data Engineer' AND jpf.job_work_from_home = True AND jpf.salary_year_avg IS NOT NULL GROUP BY sd.skills HAVING COUNT(jpf.*) > 100 ORDER BY optimal_score DESC LIMIT 25 ; /* ┌────────────┬───────────────┬──────────────┬──────────────┬───────────────┐ │ skills │ median_salary │ demand_count │ demand_count │ optimal_score │ │ varchar │ double │ int64 │ double │ double │ ├────────────┼───────────────┼──────────────┼──────────────┼───────────────┤ │ terraform │ 184000.0 │ 193 │ 5.26 │ 0.97 │ │ python │ 135000.0 │ 1133 │ 7.03 │ 0.95 │ │ aws │ 137320.0 │ 783 │ 6.66 │ 0.91 │ │ sql │ 130000.0 │ 1128 │ 7.03 │ 0.91 │ │ airflow │ 150000.0 │ 386 │ 5.96 │ 0.89 │ │ spark │ 140000.0 │ 503 │ 6.22 │ 0.87 │ │ kafka │ 145000.0 │ 292 │ 5.68 │ 0.82 │ │ snowflake │ 135500.0 │ 438 │ 6.08 │ 0.82 │ │ azure │ 128000.0 │ 475 │ 6.16 │ 0.79 │ │ java │ 135000.0 │ 303 │ 5.71 │ 0.77 │ │ scala │ 137290.0 │ 247 │ 5.51 │ 0.76 │ │ git │ 140000.0 │ 208 │ 5.34 │ 0.75 │ │ kubernetes │ 150500.0 │ 147 │ 4.99 │ 0.75 │ │ databricks │ 132750.0 │ 266 │ 5.58 │ 0.74 │ │ redshift │ 130000.0 │ 274 │ 5.61 │ 0.73 │ │ gcp │ 136000.0 │ 196 │ 5.28 │ 0.72 │ │ nosql │ 134415.0 │ 193 │ 5.26 │ 0.71 │ │ hadoop │ 135000.0 │ 198 │ 5.29 │ 0.71 │ │ pyspark │ 140000.0 │ 152 │ 5.02 │ 0.7 │ │ mongodb │ 135750.0 │ 136 │ 4.91 │ 0.67 │ │ docker │ 135000.0 │ 144 │ 4.97 │ 0.67 │ │ r │ 134775.0 │ 133 │ 4.89 │ 0.66 │ │ go │ 140000.0 │ 113 │ 4.73 │ 0.66 │ │ github │ 135000.0 │ 127 │ 4.84 │ 0.65 │ │ bigquery │ 135000.0 │ 123 │ 4.81 │ 0.65 │ ├────────────┴───────────────┴──────────────┴──────────────┴───────────────┤ │ 25 rows 5 columns │ └──────────────────────────────────────────────────────────────────────────┘ Summary This analysis examines the optimal skills for remote Data Engineer roles by combining salary and demand into a single composite score — calculated as the log of job postings multiplied by median salary. This approach rewards skills that are both well-compensated and widely requested, avoiding the trap of chasing either high pay in niche roles or high volume in lower-paying ones. The dataset covers 25 skills, each appearing in at least 100 remote job postings with a reported salary, giving the findings strong statistical grounding. The results reveal a clear tiering: Terraform, Python, AWS, and SQL occupy the top cluster with optimal scores between 0.91–0.97, driven by strong salary floors ($130K–$184K) and massive demand. Below them sits a rich mid-tier — Airflow, Spark, Kafka, Snowflake — where slightly lower demand is offset by above-average salaries, particularly in streaming and orchestration. The bottom cluster (Docker, Go, GitHub, BigQuery) still commands solid $135K+ medians but trails on demand volume, making them valuable secondary skills rather than primary targets for career positioning. Key Findings Terraform is the highest-value single skill with the top optimal score (0.97) and the highest median salary of any skill in the dataset at $184,000 — nearly $50K above the group average. Despite relatively modest demand (193 postings), its salary premium is so pronounced that it outscores even Python and SQL. Infrastructure-as-code expertise is rare, commands a significant wage premium, and is directly suited to remote work since all provisioning is done via CLI and APIs. Python and SQL are the volume anchors of the market, each appearing in over 1,100 remote postings — more than double any other skill — and both scoring 0.91–0.95. Their median salaries ($135K and $130K respectively) are solid but not exceptional; their dominance comes from ubiquity. For anyone entering the field, these two skills represent the lowest-risk, highest-return investment — nearly every role expects them. AWS leads the cloud platforms, outscoring Azure and GCP by a notable margin (0.91 vs. 0.79 and 0.72). All three cloud providers sit in the top half of the table, but AWS uniquely combines strong demand (783 postings) with the highest cloud median salary ($137,320). This reflects AWS's continued dominance in enterprise data infrastructure and its tight integration with modern data stacks. Azure and GCP remain important but are stronger as complementary skills. Streaming and orchestration tools (Kafka, Airflow, Spark) offer a high salary-to-demand ratio, clustering between $140K–$150K median salaries with moderate but healthy demand (292–503 postings each). These are the skills most likely to differentiate a mid-career engineer, signalling the ability to manage real-time pipelines and complex DAG-based workflows autonomously — exactly the profile remote-first teams are hiring for. Infrastructure and containerisation skills (Kubernetes, Git, Docker) punch above their demand weight on salary — Kubernetes in particular has the second-highest median salary in the dataset at $150,500, despite appearing in only 147 postings. This niche-but-lucrative pattern suggests that DevOps-adjacent data engineers who can manage containerised workloads command a meaningful premium, even if the absolute number of such roles is smaller. These are strong specialisation targets for engineers already solid in Python/SQL/cloud. */