Data Science Salary Analysis Project Report
Project Overview
This project analyzes salary trends in the data science field using a global dataset. The dataset includes attributes such as work year, experience level, job title, salary, employment type, remote work ratio, company location, and company size. The primary goal is to show how salaries vary by role, location, experience level, and company size.
Data Overview
Dataset Structure: The dataset's columns describe each job and its compensation, including:
Work Year: Year of the reported salary.
Experience Level: Entry-level (Junior), mid-level (Intermediate), senior (Advanced), and executive (leadership) positions.
Job Titles: Specific roles within data science (e.g., Data Scientist, Machine Learning Engineer).
Salaries: Annual compensation, both gross and net.
Employment Type: Full-time, part-time, contract, or freelance positions.
Remote Work Ratio: Degree of remote work involved.
Company Size: Small, medium, and large organizations.
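As a rough illustration of the structure described above, the sketch below checks that a DataFrame carries the attributes this report relies on. The column names are assumptions, since the report lists attributes rather than exact headers, and the two sample rows are made up for the example.

```python
import pandas as pd

# Hypothetical column names; the report names the attributes, not the headers.
EXPECTED_COLUMNS = [
    "work_year", "experience_level", "employment_type", "job_title",
    "salary_in_usd", "remote_ratio", "company_location", "company_size",
]


def missing_columns(df):
    """Return the expected columns that are absent from df, in list order."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]


# Toy rows for illustration only; they are not taken from the project dataset.
sample = pd.DataFrame({
    "work_year": [2022, 2023],
    "experience_level": ["EN", "SE"],
    "employment_type": ["FT", "FT"],
    "job_title": ["Data Scientist", "Machine Learning Engineer"],
    "salary_in_usd": [60_000, 150_000],
    "remote_ratio": [0, 100],
    "company_location": ["DE", "US"],
    "company_size": ["S", "L"],
})

print(missing_columns(sample))
```

Running a check like this right after loading makes a renamed or missing column fail early instead of surfacing later in the analysis.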
Methodology
Data Preprocessing
Handling Missing Values: Identified and addressed missing entries as appropriate.
Data Cleaning: Standardized categorical labels and numerical values.
Feature Engineering:
Derived new features such as salary ranges and regional salary averages.
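The preprocessing steps above could be sketched roughly as follows. This is not the project's actual code: the column names (experience_level, salary_in_usd, company_location), the choice to drop rows without a salary, and the bin edges for the salary ranges are all assumptions made for the example.

```python
import pandas as pd


def preprocess(df):
    """Clean labels and salaries, then add salary_range and regional_avg_salary."""
    out = df.copy()

    # Standardize categorical labels: trim whitespace and upper-case the codes.
    out["experience_level"] = out["experience_level"].str.strip().str.upper()

    # Coerce salaries to numbers; values that cannot be parsed become NaN.
    out["salary_in_usd"] = pd.to_numeric(out["salary_in_usd"], errors="coerce")

    # One possible missing-value policy (an assumption): drop rows with no salary.
    out = out.dropna(subset=["salary_in_usd"])

    # Salary ranges with arbitrary, illustrative bin edges (left-inclusive).
    bins = [0, 50_000, 100_000, 150_000, float("inf")]
    labels = ["<50k", "50k-100k", "100k-150k", "150k+"]
    out["salary_range"] = pd.cut(
        out["salary_in_usd"], bins=bins, labels=labels, right=False
    )

    # Regional salary average: mean salary per company location, on every row.
    out["regional_avg_salary"] = (
        out.groupby("company_location")["salary_in_usd"].transform("mean")
    )
    return out.reset_index(drop=True)


# Toy rows for illustration only; they are not taken from the project dataset.
raw = pd.DataFrame({
    "experience_level": [" se", "EN", "mi ", "SE"],
    "salary_in_usd": ["120000", 45000, None, 160000],
    "company_location": ["US", "DE", "US", "US"],
})

clean = preprocess(raw)
print(clean)
```

Using transform rather than a separate aggregated table keeps the regional average aligned with each row, which is convenient when comparing an individual salary with its regional mean.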
Exploratory Data Analysis (EDA)
Univariate Analysis:
Visualized the distribution of salaries in USD.
Counted the frequency of experience levels.
Bivariate and Multivariate Analysis:
Examined salary variations by experience level, job titles, and company size.
Analyzed regional differences in salaries for countries with more than 10 entries.
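A minimal sketch of two of these steps, the experience-level frequency count and the regional comparison limited to countries with more than 10 entries, is shown below. Column names and the toy data are assumptions for illustration; the threshold of 10 comes from the report.

```python
import pandas as pd


def country_salary_stats(df, min_entries=10):
    """Mean, max, and min salary for locations with more than min_entries rows."""
    counts = df["company_location"].value_counts()
    keep = counts[counts > min_entries].index
    filtered = df[df["company_location"].isin(keep)]
    return (
        filtered.groupby("company_location")["salary_in_usd"]
        .agg(["mean", "max", "min"])
        .sort_values("mean", ascending=False)
    )


# Toy data for illustration only: 11 rows for "US" and 2 for "DE".
toy = pd.DataFrame({
    "company_location": ["US"] * 11 + ["DE"] * 2,
    "experience_level": ["SE"] * 6 + ["MI"] * 5 + ["EN"] * 2,
    "salary_in_usd": [100_000 + 1_000 * i for i in range(11)] + [60_000, 70_000],
})

# Univariate step: how often each experience level appears.
level_counts = toy["experience_level"].value_counts()

# Regional step: "DE" is excluded because it has only 2 entries.
stats = country_salary_stats(toy)

print(level_counts)
print(stats)
```

Filtering on entry count before aggregating keeps a country with one or two unusual salaries from dominating the regional comparison.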
Visualization
Tools Used: Seaborn, for plotting salary distributions and category counts.
Key Visualizations:
Salary distribution in USD using boxen plots.
Count plots for experience levels.
Analysis of company locations with aggregated salary statistics (mean, max, min).
Key Findings
Salary Distribution:
Salaries in USD show significant variation across roles and locations.
Experience Level:
Senior and executive levels generally command higher salaries.
Regional Insights:
Countries with more than 10 entries were analyzed for mean, max, and min salaries.
Tools and Technologies
Programming Languages: Python
Libraries: Pandas, NumPy, Seaborn
Environment: Jupyter Notebook
Prepared by
Omar, Data Scientist