Performed data cleaning and preprocessing on YouTube video data using Python, Pandas, NumPy, and Regex
Handled missing values, duplicates, and invalid video IDs; converted data types; and extracted features from publishing dates
Standardized and cleaned text-based features such as video tags, titles, and languages
Conducted exploratory data analysis (EDA) using Matplotlib and Seaborn to identify patterns in views, likes, comments, tags, languages, and channel performance
Built multiple visualizations including bar charts, scatter plots, heatmaps, line charts, pie charts, and word clouds to highlight engagement trends
Engineered new features such as log-transformed engagement metrics, number of tags, day of publication, and yearly trends
Developed machine learning classification models to predict video success and engagement levels using Random Forest and Logistic Regression
Evaluated model performance using accuracy and other classification metrics
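
The cleaning and feature-engineering steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names (video_id, title, tags, publish_date, views), the sample rows, the "|" tag separator, and the 11-character video ID pattern are all assumptions made for the example.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical sample rows; the column names are assumptions, not the project's schema.
raw = pd.DataFrame({
    "video_id": ["dQw4w9WgXcQ", "dQw4w9WgXcQ", "bad id!", "abcdefghijk"],
    "title": ["  Hello World ", "  Hello World ", "Broken", "Second Video"],
    "tags": ["Music|Pop", "Music|Pop", "x", None],
    "publish_date": ["2021-03-05", "2021-03-05", "2021-01-01", "2022-07-10"],
    "views": ["1000", "1000", "5", "250"],
})

# Assumed ID format: 11 characters drawn from letters, digits, "_" and "-".
VIDEO_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")


def clean_videos(df):
    """Drop duplicates and invalid IDs, fix dtypes, and derive simple features."""
    out = df.drop_duplicates().copy()

    # Keep only rows whose video ID matches the assumed pattern.
    valid = out["video_id"].apply(lambda v: bool(VIDEO_ID_RE.match(str(v))))
    out = out[valid].copy()

    # Standardize text columns and fill missing tags with an empty string.
    out["title"] = out["title"].str.strip()
    out["tags"] = out["tags"].fillna("").str.lower()

    # Convert data types; unparseable values become 0 (views) or NaT (dates).
    out["views"] = pd.to_numeric(out["views"], errors="coerce").fillna(0).astype(int)
    out["publish_date"] = pd.to_datetime(out["publish_date"], errors="coerce")

    # Features from the publishing date, the tag count, and log-scaled views.
    out["publish_year"] = out["publish_date"].dt.year
    out["publish_day"] = out["publish_date"].dt.day_name()
    out["num_tags"] = out["tags"].apply(lambda t: len([x for x in t.split("|") if x]))
    out["log_views"] = np.log1p(out["views"])
    return out.reset_index(drop=True)


cleaned = clean_videos(raw)
print(cleaned[["video_id", "num_tags", "publish_day", "log_views"]])
```

Using log1p rather than a plain logarithm keeps videos with zero views well-defined, which is one common reason to log-transform engagement counts this way.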
Tools & Technologies
Python, Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn, Regex, WordCloud
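
As an illustration of how the Scikit-learn tooling listed above fits the modeling and evaluation steps, the sketch below trains a Random Forest and a Logistic Regression classifier and reports accuracy plus a classification report. The data is synthetic, and the feature names and the "success" label definition (above-median engagement score) are assumptions for the example, not the project's actual setup or results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for engineered features (hypothetical names and distributions).
rng = np.random.default_rng(0)
n = 400
log_likes = rng.normal(5.0, 1.5, n)
log_comments = rng.normal(3.0, 1.0, n)
num_tags = rng.integers(0, 30, n)
X = np.column_stack([log_likes, log_comments, num_tags])

# Assumed label: a video counts as "successful" if its noisy engagement
# score is above the median of the synthetic sample.
score = log_likes + 0.5 * log_comments + rng.normal(0.0, 0.5, n)
y = (score > np.median(score)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

# Fit each model and collect its accuracy on the held-out split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = accuracy_score(y_test, pred)
    print(name, "accuracy:", round(results[name], 3))
    print(classification_report(y_test, pred))
```

The classification report adds per-class precision, recall, and F1 to the single accuracy number, which matters when the success classes are imbalanced.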