Film production carries substantial financial risk, so revenue forecasting is a useful input to budgeting and marketing decisions. This project predicts box office revenue from a movie’s attributes using machine learning in Python.
Implementation Details:
Collected a dataset of 5,000+ movies with budget, runtime, genre, and audience rating fields, and cleaned it using Pandas.
Conducted exploratory data analysis (EDA) to identify relationships between key features and revenue.
Engineered new features (e.g., ROI, genre encoding) to improve model accuracy.
Trained and compared three ML models using Scikit-learn: Linear Regression, Decision Tree, and Random Forest.
Evaluated each model with R² score and RMSE, using cross-validation to check that the scores held up across data splits.
Visualized insights and model outputs using Matplotlib and Seaborn.
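The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data is synthetic, and the column names (budget, runtime, genre, rating, revenue), the 80/20 split, and the model settings are all assumptions made for the example. ROI is left out of the model inputs in this sketch because it is computed from revenue, the value being predicted.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the movie dataset (all values are made up).
rng = np.random.default_rng(0)
n = 500
movies = pd.DataFrame({
    "budget": rng.uniform(1e6, 2e8, n),
    "runtime": rng.normal(110, 15, n),
    "genre": rng.choice(["Action", "Comedy", "Drama"], n),
    "rating": rng.uniform(3, 9, n),
})
movies["revenue"] = (
    2.5 * movies["budget"] + 1e7 * movies["rating"] + rng.normal(0, 3e7, n)
)

# Basic cleaning: drop rows with missing values or a non-positive budget.
movies = movies.dropna()
movies = movies[movies["budget"] > 0]

# Genre encoding: one-hot encode the categorical column.
X = pd.get_dummies(movies.drop(columns="revenue"), columns=["genre"], dtype=float)
y = movies["revenue"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit each model, then score it on the held-out test set and with
# 5-fold cross-validation on the training set.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "r2": r2_score(y_test, pred),
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "cv_r2": cross_val_score(
            model, X_train, y_train, cv=5, scoring="r2"
        ).mean(),
    }

print(pd.DataFrame(results).T.round(3))
```

Because the data here is generated from a simple linear formula, the scores say nothing about real movies; the sketch only shows how the three models can be compared on the same split with the same metrics.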
Outcome:
Achieved an R² score of 0.85 with the Random Forest model. The analysis also showed which factors most influence movie revenue, which can support financial planning.
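One common way to see which factors a Random Forest relies on is its built-in feature importances. The sketch below is an assumption about how such a ranking could be produced, not a record of the project's method; the data is synthetic and the column names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (made up): revenue depends mostly on budget, a little on rating.
rng = np.random.default_rng(1)
n = 400
X = pd.DataFrame({
    "budget": rng.uniform(1e6, 2e8, n),
    "runtime": rng.normal(110, 15, n),
    "rating": rng.uniform(3, 9, n),
})
y = 2.5 * X["budget"] + 1e7 * X["rating"] + rng.normal(0, 3e7, n)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances: one value per feature, summing to 1.
importances = pd.Series(
    forest.feature_importances_, index=X.columns
).sort_values(ascending=False)

print(importances.round(3))
```

These importances are impurity-based and can overstate features with many distinct values; scikit-learn's `permutation_importance` in `sklearn.inspection` is an alternative worth checking against.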
Tools and Skills:
Python, Pandas, Scikit-learn, Matplotlib, Seaborn, NumPy, Data Cleaning, Predictive Modeling