1. Dataset:
Use a dataset containing more than 250k records and more than 40 features.
2. Exploratory Data Analysis (EDA):
Conduct a thorough EDA to uncover patterns, anomalies, trends, and relationships within the
data. Use visualizations to help understand the distribution of the data and the
relationships between features.
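As a starting point for the EDA, the sketch below summarizes each column and lists the strongest pairwise correlations. It is only an illustration: the helper names `summarize` and `top_correlations` and the synthetic `demo` table are assumptions, not part of the assignment, and the real dataset would be loaded in their place.

```python
import numpy as np
import pandas as pd


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column overview: dtype, share of missing values, distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing_frac": df.isna().mean(),
        "n_unique": df.nunique(),
    })


def top_correlations(df: pd.DataFrame, k: int = 5) -> pd.Series:
    """Return the k largest absolute correlations between numeric columns."""
    corr = df.select_dtypes(include="number").corr().abs()
    # Keep only the upper triangle so that each pair appears once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    return pairs.sort_values(ascending=False).head(k)


# Synthetic stand-in for the real dataset (hypothetical columns a, b, c).
rng = np.random.default_rng(0)
demo = pd.DataFrame({"a": rng.normal(size=200)})
demo["b"] = 2 * demo["a"] + rng.normal(scale=0.1, size=200)  # strongly tied to a
demo["c"] = rng.normal(size=200)
demo.loc[::10, "c"] = np.nan  # inject missing values in every tenth row

overview = summarize(demo)
strongest = top_correlations(demo, k=1)
print(overview)
print(strongest)
```

The same tables can feed the visualizations, for example histograms of each numeric column and a heatmap of the correlation matrix in whichever plotting library is preferred.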
3. Data Cleaning:
Address issues such as missing values, outliers, and inaccurate data entries.
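One possible cleaning pass is sketched below, assuming numeric features only: missing values are filled with the column median, and outliers are clipped to the usual 1.5 x IQR fences. The function name `clean_numeric` and the toy column `x` are hypothetical, and median imputation and IQR clipping are just one choice among several. Domain-specific validity checks for inaccurate entries are not covered here.

```python
import numpy as np
import pandas as pd


def clean_numeric(df: pd.DataFrame, whisker: float = 1.5) -> pd.DataFrame:
    """Impute missing numeric values with the median, then clip outliers."""
    out = df.copy()
    for col in out.select_dtypes(include="number").columns:
        series = out[col].astype(float)
        # Step 1: fill gaps with the column median (robust to extreme values).
        series = series.fillna(series.median())
        # Step 2: clip values that fall outside the IQR fences.
        q1 = series.quantile(0.25)
        q3 = series.quantile(0.75)
        iqr = q3 - q1
        out[col] = series.clip(q1 - whisker * iqr, q3 + whisker * iqr)
    return out


# Toy column with one missing value and one extreme value.
raw = pd.DataFrame({"x": [1.0, 2.0, np.nan, 3.0, 2.0, 1000.0]})
cleaned = clean_numeric(raw)
print(cleaned)
```

Clipping keeps every record, which matters when rows are expensive to lose. Dropping the affected rows is an alternative if the extreme values are known to be errors.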
4. Dimensionality Reduction:
Apply principal component analysis (PCA), a dimensionality reduction technique, to reduce the
number of features while retaining as much useful information as possible.
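A minimal PCA sketch with scikit-learn follows. The 95% explained-variance threshold and the synthetic 40-feature matrix are assumptions chosen for illustration; the assignment does not fix either. Features are standardized first because PCA is sensitive to feature scale.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 40 observed features driven by 5 latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 5))
mixing = rng.normal(size=(5, 40))
X = latent @ mixing + rng.normal(scale=0.1, size=(1000, 40))

# Standardize so that no feature dominates purely because of its units.
X_scaled = StandardScaler().fit_transform(X)

# A float n_components keeps the fewest components whose cumulative
# explained variance reaches that fraction (assumed threshold: 0.95).
pca = PCA(n_components=0.95, svd_solver="full")
X_reduced = pca.fit_transform(X_scaled)

retained = pca.explained_variance_ratio_.sum()
print(X.shape, "->", X_reduced.shape)
print(f"explained variance retained: {retained:.3f}")
```

Plotting the cumulative sum of `pca.explained_variance_ratio_` against the number of components is a common way to justify the chosen threshold in the presentation.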
5. SVM Model Development:
Build an SVM model for either classification or regression. The model should be
robust, and its hyperparameters should be tuned for optimal performance. Evaluate the model
using metrics appropriate to the task (for example, accuracy and F1 score for classification).
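The sketch below shows one way to combine scaling, PCA, and an SVM classifier in a single scikit-learn pipeline and tune it with a cross-validated grid search. The synthetic data, the parameter grid, and the macro-F1 metric are all assumptions made for illustration. Note that kernel SVMs scale poorly with sample count, so on 250k+ records it may be necessary to tune on a subsample or to consider `LinearSVC`.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.metrics import classification_report, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Small synthetic classification problem standing in for the real dataset.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Putting preprocessing inside the pipeline means it is re-fitted on each
# cross-validation fold, which avoids leaking validation data into training.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=0.95, svd_solver="full")),
    ("svm", SVC()),
])

# Illustrative grid only; real ranges depend on the data.
param_grid = {
    "svm__C": [0.1, 1, 10],
    "svm__gamma": ["scale", 0.01],
    "svm__kernel": ["rbf"],
}
search = GridSearchCV(pipe, param_grid, cv=3, scoring="f1_macro")
search.fit(X_train, y_train)

# Evaluate the best configuration on the held-out test split.
y_pred = search.predict(X_test)
test_f1 = f1_score(y_test, y_pred, average="macro")
print("best parameters:", search.best_params_)
print(classification_report(y_test, y_pred))
```

For a regression target, `SVR` with a metric such as RMSE or R^2 would take the place of `SVC` and macro-F1 in the same structure.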
Deliverables:
1. Python code.
2. A presentation that includes:
i. An overview of the dataset, explaining the types and nature of features.
ii. Insights and visualizations from the EDA.
iii. The dimensionality reduction technique used.
iv. The SVM training process, including parameter tuning and model evaluation.