This repository contains the code and data for a predictive modeling project that analyzes car data and builds a regression model to predict car prices from their specifications.
The objective of this project is to identify the factors that have the greatest impact on a car's price and to develop a regression model that predicts the price of a car from its specifications. The dataset used in this project contains detailed information about cars and their corresponding prices.
To view the dataset, click here.
Throughout the project, the following tasks were performed:
- Data Wrangling: Python's Pandas and NumPy libraries were used to clean, transform, and preprocess the dataset so that it was suitable for analysis while preserving data integrity. The cleaned dataset can be found here.
- Exploratory Data Analysis (EDA): EDA techniques were applied to gain insights into the data and to identify correlations between variables. Analyzing the relationship between each feature and the target variable (car price) guided the selection of the key variables used in the regression model.
- Regression Model Development: Python's scikit-learn library was used to build a regression model that predicts car prices from the selected variables. The Jupyter Notebook containing the code for the regression model can be accessed here.
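The wrangling step is not reproduced in this README; the following is a minimal sketch of what such a Pandas/NumPy cleaning pass can look like. The function name `clean_car_data`, the column names `horsepower` and `price`, and the choice to impute with the column mean are all illustrative assumptions, not the project's actual code, and the toy rows are made up rather than taken from auto.csv.

```python
import numpy as np
import pandas as pd


def clean_car_data(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning pass; column names are assumptions."""
    df = df.copy()
    # Non-numeric placeholders such as "?" become NaN when coerced to numbers
    for col in ["horsepower", "price"]:
        df[col] = pd.to_numeric(df[col], errors="coerce")
    # Rows without a price cannot serve as training targets, so drop them
    df = df.dropna(subset=["price"]).reset_index(drop=True)
    # Impute remaining missing horsepower values with the column mean
    df["horsepower"] = df["horsepower"].fillna(df["horsepower"].mean())
    return df


# Toy rows for illustration only (not taken from auto.csv)
raw = pd.DataFrame({
    "make": ["audi", "bmw", "mazda", "toyota"],
    "horsepower": ["110", "?", "100", "120"],
    "price": ["15000", "18000", "?", "16000"],
})
print(clean_car_data(raw))
```

Dropping rows with a missing target while imputing missing features is one common convention; whether the project did the same is not stated here.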
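Correlation-based variable selection, as described in the EDA step, can be sketched with Pandas as below. This is an assumption about the approach, not the project's notebook: `rank_by_correlation` and `select_features` are hypothetical helpers, Pearson correlation is Pandas' default for `DataFrame.corr`, and the 0.5 cutoff is an arbitrary illustrative threshold. The toy data is invented.

```python
import pandas as pd


def rank_by_correlation(df: pd.DataFrame, target: str = "price") -> pd.Series:
    """Rank numeric features by absolute Pearson correlation with the target."""
    numeric = df.select_dtypes(include="number")  # non-numeric columns are skipped
    corr = numeric.corr()[target].drop(target)
    return corr.abs().sort_values(ascending=False)


def select_features(df: pd.DataFrame, target: str = "price", threshold: float = 0.5) -> list:
    """Keep features whose |r| with the target meets a hypothetical threshold."""
    ranked = rank_by_correlation(df, target)
    return ranked[ranked >= threshold].index.tolist()


# Toy data for illustration only
cars = pd.DataFrame({
    "make": ["audi", "bmw", "mazda", "toyota", "volvo"],
    "engine-size": [100, 120, 150, 200, 250],
    "height": [50, 52, 51, 53, 50],
    "price": [10000, 12000, 15000, 20000, 25000],
})
print(rank_by_correlation(cars))
print(select_features(cars))  # ['engine-size']
```

Correlation only captures linear, pairwise relationships, so in practice it is usually combined with plots and domain judgment rather than used as the sole selection rule.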
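The README does not say which scikit-learn estimator the notebook uses. As a minimal sketch, assuming ordinary least squares (`LinearRegression`) with a held-out split evaluated by R², the modeling step might look like this. `fit_price_model` is a hypothetical helper, and the data is synthetic with a known linear relationship so the fitted coefficients can be sanity-checked; it is not the project's dataset and says nothing about the project's results.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def fit_price_model(df, features, target="price", test_size=0.25, random_state=0):
    """Fit a linear regression on a train split and report R^2 on the held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[features], df[target], test_size=test_size, random_state=random_state
    )
    model = LinearRegression().fit(X_train, y_train)
    r2 = r2_score(y_test, model.predict(X_test))
    return model, r2


# Synthetic data with a known linear relationship (not the project's dataset)
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "engine-size": rng.uniform(90, 300, n),
    "horsepower": rng.uniform(50, 250, n),
})
data["price"] = 50 * data["engine-size"] + 80 * data["horsepower"] + rng.normal(0, 500, n)

model, r2 = fit_price_model(data, ["engine-size", "horsepower"])
print(f"held-out R^2: {r2:.3f}")

# Predict the price of a new, hypothetical car
new_car = pd.DataFrame({"engine-size": [130.0], "horsepower": [111.0]})
print(model.predict(new_car))
```

Scoring on a split the model never saw during fitting gives a fairer estimate of predictive accuracy than the training R²; cross-validation is a common refinement.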
The following files are included in this repository:
- predictive_modeling.ipynb: Jupyter Notebook containing the code for developing the regression model.
- auto.csv: The original car dataset used for analysis.
- clean_auto_data.csv: The cleaned and preprocessed car dataset used for analysis.
- predictive_modeling_project.html: HTML version of the Jupyter Notebook, including the regression model code and its outputs.
Feel free to explore these files to gain a better understanding of the project and the steps involved in building the predictive model.