This is a machine learning-based web application that classifies emails as SPAM or NOT SPAM using natural language processing (NLP). The application is built with Flask and includes a simple web interface for users to input email text and receive predictions.
project-folder/
├── app.py # Main Flask application
├── dataset/ # Folder containing the dataset
│ └── email.csv # Dataset file (spam email data)
├── spam_model.pkl # Trained ML model
├── tfidf_vectorizer.pkl # Trained TF-IDF vectorizer
├── templates/ # Folder containing HTML templates
│ └── index.html # Web interface template
└── README.md # Project documentation
- Python 3.7+
- Required Python packages:
  - Flask
  - scikit-learn
  - pandas
  - joblib
Install the required packages:
pip install flask scikit-learn pandas joblib
Clone the repository or download the project files:
git clone https://github.com/yourusername/spam-email-checker.git
cd spam-email-checker
Place your dataset (email.csv) in the dataset/ folder.
Train the model (if not already trained):
- Open a Python script or notebook.
- Load email.csv for training.
- Save the trained model as spam_model.pkl and the TF-IDF vectorizer as tfidf_vectorizer.pkl in the project root directory.
If you already have these files (spam_model.pkl and tfidf_vectorizer.pkl), skip this step.
Run the Flask application:
python app.py
Open your web browser and navigate to http://127.0.0.1:5000.
Below is a screenshot of the web interface:
- Enter email text in the input field on the web interface.
- Click the Check button.
- View the result: The app will display whether the email is classified as SPAM or NOT SPAM.
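The contents of app.py are not shown in this README, so the following is only a minimal sketch of how the flow above could be wired up in Flask. The `/predict` route, the `email_text` form field, the JSON response, and the helper names `create_app` and `run_from_saved_artifacts` are all assumptions made for illustration. The sketch also assumes labels are encoded as 1 for SPAM and 0 for NOT SPAM.

```python
from flask import Flask, jsonify, render_template, request


def create_app(model, vectorizer):
    """Build a Flask app around an already-loaded model and vectorizer."""
    app = Flask(__name__)

    @app.route("/")
    def index():
        # Serves templates/index.html, the page with the input form.
        return render_template("index.html")

    @app.route("/predict", methods=["POST"])
    def predict():
        # "email_text" is a hypothetical form field name.
        text = request.form.get("email_text", "").strip()
        if not text:
            return jsonify({"error": "No email text provided"}), 400
        # Turn the raw text into TF-IDF features, then classify it.
        features = vectorizer.transform([text])
        label = int(model.predict(features)[0])
        # Assumes the label encoding 1 = SPAM, 0 = NOT SPAM.
        return jsonify({"prediction": "SPAM" if label == 1 else "NOT SPAM"})

    return app


def run_from_saved_artifacts():
    """Load the saved artifacts from the project root and start the dev server."""
    import joblib  # imported here so create_app works without the .pkl files

    app = create_app(
        joblib.load("spam_model.pkl"),
        joblib.load("tfidf_vectorizer.pkl"),
    )
    app.run(host="127.0.0.1", port=5000)
```

Passing the model and vectorizer into `create_app` keeps the routes testable with stand-in objects. If the training text was cleaned before vectorizing, the same cleaning should be applied to the submitted text before `vectorizer.transform`.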
The dataset (email.csv) should contain the email text and a label for each email (e.g., 0 for NOT SPAM and 1 for SPAM). If you're using your own dataset, make sure it follows the same text-plus-label layout and that empty or malformed rows are cleaned out before training.
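As a rough illustration of that check, the sketch below loads a CSV with pandas and validates it before training. The column names `text` and `label` and the function name `load_dataset` are placeholders, since the README does not state the actual header row of email.csv. Adjust them to match your file.

```python
import io

import pandas as pd


def load_dataset(source, text_col="text", label_col="label"):
    """Read a CSV of emails and return (texts, labels) as plain lists.

    `text_col` and `label_col` are placeholder column names; change them
    to match the header row of your own email.csv.
    """
    df = pd.read_csv(source)
    missing = {text_col, label_col} - set(df.columns)
    if missing:
        raise ValueError(f"Missing expected column(s): {sorted(missing)}")
    # Drop rows with no text or no label, then exact duplicate rows.
    df = df.dropna(subset=[text_col, label_col])
    df = df.drop_duplicates(subset=[text_col, label_col])
    labels = df[label_col].astype(int)
    if not labels.isin([0, 1]).all():
        raise ValueError("Labels must be 0 (NOT SPAM) or 1 (SPAM)")
    return df[text_col].astype(str).tolist(), labels.tolist()


# Tiny in-memory example standing in for dataset/email.csv.
sample = io.StringIO("text,label\nWin a free prize now,1\nMeeting moved to noon,0\n")
texts, labels = load_dataset(sample)
print(texts, labels)
```

For the real file, pass the path instead of the in-memory sample, for example `load_dataset("dataset/email.csv")`.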
The model is trained using:
- TF-IDF Vectorizer: Converts email text into numerical features.
- Logistic Regression: Used for binary classification.
- Preprocess the text by removing special characters, converting to lowercase, and tokenizing.
- Vectorize the text using TF-IDF.
- Train the Logistic Regression model.
- Save the model and vectorizer using joblib.
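The steps above can be sketched roughly as follows. This is not the project's actual training script, which is not included in this README. The function names `clean_text`, `train_spam_model`, and `save_artifacts` are hypothetical, the cleaning regex is one possible reading of "removing special characters", and `max_iter=1000` is an arbitrary choice. Tokenizing is left to `TfidfVectorizer`, which splits text into word tokens by default.

```python
import re

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def clean_text(text):
    """Lowercase, replace special characters with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return " ".join(text.split())


def train_spam_model(texts, labels):
    """Fit a TF-IDF vectorizer and a Logistic Regression classifier."""
    cleaned = [clean_text(t) for t in texts]
    vectorizer = TfidfVectorizer()  # tokenizes into word tokens by default
    features = vectorizer.fit_transform(cleaned)
    model = LogisticRegression(max_iter=1000)
    model.fit(features, labels)
    return model, vectorizer


def save_artifacts(model, vectorizer,
                   model_path="spam_model.pkl",
                   vectorizer_path="tfidf_vectorizer.pkl"):
    """Persist both objects with joblib so the web app can load them later."""
    joblib.dump(model, model_path)
    joblib.dump(vectorizer, vectorizer_path)


# Example wiring, given lists `texts` and `labels` loaded from email.csv:
# model, vectorizer = train_spam_model(texts, labels)
# save_artifacts(model, vectorizer)
```

Saving the fitted vectorizer alongside the model matters because prediction has to reuse the vocabulary and IDF weights learned during training. Fitting a new vectorizer at prediction time would produce features the model was not trained on.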
- Python
- Flask: For building the web application.
- scikit-learn: For machine learning and text processing.
- HTML/CSS: For the front-end web interface.
- Improve model accuracy by exploring advanced NLP techniques (e.g., word embeddings or deep learning).
- Add more user-friendly features to the web interface.
- Deploy the application to a cloud platform (e.g., Heroku, AWS, or Google Cloud).
This project is open-source and available under the MIT License.
