This repository contains an implementation of a GPT-style language model from scratch, following the book "Build a Large Language Model (From Scratch)" by Sebastian Raschka. It includes training code, utilities, and a notebook to load GPT-2 weights from OpenAI.

- Goal: Train a GPT-style transformer model from scratch on a real-world dataset.
- Data: The Verdict by Edith Wharton (public domain).
- Framework: PyTorch
- Tokenizer: GPT-2 BPE tokenizer via tiktoken
- Model: GPT-2 124M architecture defined from first principles (see the configuration sketch below this list)
- Training: Simple training loop with loss tracking and validation
- Extras: Jupyter notebook to load and compare against GPT-2 weights from OpenAI
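The exact hyperparameters live in GPT/GPT_CONFIG.py. As a point of reference only (key names and values in the repo's file may differ), a GPT-2 "small" 124M-parameter configuration is usually specified like this:

```python
# Illustrative sketch -- the authoritative values are in GPT/GPT_CONFIG.py.
# These are the standard GPT-2 "small" (124M-parameter) hyperparameters.
GPT_CONFIG_124M = {
    "vocab_size": 50257,     # size of the GPT-2 BPE vocabulary
    "context_length": 1024,  # maximum number of tokens per input sequence
    "emb_dim": 768,          # token / positional embedding dimension
    "n_heads": 12,           # attention heads per transformer block
    "n_layers": 12,          # number of transformer blocks
    "drop_rate": 0.1,        # dropout probability
    "qkv_bias": False,       # whether query/key/value projections use a bias term
}
```

The project is organized as follows: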
.
├── GPT/ # GPT model definition and config
│ ├── GPT.py
│ └── GPT_CONFIG.py
├── data/ # Training data
│ └── the-verdict.txt
├── data_loader/ # Data preprocessing and batching
│ └── data_loader.py
├── loss/ # Custom loss functions for GPT training
│ └── loss.py
├── networks/ # Transformer block components (e.g., LayerNorm, attention)
│ └── networks.py
├── tools/ # Training loop and related utilities
│ └── program.py
├── utils/ # Utility functions
│ ├── utils.py
│ └── plot.py
├── gpt2.ipynb # Notebook for loading OpenAI GPT-2 weights
├── train.py # Main training script
├── README.md
└── requirements.txt
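The networks/ module holds the transformer-block components (layer normalization, attention, and so on) that GPT/GPT.py assembles into the full model. As a rough illustration of the kind of component involved, here is a minimal sketch of causal multi-head self-attention in the GPT-2 style; it follows the pattern from the book rather than the exact code in networks/networks.py, and all class and argument names are placeholders:

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Causal (masked) multi-head self-attention, GPT-2 style."""

    def __init__(self, d_in, d_out, context_length, dropout, num_heads, qkv_bias=False):
        super().__init__()
        assert d_out % num_heads == 0, "d_out must be divisible by num_heads"
        self.d_out = d_out
        self.num_heads = num_heads
        self.head_dim = d_out // num_heads

        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.out_proj = nn.Linear(d_out, d_out)
        self.dropout = nn.Dropout(dropout)
        # Upper-triangular mask that hides "future" tokens from each position
        self.register_buffer(
            "mask", torch.triu(torch.ones(context_length, context_length), diagonal=1)
        )

    def forward(self, x):
        b, num_tokens, _ = x.shape
        # Project inputs and split them into heads: (b, num_heads, num_tokens, head_dim)
        q = self.W_query(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.W_key(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.W_value(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)

        # Scaled dot-product attention with the causal mask applied
        attn_scores = q @ k.transpose(2, 3) / self.head_dim ** 0.5
        attn_scores = attn_scores.masked_fill(
            self.mask.bool()[:num_tokens, :num_tokens], float("-inf")
        )
        attn_weights = self.dropout(torch.softmax(attn_scores, dim=-1))

        # Merge the heads back together and project to the output dimension
        context = (attn_weights @ v).transpose(1, 2).reshape(b, num_tokens, self.d_out)
        return self.out_proj(context)
```

To get started, clone the repository: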
git clone https://github.com/dimitri009/GPT.git
cd GPT

I recommend using a virtual environment; then install the dependencies:
pip install -r requirements.txt
This project uses torch, tiktoken, matplotlib, and numpy.
The dataset is already included:
data/the-verdict.txt
If you want to download a fresh copy:
wget https://en.wikisource.org/wiki/The_Verdict -O data/the-verdict.txt
Note that this URL returns the Wikisource HTML page, so you may need to strip the markup from the downloaded file before training on it.
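To get a quick feel for the data, the snippet below reads the included text, tokenizes it with the GPT-2 BPE tokenizer, and performs a simple train/validation split. The 90/10 ratio is an assumption for illustration; the split the repo actually uses is configured in train.py and data_loader/data_loader.py.

```python
import tiktoken

# Read the raw story and split it into training and validation portions.
with open("data/the-verdict.txt", "r", encoding="utf-8") as f:
    text = f.read()

tokenizer = tiktoken.get_encoding("gpt2")  # GPT-2 BPE tokenizer
token_ids = tokenizer.encode(text)
print(f"Characters: {len(text)}, tokens: {len(token_ids)}")

# Hypothetical 90/10 split -- check train.py for the ratio the repo uses
split_idx = int(0.9 * len(text))
train_text, val_text = text[:split_idx], text[split_idx:]
```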
python train.py
Training will log the loss over epochs, plot the training and validation curves at the end, and save the plot in fig/. You can customize the hyperparameters in train.py and GPT/GPT_CONFIG.py.
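Conceptually, the training loop (implemented in tools/program.py and driven by train.py) does something like the sketch below. Function and variable names here are simplified placeholders, not the repo's exact API; the loss is the usual next-token cross-entropy.

```python
import torch

def calc_loss_batch(input_batch, target_batch, model, device):
    # Next-token prediction: cross-entropy between the logits and the shifted targets
    input_batch, target_batch = input_batch.to(device), target_batch.to(device)
    logits = model(input_batch)  # (batch, seq_len, vocab_size)
    return torch.nn.functional.cross_entropy(logits.flatten(0, 1), target_batch.flatten())

def train(model, train_loader, val_loader, optimizer, device, num_epochs):
    train_losses, val_losses = [], []
    for epoch in range(num_epochs):
        model.train()
        epoch_loss = 0.0
        for input_batch, target_batch in train_loader:
            optimizer.zero_grad()
            loss = calc_loss_batch(input_batch, target_batch, model, device)
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        train_losses.append(epoch_loss / max(len(train_loader), 1))

        # Track validation loss once per epoch
        model.eval()
        with torch.no_grad():
            val_loss = sum(
                calc_loss_batch(xb, yb, model, device).item() for xb, yb in val_loader
            ) / max(len(val_loader), 1)
        val_losses.append(val_loss)
        print(f"Epoch {epoch + 1}: train loss {train_losses[-1]:.3f}, val loss {val_loss:.3f}")
    return train_losses, val_losses
```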
To load pretrained OpenAI GPT-2 weights and compare against your model, use the notebook:
jupyter notebook gpt2.ipynb
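The notebook has its own loading procedure; if you only need the pretrained 124M weights for a quick side-by-side comparison, one common alternative route (an extra dependency, not listed in requirements.txt) is the Hugging Face transformers package:

```python
# Assumes the transformers package is installed (pip install transformers).
# This is an alternative way to obtain the OpenAI GPT-2 (124M) weights,
# not necessarily the method used in gpt2.ipynb.
from transformers import GPT2LMHeadModel

gpt2_hf = GPT2LMHeadModel.from_pretrained("gpt2")  # "gpt2" is the 124M checkpoint

# Inspect parameter names and shapes, e.g. to map them onto your own model's layers
for name, param in list(gpt2_hf.named_parameters())[:5]:
    print(name, tuple(param.shape))
```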
Main features:
- Manual implementation of GPT-2 style attention layers
- Positional embeddings
- Token embeddings on top of tiktoken token IDs
- Train/validation splitting and loss plotting
- Control over drop_last, stride, and context length in the dataloader (see the sketch after this list)
- GPT-2 weight-loading notebook for comparison
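The dataloader options above determine how the continuous token stream is chunked into (input, target) training pairs. A hypothetical sketch of such a sliding-window dataset and dataloader (the repo's real implementation is in data_loader/data_loader.py, with its own names and defaults):

```python
import tiktoken
import torch
from torch.utils.data import Dataset, DataLoader

class GPTDataset(Dataset):
    """Slides a window of max_length tokens over the text with the given stride."""

    def __init__(self, text, tokenizer, max_length, stride):
        token_ids = tokenizer.encode(text)
        self.inputs, self.targets = [], []
        for i in range(0, len(token_ids) - max_length, stride):
            chunk = token_ids[i:i + max_length + 1]
            # The target is the input shifted one position to the right (next-token prediction)
            self.inputs.append(torch.tensor(chunk[:-1]))
            self.targets.append(torch.tensor(chunk[1:]))

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx], self.targets[idx]

def create_dataloader(text, batch_size=4, max_length=256, stride=128,
                      shuffle=True, drop_last=True):
    tokenizer = tiktoken.get_encoding("gpt2")
    dataset = GPTDataset(text, tokenizer, max_length, stride)
    # drop_last discards a final, smaller-than-batch_size batch so all batches have the same shape
    return DataLoader(dataset, batch_size=batch_size, shuffle=shuffle, drop_last=drop_last)
```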
Training loss and validation loss are plotted after training; the resulting figure is saved in fig/.
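The plotting helper lives in utils/plot.py; a minimal equivalent with matplotlib might look like this (the function name and output path are assumptions, matching the fig/ directory mentioned above):

```python
import os
import matplotlib.pyplot as plt

def plot_losses(train_losses, val_losses, out_path="fig/losses.png"):
    # Plot the training and validation loss curves and save the figure under fig/
    os.makedirs(os.path.dirname(out_path), exist_ok=True)
    plt.figure()
    plt.plot(train_losses, label="Training loss")
    plt.plot(val_losses, label="Validation loss")
    plt.xlabel("Epoch")
    plt.ylabel("Cross-entropy loss")
    plt.legend()
    plt.savefig(out_path, dpi=150)
    plt.close()
```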

- Book: Build a Large Language Model (From Scratch) by Sebastian Raschka
- Tokenizer: OpenAI tiktoken
- Dataset: The Verdict (Wikisource)
- Add script for classifier fine-tuning (e.g., sentiment classification, topic classification)
- Add script for instruction fine-tuning (e.g., Q&A or task completion using prompt–response format)
This project is released under the MIT License. See LICENSE for details.
- Sebastian Raschka for his detailed book and educational resources
- OpenAI for GPT-2 and the tiktoken tokenizer
- Project Gutenberg / Wikisource for public domain texts