This repository contains an implementation of a GPT-style language model from scratch, following the book "Build a Large Language Model (From Scratch)" by Sebastian Raschka. It includes training code, utilities, and a notebook to load GPT-2 weights from OpenAI.

- Goal: Train a GPT-style transformer model from scratch on a real-world dataset.
- Data: The Verdict by Edith Wharton (public domain).
- Framework: PyTorch
- Tokenizer: GPT-2 BPE tokenizer via tiktoken
- Model: GPT-2 124M architecture defined from first principles (see the configuration sketch below this list)
- Training: Simple training loop with loss tracking and validation
- Extras: Jupyter notebook to load and compare against GPT-2 weights from OpenAI
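The exact hyperparameters live in GPT/GPT_CONFIG.py. As a point of reference only (key names and values in the repo's file may differ), a GPT-2 "small" 124M-parameter configuration is usually specified like this:

```python
# Illustrative sketch -- the authoritative values are in GPT/GPT_CONFIG.py.
# These are the standard GPT-2 "small" (124M-parameter) hyperparameters.
GPT_CONFIG_124M = {
    "vocab_size": 50257,     # size of the GPT-2 BPE vocabulary
    "context_length": 1024,  # maximum number of tokens per input sequence
    "emb_dim": 768,          # token / positional embedding dimension
    "n_heads": 12,           # attention heads per transformer block
    "n_layers": 12,          # number of transformer blocks
    "drop_rate": 0.1,        # dropout probability
    "qkv_bias": False,       # whether query/key/value projections use a bias term
}
```

The project is organized as follows: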
.
├── GPT/ # GPT model definition and config
│ ├── GPT.py
│ └── GPT_CONFIG.py
├── data/ # Training data
│ └── the-verdict.txt
├── data_loader/ # Data preprocessing and batching
│ └── data_loader.py
├── loss/ # Custom loss functions for GPT training
│ └── loss.py
├── networks/ # Transformer block components (e.g., LayerNorm, attention)
│ └── networks.py
├── tools/ # Training loop and related utilities
│ └── program.py
├── utils/ # Utility functions
│ ├── utils.py
│ └── plot.py
├── gpt2.ipynb # Notebook for loading OpenAI GPT-2 weights
├── train.py # Main training script
├── README.md
└── requirements.txt
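The networks/ module holds the transformer-block components (layer normalization, attention, and so on) that GPT/GPT.py assembles into the full model. As a rough illustration of the kind of component involved, here is a minimal sketch of causal multi-head self-attention in the GPT-2 style; it follows the pattern from the book rather than the exact code in networks/networks.py, and all class and argument names are placeholders:

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Causal (masked) multi-head self-attention, GPT-2 style."""

    def __init__(self, d_in, d_out, context_length, dropout, num_heads, qkv_bias=False):
        super().__init__()
        assert d_out % num_heads == 0, "d_out must be divisible by num_heads"
        self.d_out = d_out
        self.num_heads = num_heads
        self.head_dim = d_out // num_heads

        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.out_proj = nn.Linear(d_out, d_out)
        self.dropout = nn.Dropout(dropout)
        # Upper-triangular mask that hides "future" tokens from each position
        self.register_buffer(
            "mask", torch.triu(torch.ones(context_length, context_length), diagonal=1)
        )

    def forward(self, x):
        b, num_tokens, _ = x.shape
        # Project inputs and split them into heads: (b, num_heads, num_tokens, head_dim)
        q = self.W_query(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.W_key(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.W_value(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)

        # Scaled dot-product attention with the causal mask applied
        attn_scores = q @ k.transpose(2, 3) / self.head_dim ** 0.5
        attn_scores = attn_scores.masked_fill(
            self.mask.bool()[:num_tokens, :num_tokens], float("-inf")
        )
        attn_weights = self.dropout(torch.softmax(attn_scores, dim=-1))

        # Merge the heads back together and project to the output dimension
        context = (attn_weights @ v).transpose(1, 2).reshape(b, num_tokens, self.d_out)
        return self.out_proj(context)
```

To get started, clone the repository: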
git clone https://github.com/dimitri009/GPT.git
cd GPT

I recommend using a virtual environment; then install the dependencies:
pip install -r requirements.txt
This project uses torch, tiktoken, matplotlib, and numpy.
The dataset is already included:
data/the-verdict.txt
If you want to download a fresh copy:
wget https://en.wikisource.org/wiki/The_Verdict -O data/the-verdict.txt
Note that this URL returns the Wikisource HTML page, so you may need to strip the markup from the downloaded file before training on it.
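To get a quick feel for the data, the snippet below reads the included text, tokenizes it with the GPT-2 BPE tokenizer, and performs a simple train/validation split. The 90/10 ratio is an assumption for illustration; the split the repo actually uses is configured in train.py and data_loader/data_loader.py.

```python
import tiktoken

# Read the raw story and split it into training and validation portions.
with open("data/the-verdict.txt", "r", encoding="utf-8") as f:
    text = f.read()

tokenizer = tiktoken.get_encoding("gpt2")  # GPT-2 BPE tokenizer
token_ids = tokenizer.encode(text)
print(f"Characters: {len(text)}, tokens: {len(token_ids)}")

# Hypothetical 90/10 split -- check train.py for the ratio the repo uses
split_idx = int(0.9 * len(text))
train_text, val_text = text[:split_idx], text[split_idx:]
```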
python train.py
Training will log the loss over epochs, plot the training and validation curves at the end, and save the plot in fig/. You can customize the hyperparameters in train.py and GPT/GPT_CONFIG.py.
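Conceptually, the training loop (implemented in tools/program.py and driven by train.py) does something like the sketch below. Function and variable names here are simplified placeholders, not the repo's exact API; the loss is the usual next-token cross-entropy.

```python
import torch

def calc_loss_batch(input_batch, target_batch, model, device):
    # Next-token prediction: cross-entropy between the logits and the shifted targets
    input_batch, target_batch = input_batch.to(device), target_batch.to(device)
    logits = model(input_batch)  # (batch, seq_len, vocab_size)
    return torch.nn.functional.cross_entropy(logits.flatten(0, 1), target_batch.flatten())

def train(model, train_loader, val_loader, optimizer, device, num_epochs):
    train_losses, val_losses = [], []
    for epoch in range(num_epochs):
        model.train()
        epoch_loss = 0.0
        for input_batch, target_batch in train_loader:
            optimizer.zero_grad()
            loss = calc_loss_batch(input_batch, target_batch, model, device)
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        train_losses.append(epoch_loss / max(len(train_loader), 1))

        # Track validation loss once per epoch
        model.eval()
        with torch.no_grad():
            val_loss = sum(
                calc_loss_batch(xb, yb, model, device).item() for xb, yb in val_loader
            ) / max(len(val_loader), 1)
        val_losses.append(val_loss)
        print(f"Epoch {epoch + 1}: train loss {train_losses[-1]:.3f}, val loss {val_loss:.3f}")
    return train_losses, val_losses
```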
To load pretrained OpenAI GPT-2 weights and compare against your model, use the notebook:
jupyter notebook gpt2.ipynb
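The notebook has its own loading procedure; if you only need the pretrained 124M weights for a quick side-by-side comparison, one common alternative route (an extra dependency, not listed in requirements.txt) is the Hugging Face transformers package:

```python
# Assumes the transformers package is installed (pip install transformers).
# This is an alternative way to obtain the OpenAI GPT-2 (124M) weights,
# not necessarily the method used in gpt2.ipynb.
from transformers import GPT2LMHeadModel

gpt2_hf = GPT2LMHeadModel.from_pretrained("gpt2")  # "gpt2" is the 124M checkpoint

# Inspect parameter names and shapes, e.g. to map them onto your own model's layers
for name, param in list(gpt2_hf.named_parameters())[:5]:
    print(name, tuple(param.shape))
```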
Main features:
- Manual implementation of GPT-2 style attention layers
- Positional embeddings
- Token embeddings on top of tiktoken token IDs
- Train/validation splitting and loss plotting
- Control over drop_last, stride, and context length in the dataloader (see the sketch after this list)
- GPT-2 weight-loading notebook for comparison
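The dataloader options above determine how the continuous token stream is chunked into (input, target) training pairs. A hypothetical sketch of such a sliding-window dataset and dataloader (the repo's real implementation is in data_loader/data_loader.py, with its own names and defaults):

```python
import tiktoken
import torch
from torch.utils.data import Dataset, DataLoader

class GPTDataset(Dataset):
    """Slides a window of max_length tokens over the text with the given stride."""

    def __init__(self, text, tokenizer, max_length, stride):
        token_ids = tokenizer.encode(text)
        self.inputs, self.targets = [], []
        for i in range(0, len(token_ids) - max_length, stride):
            chunk = token_ids[i:i + max_length + 1]
            # The target is the input shifted one position to the right (next-token prediction)
            self.inputs.append(torch.tensor(chunk[:-1]))
            self.targets.append(torch.tensor(chunk[1:]))

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx], self.targets[idx]

def create_dataloader(text, batch_size=4, max_length=256, stride=128,
                      shuffle=True, drop_last=True):
    tokenizer = tiktoken.get_encoding("gpt2")
    dataset = GPTDataset(text, tokenizer, max_length, stride)
    # drop_last discards a final, smaller-than-batch_size batch so all batches have the same shape
    return DataLoader(dataset, batch_size=batch_size, shuffle=shuffle, drop_last=drop_last)
```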
Training loss and validation loss are plotted after training; the resulting figure is saved in fig/.
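The plotting helper lives in utils/plot.py; a minimal equivalent with matplotlib might look like this (the function name and output path are assumptions, matching the fig/ directory mentioned above):

```python
import os
import matplotlib.pyplot as plt

def plot_losses(train_losses, val_losses, out_path="fig/losses.png"):
    # Plot the training and validation loss curves and save the figure under fig/
    os.makedirs(os.path.dirname(out_path), exist_ok=True)
    plt.figure()
    plt.plot(train_losses, label="Training loss")
    plt.plot(val_losses, label="Validation loss")
    plt.xlabel("Epoch")
    plt.ylabel("Cross-entropy loss")
    plt.legend()
    plt.savefig(out_path, dpi=150)
    plt.close()
```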

- Book: Build a Large Language Model (From Scratch) by Sebastian Raschka
- Tokenizer: OpenAI tiktoken
- Dataset: The Verdict (Wikisource)
- Add script for classifier fine-tuning (e.g., sentiment classification, topic classification)
- Add script for instruction fine-tuning (e.g., Q&A or task completion using prompt–response format)
This project is released under the MIT License. See LICENSE for details.
- Sebastian Raschka for his detailed book and educational resources
- OpenAI for GPT-2 and the tiktoken tokenizer
- Project Gutenberg / Wikisource for public domain texts