A data analysis project exploring the Olympic Games from 1896 to 2016. It includes data cleaning, visualization, and insights on athletes, medals, and countries.

VIPULbunny/olympics-analysis

🏅 Olympics Data Analysis (1896-2016)

[Banner image: Olympics Data Analysis]

📌 Project Overview

The Olympic Games have been the pinnacle of international sports since 1896. This project explores the historical dataset of the Olympics, uncovering trends, athlete performance, and country-wise participation. Through data cleaning, visualization, and analysis, we gain insights into how the Games have evolved over time.

📂 Dataset Description

Because athlete_events.csv is larger than 20 MB, its download link and description are provided in the file 'Dataset_link.py'.

This analysis uses two primary datasets:

  1. athlete_events.csv - Contains detailed records of Olympic athletes, including:

    • Name, Age, Gender
    • Sport, Event, Medal (if won)
    • Country (NOC), Year, Season (Summer/Winter)
  2. noc_regions.csv - Maps National Olympic Committees (NOCs) to country names, helping in regional analysis.
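As a minimal sketch of how the two files might be read with pandas, assuming both CSVs have been downloaded into the working directory (the function name `load_olympics` and the default paths are illustrative, not taken from the notebook):

```python
import pandas as pd


def load_olympics(athletes_path="athlete_events.csv",
                  regions_path="noc_regions.csv"):
    """Read both CSV files and return them as a pair of DataFrames.

    The default paths are an assumption: they expect the files to sit
    next to the notebook after being downloaded.
    """
    athletes = pd.read_csv(athletes_path)
    regions = pd.read_csv(regions_path)
    return athletes, regions
```

Calling `athletes, regions = load_olympics()` and then `athletes.head()` gives a quick look at the available columns before any cleaning.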

🚀 Project Objectives

  • Perform Exploratory Data Analysis (EDA) to understand athlete trends.
  • Visualize country-wise medal counts and athlete participation.
  • Analyze gender representation and its evolution in the Olympics.
  • Identify the most successful athletes and countries over the years.

🔧 Installation

To run this project on your local machine:

  1. Clone the Repository:
    git clone https://github.com/VIPULbunny/olympics-analysis.git
  2. Navigate to the Project Directory:
    cd olympics-analysis
  3. Install Dependencies:
    pip install numpy pandas matplotlib seaborn
  4. Run the Jupyter Notebook:
    jupyter notebook

📊 Exploratory Data Analysis (EDA)

🏆 Key Insights from Data

  • Total Athletes Participated: {total_athletes}
  • Total Olympic Games Editions: {total_games}
  • Top 10 Countries by Athlete Count:
    • USA, Germany, UK, France, China, etc.
  • Most Successful Athletes:
    • Michael Phelps, Usain Bolt, etc.
  • Gender Representation Over Time:
    • Increasing female participation in modern Olympics.
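The headline figures above could be computed roughly as follows. This is a sketch rather than the notebook's actual code: it assumes the athlete table has an `ID` column that identifies each athlete uniquely (not listed in the dataset description above), and it counts an edition as one distinct `Year`/`Season` pair. The function name `summarize` is hypothetical.

```python
import pandas as pd


def summarize(athletes: pd.DataFrame, n: int = 10) -> dict:
    """Compute headline statistics from the athlete table.

    Assumes columns ID, Name, NOC, Year, Season and Medal, where a
    missing Medal value means no medal was won.
    """
    # Count each athlete once per country when ranking countries.
    unique_entries = athletes.drop_duplicates(subset=["ID", "NOC"])
    # Keep only the rows where a medal was actually won.
    medal_rows = athletes.dropna(subset=["Medal"])
    return {
        "total_athletes": athletes["ID"].nunique(),
        "total_games": len(athletes[["Year", "Season"]].drop_duplicates()),
        "top_countries": unique_entries["NOC"].value_counts().head(n),
        "top_athletes": medal_rows["Name"].value_counts().head(n),
    }
```

Ranking athletes by `Name` is a simplification; two different athletes who share a name would be merged, so grouping by `ID` is safer if that column is present.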

🔎 Data Cleaning & Preprocessing

  • Merged athlete_events.csv with noc_regions.csv for accurate country mapping.
  • Handled missing values in age, medal, and region data.
  • Converted categorical features (Sex, Medal) into structured formats for better analysis.
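The cleaning steps listed above might look like the sketch below. The README does not say how missing values were filled, so the choices here (median age, a "No Medal" label, an "Unknown" region) are assumptions, as are the `region` column name in noc_regions.csv and the function name `clean`.

```python
import pandas as pd


def clean(athletes: pd.DataFrame, regions: pd.DataFrame) -> pd.DataFrame:
    """Merge the two tables and tidy up missing and categorical values."""
    # Left join keeps every athlete row even if its NOC has no region entry.
    merged = athletes.merge(regions[["NOC", "region"]], on="NOC", how="left")

    # Assumed fill strategies; the original notebook may differ.
    merged["Age"] = merged["Age"].fillna(merged["Age"].median())
    merged["Medal"] = merged["Medal"].fillna("No Medal")
    merged["region"] = merged["region"].fillna("Unknown")

    # Store Sex and Medal as categoricals; Medal gets a meaningful order.
    merged["Sex"] = merged["Sex"].astype("category")
    merged["Medal"] = pd.Categorical(
        merged["Medal"],
        categories=["No Medal", "Bronze", "Silver", "Gold"],
        ordered=True,
    )
    return merged
```

An ordered categorical allows comparisons such as `merged["Medal"] >= "Silver"`, which can be handy when filtering for top finishes.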

📈 Data Visualizations

🎖️ Medal Distribution

image
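A chart like this one could be produced along the following lines. This is an illustrative sketch, not necessarily how the figure was generated; the function names are hypothetical, and the count is per athlete row, so every member of a medal-winning team is counted.

```python
import pandas as pd


def medal_distribution(athletes: pd.DataFrame) -> pd.Series:
    """Count medal rows by type, in Gold/Silver/Bronze order."""
    counts = athletes["Medal"].dropna().astype(str).value_counts()
    # reindex drops any other labels (for example a "No Medal" filler).
    return counts.reindex(["Gold", "Silver", "Bronze"], fill_value=0)


def plot_medal_distribution(counts: pd.Series) -> None:
    """Draw the counts as a bar chart."""
    # Imported here so the counting step works without matplotlib installed.
    import matplotlib.pyplot as plt

    ax = counts.plot(kind="bar", color=["gold", "silver", "peru"])
    ax.set_xlabel("Medal")
    ax.set_ylabel("Number of medal rows")
    ax.set_title("Medal Distribution")
    plt.tight_layout()
    plt.show()
```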

🏅 Top 10 Countries by Medal Count

image
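One possible way to rank countries by medals is sketched below. It makes an assumption the README does not state: team events are counted once per country rather than once per team member, by dropping duplicate Year/Season/Event/country/Medal combinations. The function name and the `country_col` parameter are illustrative.

```python
import pandas as pd


def top_countries_by_medals(athletes: pd.DataFrame, n: int = 10,
                            country_col: str = "NOC") -> pd.Series:
    """Return the n countries with the most medals."""
    medals = athletes.dropna(subset=["Medal"])
    # Collapse team events to a single medal per country (an assumption).
    medals = medals.drop_duplicates(
        subset=["Year", "Season", "Event", country_col, "Medal"]
    )
    return medals[country_col].value_counts().head(n)
```

The returned Series can be charted directly, for example with `.plot(kind="barh")`. Passing `country_col="region"` would rank by the mapped country name instead, provided the merge with noc_regions.csv has already been done.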

📊 Gender Representation Over the Years

image
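The trend in this chart can be reproduced approximately with the sketch below, which counts distinct athletes per year and sex. It assumes an `ID` column that identifies each athlete, and the function names are hypothetical.

```python
import pandas as pd


def athletes_by_sex_per_year(athletes: pd.DataFrame) -> pd.DataFrame:
    """Return a Year x Sex table of distinct athlete counts."""
    return (
        athletes.groupby(["Year", "Sex"])["ID"]
        .nunique()
        .unstack(fill_value=0)
    )


def plot_gender_trend(table: pd.DataFrame) -> None:
    """Draw one line per sex across Olympic years."""
    # Imported here so the aggregation works without matplotlib installed.
    import matplotlib.pyplot as plt

    ax = table.plot(kind="line", marker="o")
    ax.set_xlabel("Year")
    ax.set_ylabel("Number of athletes")
    ax.set_title("Gender Representation Over the Years")
    plt.tight_layout()
    plt.show()
```

Counting distinct IDs rather than rows avoids inflating the totals for athletes who entered several events in the same year.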

🌍 Athlete Counts by Sex and Season

image
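A cross-tabulation gives the numbers behind a chart like this. The sketch below counts rows (event entries), not distinct athletes, and the plotting helper uses seaborn's `countplot` as one plausible choice; neither is confirmed as the notebook's approach, and the function names are illustrative.

```python
import pandas as pd


def sex_by_season(athletes: pd.DataFrame) -> pd.DataFrame:
    """Return a Season x Sex table of row counts."""
    return pd.crosstab(athletes["Season"], athletes["Sex"])


def plot_sex_by_season(athletes: pd.DataFrame) -> None:
    """Draw grouped bars: one group per season, one bar per sex."""
    # Imported here so the tabulation works without the plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.countplot(data=athletes, x="Season", hue="Sex")
    ax.set_ylabel("Number of entries")
    plt.tight_layout()
    plt.show()
```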

📜 License

This project is licensed under the MIT License.

🤝 Contributing

We welcome contributions! If you’d like to improve the analysis or add new insights:

  1. Fork the repository.
  2. Create a feature branch: git checkout -b feature-branch
  3. Commit your changes: git commit -m "Added new analysis"
  4. Push to GitHub: git push origin feature-branch
  5. Open a Pull Request 🚀

📬 Contact

For queries or collaborations, reach out via email or open an issue on GitHub.


⭐ If you find this project useful, please give it a star! 🌟
