This project investigates European royal family networks through graph analytics, Wikipedia API data extraction, and Natural Language Processing (NLP).
It models monarchic relationships as a knowledge graph to study correlations between inbreeding coefficients, social connectivity, and the sentiment of biographical text, all derived from publicly available historical data.
The analysis combines network science, computational genealogy, and text mining to uncover patterns in dynastic interrelations and hereditary impact over time.
- Implemented data extraction with the Wikipedia RESTful API to pull structured data from monarch biography infoboxes.
- Parsed relationship attributes (e.g., `parents`, `spouses`, `predecessors`, `successors`) using JSON query pipelines.
- Applied data validation, entity resolution, and error handling to ensure robust data ingestion.
Keywords: Wikipedia API, JSON parsing, RESTful endpoints, data pipeline automation, entity disambiguation.
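A minimal sketch of the extraction step, assuming the infobox is read from raw page wikitext. It swaps in the MediaWiki Action API (`action=query`, `prop=revisions`) rather than a REST endpoint, and assumes `Infobox royalty` field names such as `father`, `mother`, `spouse`, `predecessor`, and `successor`. The helpers `fetch_wikitext` and `parse_relations` are hypothetical names, not the project's actual code.

```python
import json
import re
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"
# Assumed infobox field names; adjust to the templates actually encountered.
RELATION_FIELDS = ("father", "mother", "spouse", "predecessor", "successor")

# Matches single-line infobox entries such as "| father = [[Philip IV of Spain]]".
FIELD_RE = re.compile(r"^\s*\|\s*(\w+)\s*=\s*(.*)$", re.MULTILINE)
# Captures a wikilink target, ignoring any "|label" or "#section" suffix.
LINK_RE = re.compile(r"\[\[([^\]|#]+)")


def fetch_wikitext(title: str) -> str:
    """Download the current wikitext of one page via the MediaWiki Action API."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Wikimedia asks clients to send a descriptive User-Agent.
    req = urllib.request.Request(url, headers={"User-Agent": "royal-graph-sketch/0.1"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        data = json.load(resp)
    page = data["query"]["pages"][0]
    if "missing" in page:
        raise KeyError(f"No Wikipedia page titled {title!r}")
    return page["revisions"][0]["slots"]["main"]["content"]


def parse_relations(wikitext: str) -> dict[str, list[str]]:
    """Return the linked page titles for each relationship field in the wikitext."""
    relations: dict[str, list[str]] = {}
    for field, value in FIELD_RE.findall(wikitext):
        if field in RELATION_FIELDS:
            links = [target.strip() for target in LINK_RE.findall(value)]
            if links:
                relations.setdefault(field, []).extend(links)
    return relations
```

Calling `parse_relations(fetch_wikitext("Charles II of Spain"))` would return linked titles per field. Values that span several lines (for example `{{Plainlist}}` blocks) are not captured by this single-line regex and would need a proper wikitext parser.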
- Built a multi-relational social graph with NetworkX, modeling royal individuals as nodes and family or political connections as edges.
- Edge attributes encoded relationship types: `blood`, `marriage`, and `succession`.
- Node attributes included temporal metadata (birth/death, reign start/end) and computed genetic coefficients.
- Computed structural graph metrics:
  - Degree centrality
  - Betweenness centrality
  - Graph connectivity and component analysis
  - Community modularity
Keywords: graph analytics, knowledge graph, NetworkX, adjacency matrix, social network analysis (SNA), community detection, degree distribution, connected components.
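The graph model and metrics above could be sketched as follows. The node attribute names (`birth`, `death`), the `relation` edge key, and the five-person toy data are assumptions for illustration. Metrics are computed on an undirected simple projection, which is a modeling choice here: it treats every relationship as a symmetric tie and ignores parallel edges.

```python
import networkx as nx
from networkx.algorithms import community

# Toy data for illustration only; attribute names are assumptions.
PEOPLE = {
    "Philip IV of Spain": {"birth": 1605, "death": 1665},
    "Mariana of Austria": {"birth": 1634, "death": 1696},
    "Charles II of Spain": {"birth": 1661, "death": 1700},
    "Louis XIII": {"birth": 1601, "death": 1643},
    "Louis XIV": {"birth": 1638, "death": 1715},
}
RELATIONS = [
    ("Philip IV of Spain", "Mariana of Austria", "marriage"),
    ("Philip IV of Spain", "Charles II of Spain", "blood"),
    ("Mariana of Austria", "Charles II of Spain", "blood"),
    ("Philip IV of Spain", "Charles II of Spain", "succession"),
    ("Louis XIII", "Louis XIV", "blood"),
]


def build_graph(people, relations):
    """Multi-relational graph: one directed edge per (source, target, relation)."""
    G = nx.MultiDiGraph()
    for name, attrs in people.items():
        G.add_node(name, **attrs)
    for src, dst, kind in relations:
        G.add_edge(src, dst, relation=kind)
    return G


def structural_metrics(G):
    """Centrality, components, and modularity on an undirected simple projection."""
    simple = nx.Graph(G)  # drops edge direction and merges parallel edges
    parts = community.greedy_modularity_communities(simple)
    return {
        "degree": nx.degree_centrality(simple),
        "betweenness": nx.betweenness_centrality(simple),
        "components": [set(c) for c in nx.connected_components(simple)],
        "communities": [set(c) for c in parts],
        "modularity": community.modularity(simple, parts),
    }
```

Keeping the full `MultiDiGraph` preserves each relationship type for filtering (e.g. a `blood`-only subgraph for genealogy), while the projection is only used where a simple graph is the natural input.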
- Implemented a recursive computation of the inbreeding coefficient for each individual node:

  $$f(X) = \sum 0.5^{\,n-1}\,\bigl(1 + f(A)\bigr)$$

  where the sum runs over every ancestral loop connecting the two parents of $X$ through a common ancestor $A$, and:
  - $n$ is the number of individuals in the ancestral loop between the parents and the common ancestor
  - $f(A)$ is the inbreeding coefficient of the common ancestor
- Mapped coefficient distributions across dynasties and time periods to identify genealogical bottlenecks.
- Compared inbreeding scores across centuries (1450–1700 vs. 1920+) to study temporal trends in hereditary practices.
Keywords: graph-based inheritance modeling, genealogical inference, recursive algorithms, computational biology, historical data analytics.
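One way to sketch the recursion is through coancestry (kinship) coefficients instead of enumerating ancestral loops explicitly: the inbreeding coefficient of an individual equals the kinship of its two parents, and the standard kinship recursion yields the same value as Wright's path sum. This is a swapped-in formulation, not necessarily the project's implementation, and the `pedigree` mapping (child to `(father, mother)`, with `None` for an unknown parent) is a hypothetical input format.

```python
from functools import lru_cache


def make_inbreeding(pedigree):
    """Return f(x) for a pedigree mapping child -> (father, mother); None marks an unknown parent."""

    def parents(x):
        return pedigree.get(x, (None, None))

    @lru_cache(maxsize=None)
    def depth(x):
        # Generations above the oldest known ancestor; a descendant is always deeper than its ancestors.
        known = [p for p in parents(x) if p is not None]
        return 1 + max(depth(p) for p in known) if known else 0

    @lru_cache(maxsize=None)
    def kinship(a, b):
        # Probability that alleles drawn at random from a and from b are identical by descent.
        if a is None or b is None:
            return 0.0  # unknown ancestors are treated as unrelated founders
        if a == b:
            father, mother = parents(a)
            return 0.5 * (1.0 + kinship(father, mother))  # 1/2 * (1 + f(a))
        if depth(a) < depth(b):
            a, b = b, a  # recurse through the deeper one, which cannot be an ancestor of the other
        father, mother = parents(a)
        return 0.5 * (kinship(father, b) + kinship(mother, b))

    def inbreeding(x):
        # f(x) equals the kinship coefficient of x's two parents.
        father, mother = parents(x)
        return kinship(father, mother)

    return inbreeding
```

For a child of full siblings this returns 0.25 and for a child of half siblings 0.125, the textbook values. Memoization keeps each pair of individuals from being recomputed; treating unknown parents as unrelated founders understates coefficients when pedigrees are shallow.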
- Extracted biographical text from Wikipedia pages and preprocessed it with standard NLP techniques (tokenization, stopword removal, lemmatization).
- Performed sentiment analysis with the NLTK SentimentIntensityAnalyzer to evaluate emotional tone and correlate it with health or leadership attributes in monarch biographies.
- Identified statistical shifts in sentiment distributions between high- and low-inbreeding groups.
Keywords: sentiment analysis, text mining, NLTK, feature extraction, computational linguistics, historical text analytics, data-driven storytelling.
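The group comparison could be sketched as below. It scores raw text with NLTK's VADER `SentimentIntensityAnalyzer`, which uses punctuation, capitalization, and negation words and is therefore usually applied before stopword removal, then tests the difference in mean compound score with a permutation test. The permutation test, the `0.0625` threshold (the expected coefficient for a child of first cousins), and all function names are assumptions, not necessarily the project's method.

```python
import random
from statistics import mean


def vader_compound(texts):
    """Compound VADER score in [-1, 1] for each raw (unpreprocessed) text."""
    # Imported here so the statistics below run without NLTK;
    # requires nltk.download("vader_lexicon") beforehand.
    from nltk.sentiment import SentimentIntensityAnalyzer

    sia = SentimentIntensityAnalyzer()
    return [sia.polarity_scores(t)["compound"] for t in texts]


def split_by_inbreeding(scores, coeffs, threshold=0.0625):
    """Split sentiment scores into high (f >= threshold) and low inbreeding groups."""
    high = [s for s, f in zip(scores, coeffs) if f >= threshold]
    low = [s for s, f in zip(scores, coeffs) if f < threshold]
    return high, low


def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference in means; returns (diff, p_value)."""
    if not group_a or not group_b:
        raise ValueError("both groups need at least one score")
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    k = len(group_a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # random relabeling under the null of no group effect
        if abs(mean(pooled[:k]) - mean(pooled[k:])) >= abs(observed):
            extreme += 1
    # +1 in numerator and denominator counts the observed labeling itself.
    return observed, (extreme + 1) / (n_resamples + 1)
```

A permutation test makes no normality assumption, which suits bounded compound scores that often pile up near the extremes.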
- Created interactive visualizations using Plotly and Matplotlib for:
- Graph connectivity patterns
- Inbreeding distribution histograms
- Temporal evolution of coefficients
- Sentiment polarity comparisons
- Generated data-driven insights supported by reproducible Jupyter notebooks.
Keywords: data visualization, exploratory data analysis (EDA), Plotly, Matplotlib, network diagrams, Jupyter reproducibility.
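A minimal Matplotlib sketch of the inbreeding histograms, assuming coefficients have already been grouped into a dictionary keyed by period label; the function name and grouping are hypothetical. Sharing one bin range across periods keeps the overlaid distributions directly comparable.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for scripts; omit inside Jupyter
import matplotlib.pyplot as plt


def plot_inbreeding_histograms(coeffs_by_period, bins=20):
    """Overlay one histogram of inbreeding coefficients per period on shared bins."""
    all_values = [v for values in coeffs_by_period.values() for v in values]
    upper = max(all_values, default=0.0) or 1.0  # avoid a zero-width range
    fig, ax = plt.subplots(figsize=(7, 4))
    for label, values in coeffs_by_period.items():
        ax.hist(values, bins=bins, range=(0.0, upper), alpha=0.5, label=label)
    ax.set_xlabel("Inbreeding coefficient F")
    ax.set_ylabel("Number of individuals")
    ax.set_title("Inbreeding distribution by period")
    ax.legend()
    fig.tight_layout()
    return fig
```

For example, `plot_inbreeding_histograms({"1450–1700": early, "1920+": modern}).savefig("inbreeding.png")` would write the comparison to disk, where `early` and `modern` are lists of coefficients.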
| Category | Tools & Libraries |
|---|---|
| Data Extraction | Wikipedia API, Requests, JSON |
| Graph Analysis | NetworkX, Pandas, NumPy |
| NLP | NLTK, Regular Expressions |
| Visualization | Matplotlib, Plotly |
| Development | Python 3.10+, Google Colab, Jupyter Notebook |
| Version Control | Git, GitHub |
- Constructed a royal family knowledge graph with 7,365 nodes and 13,259 edges.
- Identified Habsburg and Bourbon dynasties as exhibiting the highest inbreeding coefficients.
- Discovered that modern royal lineages (post-1920) retain genetic correlation patterns despite sociocultural shifts.
- Observed sentiment polarity variations correlated with high genetic similarity scores, offering preliminary evidence of potential cognitive or reputational bias.
This project bridges computational social science, historical data mining, and applied graph learning, demonstrating skills highly relevant to roles in:
- AI-driven Knowledge Graphs
- Social Network Analysis
- Data Mining and API Engineering
- NLP and Text Analytics
- Data Science for Research and Policy Analytics