Wiki-Age Networks

This project was conducted with excellent collaborator and friend Thayer Alshaabi as the final project for the course Principles of Complex Systems. See the full project write-up here.

This project relies heavily on web scraping and basic NLP tools to build a dataset from Wikipedia. Specifically, we scraped information about notable people within fields, along with their ages. At the highest and loosest level, we define “Categories” as typically representing academic disciplines. That said, some Categories, such as Mathematics, Chemistry, and Physics, lend themselves more readily to our proposed analysis than more nebulous or highly specific topics such as History or Geodesy. Given a Category X, we define a “Branch” as any subtopic of X that could reasonably appear in a list titled “Branches of X” or “Fields of study of X” and, importantly, that has an associated Wikipedia page. For each Category considered, we sought to collect the following information from both its associated Wikipedia page and those of its Branches.

Example 1
Branches of physics over time, where each branch is assigned a year equal to the median of the mean years of the people found within that branch.
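As a rough illustration of how a branch could be assigned a year, the sketch below takes the median of per-person mean years within each branch. It is not the project's actual code. The record layout, the function names, and the choice to treat a person's "mean year" as the midpoint of birth and death years are all assumptions made for this example, and the years are invented placeholders, not scraped data.

```python
from statistics import median

# Hypothetical records: (branch, birth_year, death_year).
# The years are invented placeholders, not values from the dataset.
people = [
    ("Optics", 1600, 1680),
    ("Optics", 1700, 1760),
    ("Optics", 1800, 1850),
    ("Acoustics", 1820, 1900),
    ("Acoustics", 1840, 1920),
]


def person_mean_year(birth, death):
    """Assumed definition: midpoint of a person's birth and death years."""
    return (birth + death) / 2


def branch_years(records):
    """Map each branch to the median of its members' mean years."""
    by_branch = {}
    for branch, birth, death in records:
        by_branch.setdefault(branch, []).append(person_mean_year(birth, death))
    return {branch: median(years) for branch, years in by_branch.items()}


years = branch_years(people)
for branch, year in sorted(years.items(), key=lambda item: item[1]):
    print(branch, year)
```

Using the median rather than the mean at the branch level keeps a single very early or very late figure from shifting the year assigned to the whole branch.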
Example 2
Network structure for a collected branch, where node size indicates how many people are within that category.
Example 3
Example plot for the Physics branch, showing information by year and number of people.