Experimental Setup, Model Results and Subgraphs

Wikipedia, as a community based encyclopedia, has been growing a lot through the years since its creation, as people create more and more articles. In this section we want to investigate the impact of the size of the Wikipedia graph on players behavior, and on the possibility that the game offers. In a bigger graph, the number of options at each step grows, and new shorter paths are possible. At the same time, one can get lost in the universe of possibilities and never find the target article.

To understand better the impact the size of the graph, we constructed subgraphs of the 2007 Wikispeedia graph, based on different criteria, and created a model able to imitate the human’s behavior on the Wikispeedia game.

Experimental Setup and Model Results

We trained a deep learning model to imitate human behavior in the game Wikispeedia. The model predicts which link a player would click next based on the current article, the target article, the player’s navigation history, and contextual information about each candidate link, such as the anchor sentence, link position, and the first sentence of the destination article. Textual information is encoded using BERT embeddings, and the model uses a PageRank architecture with a context encoder and a candidate encoder.

Model Evaluation

To measure how closely the model replicates human behavior, we used several metrics:

Accuracy of choices: Percentage of situations where the model selects the exact link chosen by a player.
Top-k human agreement: For certain simplified situations defined by a (current article, target article) pair, multiple players may encounter the same situation but choose different links. This allows us to identify the most popular choices among players. We calculate the percentage of times the model selects a link that is among the top-k most popular links chosen by humans. We also compute the same metric for each player compared to other players, providing a measure of agreement between humans. Results show:
- Human agreement:
  - Top-1: 0.54 (95% CI: [0.5356, 0.5456]),
  - Top-3: 0.70 (95% CI: [0.6916, 0.7007]),
  - Top-5: 0.74 (95% CI: [0.7363, 0.7450]).
- Model agreement:
  - Top-1: 0.62,
  - Top-3: 0.74,
  - Top-5: 0.76.
  This shows that the model often chooses links that are popular among human players, sometimes even more consistently than individual players themselves.
Path completion: We simulate complete navigation paths starting from a source article to a target article. To mimic human behavior, the model has a probability to give up at each step, similar to real players. The model’s average winrate is 45.54%, compared to 56.92% for human players.
Path length: Finished paths produced by the model have an average length of 7.92 steps, compared to 6.76 steps for human players, showing realistic navigation patterns.

Graph Experiments

We explored how different constructions of the Wikipedia graph influence the navigation paths generated by the model. All experiments use the same pretrained model and the same path generation pipeline, differing only in the structure of the underlying graph.

Global graph: A complete representation of the Wikispeedia network, where each edge corresponds to a unique (source article, target article) pair. When multiple hyperlinks exist between the same articles, only the earliest link (in document order) is kept, ensuring consistency with the model training data.
Reduced-size subgraphs: Subgraphs containing 75%, 50%, and 25% of the original articles. Articles are sampled using a category-stratified strategy so that the overall distribution of article categories remains comparable across graph sizes.
Subgraphs with limited outgoing links: A graph where only the first 7 hyperlinks of each article are preserved. Despite the restriction on outgoing links, the overall distribution of article categories remains very similar to that of the complete graph, ensuring that no major category is disproportionately underrepresented.
Section-restricted subgraphs: Subgraphs where only hyperlinks appearing in the lead section of articles are kept. Even with this section restriction, the distribution of article categories is largely preserved, closely matching the distribution in the full graph.

For the global graph and each subgraph, navigation paths are generated by simulating a player moving from a starting article toward a target article using the pretrained model. At each step, the model scores all available outgoing candidate links and chooses one based on its predictions. The simulation stops if the target article is reached, if no valid candidates remain, if the player gives up according to empirical human give-up statistics, or if the maximum number of steps is exceeded. Candidate link representations, including textual embeddings and additional features, are precomputed for each article to ensure consistency with the model training.

Hint

We want to look at the model’s output.