Visualizing Maximum Entropy Summary Trees Using R and d3.js

Kenny Shirley
Statistics Research Department
AT&T Labs, New York, NY

August 5, 2015
JSM, Seattle, WA


kshirley@research.att.com
github.com/kshirley
twitter.com/kennyshirley

Outline

  1. The problem: visualizing large node-weighted trees

  2. Our solution: Maximum entropy summary trees

  3. Going from “code that works for me” to an R package

  4. Miscellaneous thoughts



Motivating Example: DMOZ (the Open Directory Project)

Some Summary Statistics

                                                  Topic Frequency
1                                    Top/Arts/Animation         6
2                   Top/Arts/Animation/Anime/Characters         6
3      Top/Arts/Animation/Anime/Clubs_and_Organizations        31
4                 Top/Arts/Animation/Anime/Collectibles        10
5            Top/Arts/Animation/Anime/Collectibles/Cels        12
...                                                 ...       ...
595001             Top/World/Uyghurche/Rayonluq/Yawropa         3
595002                     Top/World/Uyghurche/Référans         5
595003                   Top/World/Uyghurche/Salametlik         1
595004                        Top/World/Uyghurche/Sport         1
595005                        Top/World/Uyghurche/Xewer         5

Distribution of URLs aggregated to Level 2
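
The aggregation behind this slide can be sketched as follows. This is a minimal illustration in Python rather than the talk's own R code, and it makes two assumptions: "Top" counts as level 1 (so level 2 is, for example, Top/Arts), and only the ten rows visible in the table above are used, not the full 595,005-row data set. The function name `aggregate_to_level` is hypothetical.

```python
from collections import Counter

# Only the ten rows shown in the table above (assumption: illustration only,
# the real data set has 595,005 rows).
rows = [
    ("Top/Arts/Animation", 6),
    ("Top/Arts/Animation/Anime/Characters", 6),
    ("Top/Arts/Animation/Anime/Clubs_and_Organizations", 31),
    ("Top/Arts/Animation/Anime/Collectibles", 10),
    ("Top/Arts/Animation/Anime/Collectibles/Cels", 12),
    ("Top/World/Uyghurche/Rayonluq/Yawropa", 3),
    ("Top/World/Uyghurche/Référans", 5),
    ("Top/World/Uyghurche/Salametlik", 1),
    ("Top/World/Uyghurche/Sport", 1),
    ("Top/World/Uyghurche/Xewer", 5),
]


def aggregate_to_level(rows, level):
    """Sum frequencies over the first `level` path components of each topic.

    Hypothetical helper: "Top" is treated as level 1.
    """
    totals = Counter()
    for topic, freq in rows:
        # Truncate the path to its first `level` components.
        prefix = "/".join(topic.split("/")[:level])
        totals[prefix] += freq
    return dict(totals)


level2 = aggregate_to_level(rows, 2)
print(level2)  # {'Top/Arts': 65, 'Top/World': 15}
```

Calling `aggregate_to_level(rows, 3)` on the same rows gives the next level down, which is the kind of drill-down shown on the following slide.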

Drilling down into Top/World…

Preview of Solution

Outline

  1. The problem: visualizing large node-weighted trees

  2. Our solution: Maximum entropy summary trees

  3. Going from “code that works for me” to an R package

  4. Miscellaneous thoughts

The definition of a summary tree

Maximum entropy summary trees
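
As a reminder of the quantity being maximized, here is a sketch under stated assumptions: the entropy of a summary tree is taken to be the Shannon entropy of its normalized node weights. The symbols $k$ (number of nodes in the summary tree), $w_i$ (weight of node $i$), $W$ (total weight), and $p_i$ are notation introduced here, not taken from the slides.

```latex
W = \sum_{i=1}^{k} w_i, \qquad
p_i = \frac{w_i}{W}, \qquad
H(p_1, \dots, p_k) = -\sum_{i=1}^{k} p_i \log_2 p_i
```

Nodes with $p_i = 0$ contribute nothing to the sum (by the convention $0 \log 0 = 0$), and $H$ is largest, at $\log_2 k$, when all $k$ nodes carry equal weight.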