Visualizing Maximum Entropy Summary Trees Using R and d3.js

Kenny Shirley
Statistics Research Department
AT&T Labs, New York, NY

August 5, 2015
JSM, Seattle, WA


  1. The problem: visualizing large node-weighted trees

  2. Our solution: Maximum entropy summary trees

  3. Going from “code that works for me” to an R package

  4. Miscellaneous thoughts

Motivating Example: DMOZ (the Open Directory Project)

Some Summary Statistics

        Topic                                             Frequency
1       Top/Arts/Animation                                        6
2       Top/Arts/Animation/Anime/Characters                       6
3       Top/Arts/Animation/Anime/Clubs_and_Organizations         31
4       Top/Arts/Animation/Anime/Collectibles                    10
5       Top/Arts/Animation/Anime/Collectibles/Cels               12
...     ...                                                     ...
595001  Top/World/Uyghurche/Rayonluq/Yawropa                      3
595002  Top/World/Uyghurche/Référans                              5
595003  Top/World/Uyghurche/Salametlik                            1
595004  Top/World/Uyghurche/Sport                                 1
595005  Top/World/Uyghurche/Xewer                                 5

Distribution of URLs aggregated to Level 2
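
As a rough illustration of what "aggregated to Level 2" means, the following Python sketch sums frequencies over the first two components of each topic path. It uses only the ten rows visible in the summary table (not the full 595,005-row data set), and it assumes that Top counts as level 1, so that Top/Arts and Top/World are level-2 nodes. The function name aggregate_to_level is hypothetical and is not part of the summarytrees package.

```python
from collections import Counter

# The ten rows visible in the summary table: (topic path, frequency).
rows = [
    ("Top/Arts/Animation", 6),
    ("Top/Arts/Animation/Anime/Characters", 6),
    ("Top/Arts/Animation/Anime/Clubs_and_Organizations", 31),
    ("Top/Arts/Animation/Anime/Collectibles", 10),
    ("Top/Arts/Animation/Anime/Collectibles/Cels", 12),
    ("Top/World/Uyghurche/Rayonluq/Yawropa", 3),
    ("Top/World/Uyghurche/Référans", 5),
    ("Top/World/Uyghurche/Salametlik", 1),
    ("Top/World/Uyghurche/Sport", 1),
    ("Top/World/Uyghurche/Xewer", 5),
]


def aggregate_to_level(rows, level):
    """Sum frequencies over the first `level` components of each path.

    Assumption for this sketch: the root ("Top") is level 1.
    """
    totals = Counter()
    for topic, freq in rows:
        prefix = "/".join(topic.split("/")[:level])
        totals[prefix] += freq
    return dict(totals)


level2 = aggregate_to_level(rows, 2)
for topic, total in sorted(level2.items()):
    print(topic, total)
```

On the full data set the same grouping would give one total per level-2 topic, which is the kind of distribution a bar chart can show but which hides everything deeper in the tree.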

Drilling down into Top/World…

Preview of Solution


  1. The problem: visualizing large node-weighted trees

  2. Our solution: Maximum entropy summary trees

  3. Going from “code that works for me” to an R package

  4. Miscellaneous thoughts

The definition of a summary tree

Maximum entropy summary trees
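
A minimal sketch of the quantity being maximized, assuming the usual definition: the entropy of a k-node summary tree is the Shannon entropy of its node weights after normalizing them to sum to 1. The weights below are made up for illustration and do not come from the DMOZ data; the function name entropy is likewise hypothetical.

```python
import math


def entropy(weights):
    """Shannon entropy (in bits) of the distribution weights / sum(weights).

    Zero-weight nodes contribute nothing to the sum and are skipped.
    """
    total = sum(weights)
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0)


# Two hypothetical 4-node summaries of a tree with total weight 100.
balanced = entropy([25, 25, 25, 25])  # weight spread evenly over the nodes
skewed = entropy([97, 1, 1, 1])       # one node holds almost all the weight

print(balanced, skewed)
```

Under this criterion the evenly spread summary scores higher, which matches the intuition that a good k-node summary should not spend its nodes on negligible parts of the tree.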



The end of the project… or is it?


  1. The problem: visualizing large node-weighted trees

  2. Our solution: Maximum Entropy Summary Trees

  3. Going from “code that works for me” to an R package

  4. Miscellaneous thoughts

How to make this an R package

Using the summarytrees package:

  1. Read in your data as a “list of edges”:
    • 4 variables: node ID, parent ID, (non-negative) weight, and label.
    • To-do: accept nested JSON-formatted trees, other formats?
  2. Do the computation (with K = 100 for example):
    • optimal(..., K = 100, epsilon = 0) for the exact algorithm
    • optimal(..., K = 100, epsilon > 0) for the approximation algorithm
    • greedy(..., K = 100) for the greedy algorithm
    • All of them return the list of K summary trees as output
  3. Call prepare.vis() to set plotting options, such as node colors, the sizes of various plotting elements, etc.

  4. Call draw.vis() to open a browser and locally serve the visualization from a temporary directory on your machine using the servr package.

The package has vignettes. Some of this interface will most likely change over time.
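
To make the "list of edges" input from step 1 concrete, here is a hedged Python sketch that turns slash-separated topic paths into rows with the four variables named above (node ID, parent ID, weight, label). It is not the package's own code: the function name paths_to_edges, the ID numbering from 1, and the use of parent ID 0 for the root are assumptions made for this illustration. Topics that appear only as intermediate path components get weight 0.

```python
def paths_to_edges(rows):
    """Convert (topic path, frequency) pairs to (node, parent, weight, label).

    Assumptions for this sketch: node IDs are assigned in order of first
    appearance starting at 1, and the root is given parent ID 0.
    """
    ids = {}    # full path -> node ID
    edges = []  # mutable [node, parent, weight, label] records
    for topic, freq in rows:
        parts = topic.split("/")
        parent = 0
        for depth in range(1, len(parts) + 1):
            path = "/".join(parts[:depth])
            if path not in ids:
                ids[path] = len(ids) + 1
                edges.append([ids[path], parent, 0, parts[depth - 1]])
            parent = ids[path]
        # After the loop, `parent` is the ID of the full path itself.
        edges[parent - 1][2] += freq
    return [tuple(e) for e in edges]


# Three of the rows from the DMOZ summary table.
rows = [
    ("Top/Arts/Animation", 6),
    ("Top/Arts/Animation/Anime/Characters", 6),
    ("Top/World/Uyghurche/Sport", 1),
]

edges = paths_to_edges(rows)
for edge in edges:
    print(edge)
```

A table in this shape could then be written to a file and read into R before calling optimal() or greedy().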


Comparison to Collapsible Tree


  1. The problem: visualizing large node-weighted trees

  2. Our solution: Maximum Entropy Summary Trees

  3. Going from “code that works for me” to an R package

  4. Miscellaneous thoughts

The Good

The Bad

The Ugly



Acknowledgements: Thanks to Carson Sievert and Carlos Scheidegger for tips and discussion on d3.js

R summarytrees package on GitHub: kshirley/summarytrees