This may surprise you, but there isn’t an easy, “canonical” method to construct simple probability trees in R. Google uncovers some hacky attempts from years past, but it obviously hasn’t been a pressing issue or priority in the community. The reason for this, I think, is threefold: (1) probability trees are boring, STATS101 material; (2) until recently, there haven’t been great tools to render nodes and trees; and (3) designating a sensible, comprehensive input is somewhat tricky. What do you name the parameters? Should the function(s) take a table or data frame? If so, in what shape?
To expand on the first point, R and the R community are great at introducing programming techniques to people who know statistics, but less great at introducing statistics to people who know how to program. The purpose of this post is to demonstrate how a simple statistical calculation, Bayes’ theorem, might be computed and displayed in a simple visual format in R.
As I alluded to above, there are now several great tools to render networks, trees, and other such hierarchies. After considering data.tree and igraph, I decided the DiagrammeR package best suited my needs. The documentation can be found here.
Note: these trees required the development version of DiagrammeR, so you may need to run devtools::install_github("rich-iannone/DiagrammeR") for the code to work.
My bayes_probability_tree function is below. The assumed context is a medical test: What is the probability I have X, given that I test positive?
The function renders the tree and returns it invisibly for further edits. For example, let’s set our prior at 0.07, the sensitivity of the test (true positive) at 0.95, and the specificity of the test (true negative) at 0.98.
> bayes_probability_tree(prior = 0.07, true_positive = 0.95, true_negative = 0.98)
The probability of having (prior) after testing positive is 0.7814
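The message produced follows Bayes’ theorem: the probability of A, given B, is the probability of A and B divided by the probability of B. Here, the probability of having X and testing positive is 0.07 × 0.95 = 0.0665, and the probability of testing positive at all is 0.0665 + (0.93 × 0.02) = 0.0851. Thus, there is a 78% chance (0.0665 / 0.0851) that someone has X, given that they tested positive.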
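To check that figure without drawing anything, here is a minimal sketch of the arithmetic alone. The helper name bayes_posterior is hypothetical and is not part of bayes_probability_tree; it only reuses the same three argument names and assumes they are probabilities between 0 and 1.

```r
# Hypothetical helper (not the post's bayes_probability_tree): it computes
# the posterior only and does not render a tree.
bayes_posterior <- function(prior, true_positive, true_negative) {
  # Joint probabilities for the two branches that end in a positive test
  p_true_pos  <- prior * true_positive              # has X and tests positive
  p_false_pos <- (1 - prior) * (1 - true_negative)  # no X but tests positive

  # P(B): total probability of testing positive
  p_positive <- p_true_pos + p_false_pos

  # P(A | B) = P(A and B) / P(B)
  p_true_pos / p_positive
}

posterior <- bayes_posterior(prior = 0.07, true_positive = 0.95, true_negative = 0.98)
round(posterior, 4)  # 0.7814, matching the message above
```

Keeping the calculation separate from the rendering like this is one possible design; it makes the numbers easy to test on their own before they are attached to nodes and edges.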
I’m not sure how or if these simple statistical tasks could be expanded into a package, but I think it’s worth considering.