A simple R package to infer food networks from categorical and binary variables.
Displays a weighted undirected graph from an adjacency matrix. Can perform confidence-interval bootstrap inference with mutual information or maximal information coefficient.
From an adjacency matrix, the package can infer the network with confidence-interval (CI) bootstraps of the distribution of mutual information1 values or maximal information coefficients (MIC)2for each pairwise association. The CI bootstrap calculated is compared to the CI bootstraps from simulated independent pairwise associations. The CI bootstrap from simulated independent pairwise variables is used to define a threshold of non-significance in the network. Our approach is to use a threshold for each pairwise variable type : two ordinal variables, two binary variables, one ordinal variable and one ordinal variable.
For example, For each pairwise association, if the 99th percentile of the simulated CI is higher than the 1th percentile of the sample bootstrap distribution, the edge is removed.
From the inferred adjacency matrix, the package can then display the
graph using ggplot2
3, igraph
4 and
ggraph
5.
See R documentation for more information.
For the purpose of this example, I invented some food intakes data on n = 13 subjects and f = 8 food groups : o = 6 ordinally-encoded (from 0 to 13) and b = 2 binary-encoded (0 or 1). Therefore, do not expect these examples to reflect reality.
# Food intakes (ordinaly- or binary-encoded)
obs_data <- data.frame(
#| Foods | Subject 1 2 3 4 5 6 7 8 9 10 11 12 13 |
#|-------|------------------------------------------------------------|
alcohol_cat = c(8, 1, 3, 0, 10, 5, 1, 10, 2, 8, 1, 3, 9),
bread_cat = c(7, 4, 3, 4, 0, 9, 4, 5, 7, 3, 4, 0, 9),
coffee_cat = c(3, 6, 6, 6, 2, 3, 5, 8, 8, 6, 6, 2, 3),
duck_cat = c(0, 3, 1, 0, 0, 2, 13, 1, 0, 0, 2, 13, 1),
eggs_cat = c(5, 5, 4, 5, 8, 8, 6, 9, 6, 8, 2, 3, 1),
fruit_cat = c(1, 7, 5, 8, 2, 3, 1, 0, 7, 7, 5, 8, 2),
gin_bin = c(1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1),
ham_bin = c(1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1)
)
head(obs_data)
#> alcohol_cat bread_cat coffee_cat duck_cat eggs_cat fruit_cat gin_bin ham_bin
#> 1 8 7 3 0 5 1 1 1
#> 2 1 4 6 3 5 7 0 1
#> 3 3 3 6 1 4 5 1 1
#> 4 0 4 6 0 5 8 0 1
#> 5 10 0 2 0 8 2 1 1
#> 6 5 9 3 2 8 3 0 1
# The legend for the graph
legend <- data.frame(
name = colnames(obs_data),
title = c("Alcohol", "Bread", "Coffee", "Duck", "Eggs", "Fruit", "Gin", "Ham"),
family = c("Alcohol", "Cereals", "Beverages", "Poultry", "Eggs", "Fruit", "Alcohol", "Meats")
)
# Transform family intro factors?
Now let’s calculate the maximal information coefficient6 adjacency matrix, with
the foodingraph
function mic_adj_matrix
.
This step is optional. If you want to visualize the network, jump to Network visualization
Foodingraph allows to select edges on the basis of a threshold value
in the adjacency matrix. It can either be applied to the adjacency
matrix by the functions graph_from_matrix()
or
links_nodes_from_mat()
, with two parameters:
threshold
(default is 0) : the threshold valueabs_threshold
(bool, default TRUE) : if the threshold
should apply to the absolute values of the edges or not. If TRUE, it
will not convert the values of the adjacency matrix to absolute
values, only compare the threshold to the absolute values.Foodingraph allows to perform confidence-interval (CI) bootstrap inference, by comparing the CI bootstrap of simulated independent data to the CI bootstrap of each pairwise association of the dataset. Two methods to calculate the CI bootstrap exist : mutual information7 or maximal information coefficient8.
NOTE If you want to use mutual information, be sure
to install the minet
package available on Bioconductor. It
will not be automatically downloaded when installing foodingraph.
Let’s start by simulating independent data. As our dataset is comprised of ordinal and binary variables, we will simulate independent :
This will allow to compare each pairwise association of the dataset to the corresponding type of threshold.
For this example, we will use MIC.
#> Confidence-interval bootstrap on simulated independent variables of type: cat
#> Number of simulations: 10
#> Number of bootstraps per simulations: 5000
#> Sample size for each simulation: 500
#> Contigency table of the simulated data:
#> 9161471730066462894213020125400185610007401010412000000120000100000
#> Simulation 1 : 0.0325299899129651
#> Simulation 2 : 0.0323083797191614
#> Simulation 3 : 0.032432277027209
#> Simulation 4 : 0.0319265107370138
#> Simulation 5 : 0.0323957860888818
#> Simulation 6 : 0.0327634852271982
#> Simulation 7 : 0.0326330065387264
#> Simulation 8 : 0.0329643699517739
#> Simulation 9 : 0.0315401581257434
#> Simulation 10 : 0.0318476115790957
#> Mean of the percentiles: 0.0323341574907769
#> Confidence-interval bootstrap on simulated independent variables of type: bin
#> Number of simulations: 10
#> Number of bootstraps per simulations: 5000
#> Sample size for each simulation: 500
#> Contigency table of the simulated data:
#> 2051925026
#> Simulation 1 : 0.00972426379492631
#> Simulation 2 : 0.00989299752571164
#> Simulation 3 : 0.00937051852871288
#> Simulation 4 : 0.00972871782283215
#> Simulation 5 : 0.00986344038036663
#> Simulation 6 : 0.0102193594612705
#> Simulation 7 : 0.00964083533948696
#> Simulation 8 : 0.00964160371333008
#> Simulation 9 : 0.0103418659240403
#> Simulation 10 : 0.00981311454186768
#> Mean of the percentiles: 0.00982367170325451
#> Confidence-interval bootstrap on simulated independent variables of type: bincat
#> Number of simulations: 10
#> Number of bootstraps per simulations: 5000
#> Sample size for each simulation: 500
#> Contigency table of the simulated data:
#> 171512420019114671241071
#> Simulation 1 : 0.0239665945183287
#> Simulation 2 : 0.0223714028126321
#> Simulation 3 : 0.0237178455590487
#> Simulation 4 : 0.0228545853060847
#> Simulation 5 : 0.0222746800652629
#> Simulation 6 : 0.0227178692065114
#> Simulation 7 : 0.022871385647938
#> Simulation 8 : 0.023434137980914
#> Simulation 9 : 0.0221290536412375
#> Simulation 10 : 0.023291194508422
#> Mean of the percentiles: 0.022962874924638
Now let’s perform the CI bootstrap inference on the observed data. To do this, foodingraph needs a list of the ordinal (a.k.a. categorical) and binary variables, so it can accurately compare the correct threshold to the correct pairwise variables.
As the computations can take some time, a progress bar is built into
the function. You can deactivate it by setting the parameter
show_progress
to FALSE (function
boot_cat_bin
). Recommended if the output is in a
Rmarkdown document.
cat_var <- c("alcohol_cat", "bread_cat", "coffee_cat", "duck_cat", "eggs_cat",
"fruit_cat")
bin_var <- c("gin_bin", "ham_bin")
inferred_adj_matrix <- boot_cat_bin(obs_data,
list_cat_var = cat_var,
list_bin_var = bin_var,
method = "mic",
threshold_cat = thresh_ord_ord,
threshold_bin = thresh_bin_bin,
threshold_bin_cat = thresh_ord_bin,
boots = 5000,
show_progress = FALSE)
#> Performing boostrap inference with method : mic
# Print how many edges have been removed
n_null_before <- (length(which(adjacency_matrix==0))-ncol(obs_data))/2
n_null_after <- (length(which(inferred_adj_matrix==0))-ncol(obs_data))/2
print(paste(n_null_after - n_null_before, "edges have been removed"))
#> [1] "9 edges have been removed"
graph1 <- graph_from_matrix(adjacency_matrix, legend, main_title = "My graph", layout = "graphopt")
graph1
#> $igraph
#> IGRAPH 8f5cf50 UNW- 8 36 --
#> + attr: name (v/c), title (v/c), family (v/c), size (v/n), weight
#> | (e/n), width (e/n), sign (e/c), alpha (e/n)
#> + edges from 8f5cf50 (vertex names):
#> [1] alcohol_cat--alcohol_cat alcohol_cat--bread_cat alcohol_cat--coffee_cat
#> [4] alcohol_cat--duck_cat alcohol_cat--eggs_cat alcohol_cat--fruit_cat
#> [7] alcohol_cat--gin_bin alcohol_cat--ham_bin bread_cat --bread_cat
#> [10] bread_cat --coffee_cat bread_cat --duck_cat bread_cat --eggs_cat
#> [13] bread_cat --fruit_cat bread_cat --gin_bin bread_cat --ham_bin
#> [16] coffee_cat --coffee_cat coffee_cat --duck_cat coffee_cat --eggs_cat
#> [19] coffee_cat --fruit_cat coffee_cat --gin_bin coffee_cat --ham_bin
#> + ... omitted several edges
#>
#> $net
#>
#> $deg
#> name degrees
#> 1 Alcohol 9
#> 2 Bread 9
#> 3 Coffee 9
#> 4 Duck 9
#> 5 Eggs 9
#> 6 Fruit 9
#> 7 Gin 9
#> 8 Ham 9
#>
#> attr(,"class")
#> [1] "list" "foodingraph"
Useful to alter the links
# Extract the links and nodes from the adjacency matrix
links_nodes <- links_nodes_from_mat(adjacency_matrix, legend)
# Transform negative weights into positive ones
links_nodes$links <- transform(links_nodes$links, weight = abs(weight))
# Display the graph
graph2 <- graph_from_links_nodes(links_nodes, main_title = "My graph")
graph2
#> $igraph
#> IGRAPH ab54937 UNW- 8 36 --
#> + attr: name (v/c), title (v/c), family (v/c), size (v/n), weight
#> | (e/n), width (e/n), sign (e/c), alpha (e/n)
#> + edges from ab54937 (vertex names):
#> [1] alcohol_cat--alcohol_cat alcohol_cat--bread_cat alcohol_cat--coffee_cat
#> [4] alcohol_cat--duck_cat alcohol_cat--eggs_cat alcohol_cat--fruit_cat
#> [7] alcohol_cat--gin_bin alcohol_cat--ham_bin bread_cat --bread_cat
#> [10] bread_cat --coffee_cat bread_cat --duck_cat bread_cat --eggs_cat
#> [13] bread_cat --fruit_cat bread_cat --gin_bin bread_cat --ham_bin
#> [16] coffee_cat --coffee_cat coffee_cat --duck_cat coffee_cat --eggs_cat
#> [19] coffee_cat --fruit_cat coffee_cat --gin_bin coffee_cat --ham_bin
#> + ... omitted several edges
#>
#> $net
#>
#> $deg
#> name degrees
#> 1 Alcohol 9
#> 2 Bread 9
#> 3 Coffee 9
#> 4 Duck 9
#> 5 Eggs 9
#> 6 Fruit 9
#> 7 Gin 9
#> 8 Ham 9
#>
#> attr(,"class")
#> [1] "list" "foodingraph"
Many options and layouts exist to customize the graph.
library(ggplot2)
custom1 <- graph_from_matrix(adjacency_matrix, legend, main_title = "Node label as name",
layout = "graphopt", node_label_title = F, node_label_size = 5)
custom2 <- graph_from_matrix(adjacency_matrix, legend, main_title = "Node type as label",
layout = "graphopt", node_type = "label")
custom3 <- graph_from_matrix(adjacency_matrix, legend, main_title = "Grid layout",
layout = "grid", node_label_size = 5)
custom4 <- graph_from_matrix(adjacency_matrix, legend, main_title = "Circle layout",
layout = "circle", node_label_size = 5)
Foodingraph provides a useful graph comparison function, which harmonizes the graphs’ weights and node degree sizes, in order to facilitate the visual comparison.
First, let’s generate a second graph.
# New set of observation data
obs_data_2 <- matrix(c(round(runif(78, 0, 13)), round(runif(26))), nrow = 13, ncol = 8)
colnames(obs_data_2) <- colnames(obs_data)
# Compute the MIC adjacency matrix
adjacency_matrix_2 <- mic_adj_matrix(obs_data_2)
graph2 <- graph_from_matrix(adjacency_matrix_2, legend, main_title = "My graph 2",
layout = "graphopt")
Then let’s compare the first graph and this one on a single, unified
plot using compare_graphs()
.
#> Warning in get_plot_component(plot, "guide-box"): Multiple components found;
#> returning the first one. To return all, use `return_all = TRUE`.
You can also save this new graph. It will automatically have a bigger size.
Meyer, Patrick E, Frédéric Lafitte, and Gianluca Bontempi. “Minet: A R/Bioconductor Package for Inferring Large Transcriptional Networks Using Mutual Information.” BMC Bioinformatics 9, no. 1 (December 2008). https://doi.org/10.1186/1471-2105-9-461.↩︎
Albanese, Davide, Michele Filosi, Roberto Visintainer, Samantha Riccadonna, Giuseppe Jurman, and Cesare Furlanello. “Minerva and Minepy: A C Engine for the MINE Suite and Its R, Python and MATLAB Wrappers.” Bioinformatics 29, no. 3 (February 1, 2013): 407–8. https://doi.org/10.1093/bioinformatics/bts707.↩︎
H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.↩︎
Csardi G, Nepusz T: The igraph software package for complex network research, InterJournal, Complex Systems 1695. 2006. http://igraph.org↩︎
Thomas Lin Pedersen, https://ggraph.data-imaginist.com/↩︎
Albanese, Davide, Michele Filosi, Roberto Visintainer, Samantha Riccadonna, Giuseppe Jurman, and Cesare Furlanello. “Minerva and Minepy: A C Engine for the MINE Suite and Its R, Python and MATLAB Wrappers.” Bioinformatics 29, no. 3 (February 1, 2013): 407–8. https://doi.org/10.1093/bioinformatics/bts707.↩︎
Meyer, Patrick E, Frédéric Lafitte, and Gianluca Bontempi. “Minet: A R/Bioconductor Package for Inferring Large Transcriptional Networks Using Mutual Information.” BMC Bioinformatics 9, no. 1 (December 2008). https://doi.org/10.1186/1471-2105-9-461.↩︎
Reshef, D. N., Y. A. Reshef, H. K. Finucane, S. R. Grossman, G. McVean, P. J. Turnbaugh, E. S. Lander, M. Mitzenmacher, and P. C. Sabeti. “Detecting Novel Associations in Large Data Sets.” Science 334, no. 6062 (December 16, 2011): 1518–24. https://doi.org/10.1126/science.1205438.↩︎