Title: | Food Network Inference and Visualization |
---|---|
Description: | Displays a weighted undirected food graph from an adjacency matrix. Can perform confidence-interval bootstrap inference with mutual information or maximal information coefficient. Based on my Master 1 internship at the Bordeaux Population Health center. References : Reshef et al. (2011) <doi:10.1126/science.1205438>, Meyer et al. (2008) <doi:10.1186/1471-2105-9-461>, Liu et al. (2016) <doi:10.1371/journal.pone.0158247>. |
Authors: | Victor Gasque [cre, aut], Boris Hejblum [aut], Cecilia Samieri [aut] |
Maintainer: | Victor Gasque <[email protected]> |
License: | GPL-3 |
Version: | 0.1.0 |
Built: | 2025-02-07 04:58:59 UTC |
Source: | https://github.com/vgasque/foodingraph |
For a given dataset, performs a confidence-interval bootstrap of the mutual information or maximal information coefficient (MIC) for each pairwise association.
Computes the MI or MIC for each pairwise association.
Performs a bootstrap (of boots samples) and stores the MI or MIC of each pairwise association.
Calculates the 1st percentile for each pairwise association from the bootstrap distribution.
If the percentile is lower than the threshold of the corresponding pairwise variable type, then the MI or MIC is set to 0.
boot_cat_bin(obs_data, list_cat_var, list_bin_var, threshold_bin, threshold_cat, threshold_bin_cat, method = c("mi", "mic"), boots = 5000, show_progress = TRUE)
obs_data |
(data.frame or matrix) : a dataset whose rows are observations and whose columns are variables. |
list_cat_var |
: list of the categorical variables of the dataset |
list_bin_var |
: list of the binary variables of the dataset |
threshold_bin |
: the threshold to apply to binary pairwise associations |
threshold_cat |
: the threshold to apply to categorical pairwise associations |
threshold_bin_cat |
: the threshold to apply to pairwise associations between a binary and a categorical variable |
method |
: the method to use to compute the adjacency matrix
("mi" or "mic").
If "mi", uses mutual information package |
boots |
: number of bootstraps (default 5000) |
show_progress |
: if TRUE, prints the percentage of completion to keep track of the algorithm's progress. Default is TRUE. Setting it to FALSE is recommended for R Markdown files. |
The inferred adjacency matrix. Each pairwise association whose bootstrap 1st percentile is lower than its predefined threshold is set to 0.
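The thresholding rule described above can be sketched for a single pair of variables in base R. This is only an illustration under stated assumptions, not the package's implementation: mutual_info() and boot_threshold_mi() are hypothetical helpers, the MI is a simple plug-in estimate in nats, and a single threshold stands in for threshold_bin, threshold_cat or threshold_bin_cat.

```r
# Plug-in estimate of mutual information (in nats) between two discrete vectors.
# Hypothetical helper, not part of foodingraph.
mutual_info <- function(x, y) {
  joint <- table(x, y) / length(x)                  # joint probabilities
  indep <- outer(rowSums(joint), colSums(joint))    # product of the marginals
  nz <- joint > 0                                   # skip empty cells
  sum(joint[nz] * log(joint[nz] / indep[nz]))
}

# Percentile bootstrap for one pair: resample observations, recompute the MI,
# and keep the estimate only if the lower percentile reaches the threshold.
boot_threshold_mi <- function(x, y, threshold, boots = 1000, lower = 0.01) {
  n <- length(x)
  boot_mi <- replicate(boots, {
    idx <- sample.int(n, n, replace = TRUE)
    mutual_info(x[idx], y[idx])
  })
  lower_bound <- unname(quantile(boot_mi, probs = lower))
  if (lower_bound < threshold) 0 else mutual_info(x, y)
}

set.seed(1)
x <- sample(1:3, 300, replace = TRUE)
y_dep <- x                                  # fully determined by x
y_ind <- sample(1:3, 300, replace = TRUE)   # independent of x

kept    <- boot_threshold_mi(x, y_dep, threshold = 0.05, boots = 200)
dropped <- boot_threshold_mi(x, y_ind, threshold = 0.05, boots = 200)
```

In this sketch the dependent pair keeps its MI estimate, while the independent pair is set to 0 because the lower end of its bootstrap distribution falls below the threshold.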
Creates a defined number of pairs of simulated independent random variables of a given size, according to type : 2 ordinal variables, 2 binary variables, or 1 binary and 1 ordinal variable.
A number of bootstraps are then performed on each sample to calculate a confidence interval from the bootstrap distribution of the chosen method: mutual information or the maximal information coefficient.
The percentile method is used to calculate this interval.
boot_simulated_cat_bin(type = c("cat", "bin", "bincat"), method = c("mic", "mi"), simu = 10, boots = 5000, size = 500, percentile = 0.99)
type |
: the type of the simulated variables: |
method |
: the method used to calculate the association : mutual
information ( |
simu |
: the number of simulated pairs of variables. For each pair, the confidence-interval bootstrap is calculated from the bootstrap distribution of the MI/MIC between the two variables of the pair. At the end of the program, the mean of the chosen percentile is given. Default is 10. |
boots |
: the number of bootstraps per simulation. Default is 5000. |
size |
: the size of the sample. Default is 500. |
percentile |
: the percentile kept. Default is 0.99 (the 99th percentile). |
The mean of the percentile values.
Reshef et al. (2011) <doi:10.1126/science.1205438>
Meyer et al. (2008) <doi:10.1186/1471-2105-9-461>
boot_simulated_cat_bin("cat", "mic", 2, 500)
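To illustrate how a threshold can be derived from independent data, here is a minimal base R sketch of the idea for binary pairs. It is an assumption-laden illustration rather than the package's code: mutual_info() and simulated_threshold() are hypothetical helpers, the variables are drawn from a fair Bernoulli distribution (the distribution used by the package is not stated here), and the MI is a plug-in estimate in nats.

```r
# Plug-in mutual information (nats) between two discrete vectors.
# Hypothetical helper, not part of foodingraph.
mutual_info <- function(x, y) {
  joint <- table(x, y) / length(x)
  indep <- outer(rowSums(joint), colSums(joint))
  nz <- joint > 0
  sum(joint[nz] * log(joint[nz] / indep[nz]))
}

# Mean upper percentile of the bootstrapped MI over `simu` independent pairs.
simulated_threshold <- function(simu = 10, boots = 200, size = 500,
                                percentile = 0.99) {
  per_pair <- replicate(simu, {
    x <- rbinom(size, 1, 0.5)   # two independent binary variables (assumed fair)
    y <- rbinom(size, 1, 0.5)
    boot_mi <- replicate(boots, {
      idx <- sample.int(size, size, replace = TRUE)
      mutual_info(x[idx], y[idx])
    })
    unname(quantile(boot_mi, probs = percentile))
  })
  mean(per_pair)
}

set.seed(42)
threshold_bin <- simulated_threshold(simu = 5, boots = 200)
threshold_bin
```

The result is a small positive number: the level of MI that independent binary pairs of this size can reach by chance, which is the kind of value one would then pass as a threshold.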
From two graphs generated by graph_from_matrix or graph_from_links_nodes, displays the two graphs with the same legend (edge weights and size and node degrees) to facilitate their visual comparison.
NB : if you use node families, make sure the two graphs have the same families (this can be done by generating the same palette for both graphs using family_palette).
compare_graphs(graph1, graph2, titles = NULL, position = c("vertical", "horizontal"), n_nodes = 5, n_weights = 5, edge_width_range = c(0.2, 2), edge_alpha_range = c(0.4, 1), node_size_range = c(1, 10), unique_legend = TRUE)
graph1 |
: the first graph |
graph2 |
: the second graph |
titles |
(optional) : list of 2 : the titles of the two graphs.
Default are the graph titles from |
position |
: should the graphs be displayed vertically (use
|
n_nodes |
: the number of nodes to be displayed in the legend. R will do its best to be around this number. |
n_weights |
: the number of weights to be displayed in the legend. R will do its best to be around this number. |
edge_width_range |
: range of the edges width (default is 0.2 to 2). |
edge_alpha_range |
: if |
node_size_range |
: range of the node sizes. (default is 1 to 10) |
unique_legend |
: should there be a unique legend (default is TRUE) BE CAREFUL to have the same family colors if you use families/ |
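The idea of a shared legend, and of a legend size that R only approximately honours, can be illustrated in base R. This sketch is an assumption about the general technique, not a description of how compare_graphs is implemented: the weights are made up, and pretty() and to_width() merely stand in for whatever scale functions the package relies on.

```r
# Hypothetical edge weights of the two graphs to be compared.
weights1 <- c(0.12, 0.35, 0.48)
weights2 <- c(0.20, 0.61, 0.93)

# One range covering both graphs, so a given width means the same weight in each.
shared_range <- range(c(weights1, weights2))

# pretty() returns "about" n round breaks that cover the range,
# not necessarily exactly n of them.
legend_breaks <- pretty(shared_range, n = 5)

# Linearly map a weight onto the edge width range c(0.2, 2) using the shared range.
to_width <- function(w, from = shared_range, to = c(0.2, 2)) {
  to[1] + (w - from[1]) / diff(from) * diff(to)
}

# The smallest and largest weights map to the two ends of the width range.
end_widths <- to_width(shared_range)
```

Because both graphs are scaled against shared_range rather than their own ranges, an edge of weight 0.48 is drawn with the same width in either graph.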
From a list of food families, create a color for each family.
family_palette(family)
family |
(list) : can be either the family column from the legend table, or just a list of the families. In all cases, the parameter will be converted to a factor and sorted (alphabetically or numerically). Only its unique values are necessary. |
Very useful when comparing graphs with the same families.
It can be used by itself, but this function was created to be the family_palette argument when calling graph_from_links_nodes(). The colors will be automatically added to the graph (nodes and legend).
A list of keys and values : the keys are the family names and the values are the colors.
family_palette(c("Fruits", "Vegetables", "Meats"))
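The shape of the returned value (family names as keys, colors as values) can be mimicked in base R. The helper below is a hypothetical stand-in, not the package's function, and the "Dark 3" palette from grDevices::hcl.colors() is an arbitrary choice; the actual colors produced by family_palette may differ.

```r
# Hypothetical stand-in for family_palette(): one color per unique family,
# with the families sorted through factor levels.
sketch_family_palette <- function(family) {
  families <- levels(factor(family))                   # unique, sorted values
  colors <- grDevices::hcl.colors(length(families), palette = "Dark 3")
  as.list(stats::setNames(colors, families))           # keys = families, values = colors
}

pal <- sketch_family_palette(c("Fruits", "Vegetables", "Meats", "Fruits"))
names(pal)
```

Building the palette once and passing the same object to every graph is what keeps a family's color stable across graphs.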
The foodingraph package provides two categories of functions :
Confidence-interval (CI) bootstrap inference of mutual information (MI) or maximal information coefficient (MIC) adjacency matrices.
The two functions are
boot_cat_bin : a function to perform the CI bootstrap inference for pairwise associations between ordinal and binary variables. It uses thresholds defined by simulating independent associations with boot_simulated_cat_bin, which simulates independent associations between ordinal-ordinal, binary-binary and ordinal-binary pairs of variables. It calculates the CI bootstraps for each pairwise association of the dataset's variables, then compares the 1st percentile of each CI to the corresponding threshold obtained from independent data.
boot_simulated_cat_bin : a function to determine the threshold values of MI or MIC for independent pairs of variables (ordinal vs. ordinal, binary vs. binary, and ordinal vs. binary). It calculates the CI bootstraps of MI or MIC for these pairs of variables, and returns a defined percentile of these CIs (e.g. the 99th percentile).
The main functions are
graph_from_matrix : creates a graph from an adjacency matrix. This function needs at least two arguments : 1. the adjacency matrix, in which the column names and row names are the node names. 2. the legend, which is a data frame of at least two columns : name (the name of the nodes in the adjacency matrix, e.g. CRUDSAL_cat) and title (the titles for each name, e.g. raw vegetables). Optionally, you can add a column family to specify the nodes' families.
graph_from_links_nodes : creates a graph from a list of nodes and links. This function needs two arguments : 1. the list of nodes and links, which should be the result from links_nodes_from_mat (if not, make sure the structure corresponds). 2. the legend (described above).
compare_graphs : a function to compare two graphs. It unifies the legends and attributes so that the graphs can be compared visually.
save_graph : a function to save the graph to a file at high resolution.
Other functions include
family_palette : to create a color palette to be used in the graph. It is usually done automatically, but can prove useful when comparing multiple graphs, to ensure the family colors remain the same throughout the graphs.
links_nodes_from_mat : to extract the links and nodes from an adjacency matrix.
mic_adj_matrix : using the cstats function from the minerva package, calculates the MIC adjacency matrix.
Given a list of links and nodes (e.g. from links_nodes_from_mat), uses igraph and ggraph to display the network plot.
The list must have the proper structure; alternatively, use links_nodes_from_mat(), which automatically returns this structure when given an adjacency matrix and its legend (see the documentation for this function).
network_data should be a list of 2 : edges, nodes
For edges (data.frame) : from, to, weight, width, sign (of the weight: neg/pos)
For nodes (data.frame) : name, title, family, family_color (optional)
graph_from_links_nodes(network_data, main_title = "", node_type = c("point", "label"), node_label_title = TRUE, family_palette = NULL, layout = "nicely", remove_null = TRUE, edge_alpha = TRUE, edge_color = c("#6DBDE6", "#FF8C69"), edge_width_range = c(0.2, 2), edge_alpha_range = c(0.4, 1), node_label_size = 3, legend_label_size = 10, ...)
network_data |
(list of two) : links, nodes with the proper structure |
main_title |
(string, optional) : the title of the network |
node_type |
: |
node_label_title |
(bool, default TRUE) : should the node labels be the names or title column? (e.g. names : CRUDSAL_cat, title : Raw vegetables) |
family_palette |
(list of key = value) : the keys are the family codes
(from family column in the legend), and the values are the corresponding
colors. Can be generated using the |
layout |
(chr) : the layout to be used to construct the graph |
remove_null |
(bool) : should the nodes with 0 connections (degree 0) be removed from the graph. default is TRUE. |
edge_alpha |
(bool) : should the edges have a transparency scale, in addition to the width scale? |
edge_color |
(list) : list of 2. The first element is the color of the
negative edges, the second the positive. Default is |
edge_width_range |
: range of the edges width. (default is 0.2 to 2) |
edge_alpha_range |
: if |
node_label_size |
: the size of the node labels. Default is 3. |
legend_label_size |
: the size of the legend labels. Default is 10. |
... |
: other parameters to pass to ggraph |
A list of 3 : igraph : the igraph object, net : the graph, deg : the degree table.
Csardi et al. (2006) <https://igraph.org>
Pedersen (2019) <https://ggraph.data-imaginist.com>
adj_matrix <- cor(iris[,-5])
legend <- data.frame(name = colnames(iris[,-5]), title = colnames(iris[,-5]))
graph_iris <- links_nodes_from_mat(adj_matrix, legend)
graph_from_links_nodes(graph_iris, main_title = "Iris graph")
Given an adjacency matrix and a legend, displays the graph.
This is a shortcut function, rather than using links_nodes_from_mat() and graph_from_links_nodes().
graph_from_matrix(adjacency_matrix, legend, threshold = 0, abs_threshold = TRUE, filter_nodes = TRUE, main_title = "", node_type = c("point", "label"), node_label_title = TRUE, family_palette = NULL, layout = "nicely", remove_null = TRUE, edge_alpha = TRUE, edge_color = c("#6DBDE6", "#FF8C69"), edge_width_range = c(0.2, 2), edge_alpha_range = c(0.4, 1), node_label_size = 3, legend_label_size = 10, ...)
adjacency_matrix |
: a matrix of size n x n, each element being a number describing the relationship (coefficient, information) between the two variables given in the column and row names. /!\ As this code draws undirected graphs, only the lower triangular part of the adjacency matrix is used to extract the information. |
legend |
: a data frame of columns in order : 1) name, str : name of the node in the adjacency matrix, e.g. CRUDSAL_cat 2) title, str : name of the node, e.g. Raw vegetables 3) family, factor : (optional) the family the node belongs to, e.g. Vegetables |
threshold |
(numeric) : a number defining the minimal threshold. If the weights are less than this threshold, they will be set to 0. |
abs_threshold |
(bool) : should the threshold keep negative values,
e.g. if |
filter_nodes |
(bool) : should the variables not in the adjacency
matrix be displayed on the graph? Default is TRUE
CAREFUL : if set to |
main_title |
(string, optional) : the title of the network |
node_type |
: |
node_label_title |
(bool, default TRUE) : should the node labels be the names or title column? (e.g. names : CRUDSAL_cat, title : Raw vegetables) |
family_palette |
(list of key = value) : the keys are the family codes
(from family column in the legend), and the values are the corresponding
colors. Can be generated using |
layout |
(chr) : the layout to be used to construct the graph |
remove_null |
(bool) : should the nodes with 0 connections (degree 0) be removed from the graph. Default is TRUE. |
edge_alpha |
(bool) : should the edges have a transparency scale, in addition to the width scale? |
edge_color |
(list) : list of 2. The first element is the color of the
negative edges, the second the
positive. Default is |
edge_width_range |
: range of the edges width. (default is 0.2 to 2) |
edge_alpha_range |
: if |
node_label_size |
: the size of the node labels. Default is 3. |
legend_label_size |
: the size of the legend labels. Default is 10. |
... |
: other parameters to pass to ggraph 'create_layout' |
A list of 3 : igraph : the igraph object, net : the graph, deg : the degree table.
Csardi et al. (2006) <https://igraph.org>
Pedersen (2019) <https://ggraph.data-imaginist.com>
adj_matrix <- cor(iris[,-5])
legend <- data.frame(name = colnames(iris[,-5]), title = colnames(iris[,-5]))
graph_from_matrix(adj_matrix, legend, main_title = "Iris graph")
From an adjacency matrix, extracts two data.frames/tibbles :
Links. columns : from, to, width, weight
Nodes. columns : name, title. name corresponds to the names used in 'from' and 'to'
links_nodes_from_mat(adjacency_matrix, legend, threshold = 0, abs_threshold = TRUE, filter_nodes = TRUE)
adjacency_matrix |
: a matrix of size n x n, each element being a number describing the relationship (e.g. coefficient, information) between the two variables given in the column and row names. /!\ As this code draws undirected graphs, only the lower triangular part of the adjacency matrix is used to extract the information. |
legend |
: a data frame of columns in order : 1) name, str : name of the node in the adjacency matrix, e.g. CRUDSAL_cat 2) title, str : name of the node, e.g. Raw vegetables 3) family, factor : (optional) the family the node belongs to, e.g. Vegetables |
threshold |
(numeric) : a number defining the minimal threshold. If the weights are less than this threshold, they will be set to 0. |
abs_threshold |
(bool) : should the threshold keep negative values,
e.g. if |
filter_nodes |
(bool) : should the variables not in the adjacency
matrix be displayed on the graph? Default is TRUE
CAREFUL : if set to |
A list of two data frames : links and nodes.
adj_matrix <- cor(iris[,-5])
legend <- data.frame(name = colnames(iris[,-5]), title = colnames(iris[,-5]))
links_nodes_from_mat(adj_matrix, legend)
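To make the "lower triangular part only" remark concrete, here is a base R sketch that turns the lower triangle of a symmetric matrix into a table of links. It is a simplified illustration with a hypothetical helper, sketch_links(): it produces only the from, to and weight columns, drops weak links instead of setting them to 0, and assumes that abs_threshold = TRUE means the threshold is compared with the absolute weight.

```r
# Sketch: turn the lower triangle of a symmetric matrix into a table of links.
# Hypothetical helper, not the package's links_nodes_from_mat().
sketch_links <- function(adjacency_matrix, threshold = 0, abs_threshold = TRUE) {
  # Row/column positions of the cells strictly below the diagonal.
  idx <- which(lower.tri(adjacency_matrix), arr.ind = TRUE)
  links <- data.frame(
    from   = rownames(adjacency_matrix)[idx[, "row"]],
    to     = colnames(adjacency_matrix)[idx[, "col"]],
    weight = adjacency_matrix[idx],
    stringsAsFactors = FALSE
  )
  # Assumed meaning of abs_threshold: compare |weight| rather than the signed weight.
  strength <- if (abs_threshold) abs(links$weight) else links$weight
  links[strength >= threshold & links$weight != 0, ]
}

adj_matrix <- cor(iris[, -5])
all_links    <- sketch_links(adj_matrix)                   # every pair, listed once
strong_links <- sketch_links(adj_matrix, threshold = 0.5)  # only |r| >= 0.5
```

Reading only the cells below the diagonal lists each undirected pair once and skips the self-associations on the diagonal.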
For a given dataset, computes the adjacency matrix of maximal information coefficient (MIC) of each pairwise association.
NOTE : another approach could have been to give the whole data frame to the minerva package function cstats(), but it seemed slower in my tests.
mic_adj_matrix(obs_data)
obs_data |
(data.frame or matrix) : a dataset whose rows are observations and whose columns are variables. |
The adjacency matrix of MIC values for each pairwise association.
Reshef et al. (2011) <doi:10.1126/science.1205438>
mic_adj_matrix(iris[,-5])
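The pairwise approach can be sketched in base R as a double loop that fills a symmetric matrix. This is an illustration under assumptions, not the package's code: pairwise_adj_matrix() and its stat argument are hypothetical, the absolute Pearson correlation is only a stand-in for the MIC so that the block runs without extra packages, and the diagonal is left at 0.

```r
# Sketch: fill a symmetric matrix by applying `stat` to every pair of columns.
# `stat` is a hypothetical argument; abs(cor()) is only a stand-in for the MIC.
pairwise_adj_matrix <- function(obs_data,
                                stat = function(x, y) abs(cor(x, y))) {
  obs_data <- as.data.frame(obs_data)
  p <- ncol(obs_data)
  adj <- matrix(0, p, p, dimnames = list(names(obs_data), names(obs_data)))
  for (i in seq_len(p - 1)) {
    for (j in (i + 1):p) {
      # Each pair is computed once and mirrored across the diagonal.
      adj[i, j] <- adj[j, i] <- stat(obs_data[[i]], obs_data[[j]])
    }
  }
  adj   # the diagonal stays at 0 in this sketch
}

adj <- pairwise_adj_matrix(iris[, -5])
isSymmetric(adj)
```

With the minerva package installed, one could presumably pass a function returning minerva::mine(x, y)$MIC as stat to obtain MIC values instead of correlations.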
Saves the graph generated by graph_from_matrix, graph_from_links_nodes or compare_graphs.
save_graph(graph, filename = "foodingraph_%03d.png", width = NULL, height = NULL, dpi = 300, ...)
graph |
: the graph |
filename |
(optional) : the name of the file and format. Default is "foodingraph_*.png". |
width |
(optional) : width of the image in cm. Default is 25 cm for a single graph or a comparison in a vertical position, and 40 cm for a comparison in a horizontal position. |
height |
(optional) : height of the image in cm. Default is 20 cm for a single graph, 25 cm for a comparison in a horizontal position, and 40 cm for a comparison in a vertical position. |
dpi |
(optional) : the resolution of the image in dpi. Default is 300 |
... |
: other parameters to pass to the |
adj_matrix <- cor(iris[,-5])
legend <- data.frame(name = colnames(iris[,-5]), title = colnames(iris[,-5]))
graph_iris <- graph_from_matrix(adj_matrix, legend, main_title = "Iris graph")
# Save to a temporary file location
save_graph(graph_iris, tempfile(fileext = ".png"))