---
title: "Quick introduction to foodingraph"
author: "Victor Gasque, Cecilia Samieri, Boris Hejblum"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Quick introduction to foodingraph}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, fig.width = 7, fig.height = 6, comment = "#>") 
```

```{r}
library(foodingraph)
```

## Introduction

A simple R package to infer food networks from categorical and binary variables.

Displays a weighted undirected graph from an adjacency matrix. Can perform confidence-interval bootstrap inference with mutual information or maximal information coefficient.

#### How it works

From an adjacency matrix, the package can infer the network with confidence-interval (CI) bootstraps of the distribution of mutual information[^1] values or maximal information coefficients (MIC)[^2]for each pairwise association.
The CI bootstrap calculated is compared to the CI bootstraps from simulated independent pairwise associations.
The CI bootstrap from simulated independent pairwise variables is used to define a threshold of non-significance in the network. Our approach is to use a threshold for each pairwise variable type : two ordinal variables, two binary variables, one ordinal variable and one ordinal variable.

For example, For each pairwise association, if the 99th percentile of the simulated CI is higher than the 1th percentile of the sample bootstrap distribution, the edge is removed.

From the inferred adjacency matrix, the package can then display the graph using `ggplot2`[^3], `igraph`[^4] and `ggraph`[^5].

See R documentation for more information.


## Example data set

For the purpose of this example, I invented some food intakes data on $n=13$ subjects and $f=8$ food groups : $o=6$ ordinally-encoded (from 0 to 13) and $b=2$ binary-encoded (0 or 1). Therefore, do not expect these examples to reflect reality.
```{r}

# Food intakes (ordinaly- or binary-encoded)
obs_data <- data.frame(
#| Foods | Subject 1   2   3   4   5   6   7   8   9  10  11  12  13  |
#|-------|------------------------------------------------------------|
  alcohol_cat  = c(8,  1,  3,  0, 10,  5,  1, 10,  2,  8,  1,  3,  9),
  bread_cat    = c(7,  4,  3,  4,  0,  9,  4,  5,  7,  3,  4,  0,  9),
  coffee_cat   = c(3,  6,  6,  6,  2,  3,  5,  8,  8,  6,  6,  2,  3),
  duck_cat     = c(0,  3,  1,  0,  0,  2, 13,  1,  0,  0,  2, 13,  1),
  eggs_cat     = c(5,  5,  4,  5,  8,  8,  6,  9,  6,  8,  2,  3,  1),
  fruit_cat    = c(1,  7,  5,  8,  2,  3,  1,  0,  7,  7,  5,  8,  2),
  gin_bin      = c(1,  0,  1,  0,  1,  0,  0,  1,  0,  0,  1,  0,  1),
  ham_bin      = c(1,  1,  1,  1,  1,  1,  1,  0,  1,  1,  1,  0,  1)
)

head(obs_data)

# The legend for the graph
legend <- data.frame(
  name   = colnames(obs_data),
  title  = c("Alcohol", "Bread",   "Coffee",    "Duck",    "Eggs", "Fruit", "Gin",     "Ham"),
  family = c("Alcohol", "Cereals", "Beverages", "Poultry", "Eggs", "Fruit", "Alcohol", "Meats")
)

# Transform family intro factors?

```

Now let's calculate the maximal information coefficient[^2] adjacency matrix, with the `foodingraph` function `mic_adj_matrix`.
```{r}
adjacency_matrix <-  mic_adj_matrix(obs_data)
```


## Network inference

This step is optional. If you want to visualize the network, jump to [Network visualization](#netviz)

### Arbitrary threshold
Foodingraph allows to select edges on the basis of a threshold value in the adjacency matrix.
It can either be applied to the adjacency matrix by the functions `graph_from_matrix()` or `links_nodes_from_mat()`, with two parameters:

1. `threshold` (default is 0) : the threshold value
2. `abs_threshold` (bool, default TRUE) : if the threshold should apply to the absolute values of the edges or not. If TRUE, it will *not* convert the values of the adjacency matrix to absolute values, only compare the threshold to the absolute values.

### Confidence-interval bootstrap inference
Foodingraph allows to perform confidence-interval (CI) bootstrap inference, by comparing the CI bootstrap of simulated independent data to the CI bootstrap of each pairwise association of the dataset.
Two methods to calculate the CI bootstrap exist : mutual information[^1] or maximal information coefficient[^6].

**NOTE** If you want to use mutual information, be sure to install the `minet` package available on Bioconductor. It will not be automatically downloaded when installing foodingraph.

#### CI bootstrap of independent simulated data
Let's start by simulating independent data.
As our dataset is comprised of ordinal and binary variables, we will simulate independent :

- pairwise ordinal variables
- pairwise binary variables
- pairwise ordinal & binary variables.

This will allow to compare each pairwise association of the dataset to the corresponding type of threshold.

For this example, we will use MIC.
```{r}
# Ordinal vs. ordinal
thresh_ord_ord <- boot_simulated_cat_bin("cat", method = "mic", size = 500)

# Binary vs. binary
thresh_bin_bin <- boot_simulated_cat_bin("bin", method = "mic", size = 500)

# Ordinal vs. binary
thresh_ord_bin <- boot_simulated_cat_bin("bincat", method = "mic", size = 500)
```

#### CI bootstrap inference

Now let's perform the CI bootstrap inference on the observed data.
To do this, foodingraph needs a list of the ordinal (a.k.a. categorical) and binary variables, so it can accurately compare the correct threshold to the correct pairwise variables.

As the computations can take some time, a progress bar is built into the function. You can deactivate it by setting the parameter `show_progress` to FALSE (function `boot_cat_bin`).
*Recommended if the output is in a Rmarkdown document.*
```{r}

cat_var <- c("alcohol_cat", "bread_cat", "coffee_cat", "duck_cat", "eggs_cat",
             "fruit_cat")
bin_var <- c("gin_bin", "ham_bin")

inferred_adj_matrix <- boot_cat_bin(obs_data,
                                    list_cat_var = cat_var,
                                    list_bin_var = bin_var,
                                    method = "mic",
                                    threshold_cat = thresh_ord_ord,
                                    threshold_bin = thresh_bin_bin,
                                    threshold_bin_cat = thresh_ord_bin,
                                    boots = 5000,
                                    show_progress = FALSE)

# Print how many edges have been removed
n_null_before <- (length(which(adjacency_matrix==0))-ncol(obs_data))/2
n_null_after <- (length(which(inferred_adj_matrix==0))-ncol(obs_data))/2
print(paste(n_null_after - n_null_before, "edges have been removed"))
```


## Network visualization {#netviz}

### Quick start: directly from the adjacency matrix
```{r}
graph1 <- graph_from_matrix(adjacency_matrix, legend, main_title = "My graph", layout = "graphopt")
graph1
```

### Or from a list of links and nodes
Useful to alter the links
```{r}
# Extract the links and nodes from the adjacency matrix
links_nodes <- links_nodes_from_mat(adjacency_matrix, legend)
  
# Transform negative weights into positive ones
links_nodes$links <- transform(links_nodes$links, weight = abs(weight))
   
# Display the graph
graph2 <- graph_from_links_nodes(links_nodes, main_title = "My graph")
graph2
```

### Save the graph in a file
```{r eval=F}
save_graph(graph1)
```


### Customization
Many options and layouts exist to customize the graph.

```{r message=F}
library(ggplot2)

custom1 <- graph_from_matrix(adjacency_matrix, legend, main_title = "Node label as name",
                             layout = "graphopt", node_label_title = F, node_label_size = 5)
custom2 <- graph_from_matrix(adjacency_matrix, legend, main_title = "Node type as label",
                             layout = "graphopt", node_type = "label")
custom3 <- graph_from_matrix(adjacency_matrix, legend, main_title = "Grid layout",
                             layout = "grid", node_label_size = 5)
custom4 <- graph_from_matrix(adjacency_matrix, legend, main_title = "Circle layout",
                             layout = "circle", node_label_size = 5)
```
```{r eval=F}
custom1$net
custom2$net
custom3$net
custom4$net
```

```{r echo=F}
# Cookbook for R, simplified here
multiplot <- function(..., cols=2) {
  library(grid)
  plots <- list(...)
  numPlots = length(plots)
  layout <- matrix(seq(1, cols * ceiling(numPlots/cols)),
                   ncol = cols, nrow = ceiling(numPlots/cols))
  grid.newpage()
  pushViewport(viewport(layout = grid.layout(nrow(layout), ncol(layout))))
  for (i in 1:numPlots) {
    matchidx <- as.data.frame(which(layout == i, arr.ind = TRUE))
    
    print(plots[[i]], vp = viewport(layout.pos.row = matchidx$row,
                                    layout.pos.col = matchidx$col))
  }
}

multiplot(custom1$net + theme(legend.position="none"),
          custom2$net + theme(legend.position="none"),
          custom3$net + theme(legend.position="none"),
          custom4$net + theme(legend.position="none"))
```

### Compare graphs

Foodingraph provides a useful graph comparison function, which harmonizes the graphs' weights and node degree sizes, in order to facilitate the visual comparison.

First, let's generate a second graph.

```{r}

# New set of observation data
obs_data_2 <- matrix(c(round(runif(78, 0, 13)), round(runif(26))), nrow = 13, ncol = 8)
colnames(obs_data_2) <- colnames(obs_data)

# Compute the MIC adjacency matrix
adjacency_matrix_2 <- mic_adj_matrix(obs_data_2)

graph2 <- graph_from_matrix(adjacency_matrix_2, legend, main_title = "My graph 2",
                            layout = "graphopt")
```

Then let's compare the first graph and this one on a single, unified plot using `compare_graphs()`.
```{r, fig.width = 7, fig.height=5}
comp1_2 <- compare_graphs(graph1, graph2, position = "horizontal")
comp1_2
```

You can also save this new graph. It will automatically have a bigger size.
```{r eval=F}
save_graph(comp1_2)
```


## References

[^1]: Meyer, Patrick E, Frédéric Lafitte, and Gianluca Bontempi. “Minet: A R/Bioconductor Package for Inferring Large Transcriptional Networks Using Mutual Information.” BMC Bioinformatics 9, no. 1 (December 2008). https://doi.org/10.1186/1471-2105-9-461.

[^2]: Albanese, Davide, Michele Filosi, Roberto Visintainer, Samantha Riccadonna, Giuseppe Jurman, and Cesare Furlanello. “Minerva and Minepy: A C Engine for the MINE Suite and Its R, Python and MATLAB Wrappers.” Bioinformatics 29, no. 3 (February 1, 2013): 407–8. https://doi.org/10.1093/bioinformatics/bts707.

[^3]: H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.

[^4]: Csardi G, Nepusz T: The igraph software package for complex network research, InterJournal, Complex Systems 1695. 2006. http://igraph.org

[^5]: Thomas Lin Pedersen, https://ggraph.data-imaginist.com/

[^6]: Reshef, D. N., Y. A. Reshef, H. K. Finucane, S. R. Grossman, G. McVean, P. J. Turnbaugh, E. S. Lander, M. Mitzenmacher, and P. C. Sabeti. “Detecting Novel Associations in Large Data Sets.” Science 334, no. 6062 (December 16, 2011): 1518–24. https://doi.org/10.1126/science.1205438.