Given one network, evaluate first creates subsets of it with subset_network, then runs Over-Representation Analysis (ORA) on each subset with webgestalt_network, and finally calls get_metrics and plot_metrics to compute and plot the selected summary statistics.

Usage

evaluate(
  network,
  reference_set,
  output_directory,
  network_name,
  organism = "hsapiens",
  database = "geneontology_Biological_Process_noRedundant",
  gene_id = "ensembl_gene_id",
  edges = c(512, 1024, 2048, 4096, 8192, 16384, 32768, 65536),
  num_possible_TFs = 0,
  permutations = 3,
  penalty = 3,
  fdr_threshold = 0.05,
  get_sum = TRUE,
  get_percent = FALSE,
  get_mean = FALSE,
  get_median = FALSE,
  get_annotation_overlap = FALSE,
  get_size = TRUE,
  plot = TRUE
)
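
The interaction between edges and num_possible_TFs (described under Arguments) can be sketched in a few lines of base R. This is only an illustration, assuming the multiplication is a plain element-wise product; subset_sizes is a hypothetical helper and not part of the package, and the input values are made up.

```r
# Hypothetical helper: turn the 'edges' argument into total edge counts.
# If num_possible_TFs > 0, 'edges' is read as average edges per TF and
# scaled up; otherwise 'edges' is already the total number of edges.
subset_sizes <- function(edges, num_possible_TFs = 0) {
  if (num_possible_TFs > 0) edges * num_possible_TFs else edges
}

# Made-up example: 1, 2, 4 and 8 edges per TF on average, 100 possible TFs.
total_edges <- subset_sizes(c(1, 2, 4, 8), num_possible_TFs = 100)

# With the default num_possible_TFs = 0 the values pass through unchanged.
unchanged <- subset_sizes(c(512, 1024))
```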

Arguments

network

path to the file containing the network to create subsets from. The file should be tab-separated with three columns: source node, target node, and edge score (see Details)

reference_set

path to the reference gene set. Must be a .txt file with exactly one column listing every gene that could possibly appear in the network.

output_directory

path to the directory in which to store the generated network subsets, ORA summaries, and plots

network_name

short name for the network - used in file names, so it must not contain spaces

organism

a string specifying the organism that the data is from, e.g. "hsapiens" or "scerevisiae" - see options with WebGestaltR::listOrganism()

database

the gene set database to search for enrichment - see options with WebGestaltR::listGeneSet(). Must be a Gene Ontology "biological process" database if get_annotation_overlap = TRUE.

gene_id

the naming system used for the input genes - see options with WebGestaltR::listIdType() and see webgestalt.org for examples of each type

edges

vector of subset sizes, given either as total numbers of edges or, when num_possible_TFs > 0, as average numbers of edges per TF

num_possible_TFs

if set to a number > 0, the elements of 'edges' will first be multiplied by this number to get the number of edges for each subset

permutations

the number of randomly permuted networks to create and run ORA on

penalty

the penalty applied to the 'sum' metric for each TF in the network

fdr_threshold

the FDR threshold for a gene set term to be considered significantly over-represented for the purposes of calculating the 'percent' metric

get_sum

boolean whether to get and plot the 'sum' metric, which is the sum over source nodes of the negative log base 10 of the p-value of each node's top term, minus 'penalty' times the total number of source nodes

get_percent

boolean whether to get and plot the 'percent' metric, which is the percent of source nodes with at least one term with an FDR below the 'fdr_threshold'

get_mean

boolean whether to get and plot the 'mean' metric, which is the mean negative log base 10 of the p-value for the top term of each source node regardless of significance

get_median

boolean whether to get and plot the 'median' metric, which is the median negative log base 10 of the p-value for the top term of each source node regardless of significance

get_annotation_overlap

boolean whether to get and plot the 'annotation_overlap' metric, which is the percent of source nodes that are annotated to at least one of the 16 GO terms for which their target genes are most enriched

get_size

boolean whether to get and plot the 'size' metric, which is the number of source nodes in the network subset that have more than one target gene with annotations. This number is used in the calculation of all other metrics.

plot

boolean whether to make plots of the calculated metrics and write them to a pdf
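
The 'sum', 'percent', 'mean', and 'median' metrics described above can be illustrated with a small base R sketch. It is not the package's implementation: the data frame top_terms, its column names, and all values are hypothetical, and it assumes one row per source node holding the p-value of that node's top term and the lowest FDR among its terms. The row count stands in for the 'size' metric.

```r
# Hypothetical ORA results: one row per source node (e.g. TF).
top_terms <- data.frame(
  source  = c("TF_A", "TF_B", "TF_C", "TF_D"),
  p_value = c(1e-8, 1e-4, 1e-2, 0.5),  # p-value of each node's top term
  min_fdr = c(1e-6, 0.01, 0.2, 1)      # lowest FDR among each node's terms
)

penalty <- 3          # default from the Usage section
fdr_threshold <- 0.05 # default from the Usage section

neg_log_p <- -log10(top_terms$p_value)
n_sources <- nrow(top_terms)  # plays the role of the 'size' metric here

metrics <- list(
  # sum of -log10(p) over source nodes, minus penalty per source node
  sum     = sum(neg_log_p) - penalty * n_sources,
  # percent of source nodes with at least one term below the FDR threshold
  percent = 100 * mean(top_terms$min_fdr < fdr_threshold),
  # mean and median of -log10(p), regardless of significance
  mean    = mean(neg_log_p),
  median  = median(neg_log_p)
)
```

With this definition, a source node raises the 'sum' metric only when its top term has a p-value below 10^-penalty, which is 0.001 for the default penalty of 3.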

Value

output of get_metrics. Can be used as input to plot_metrics.

Details

The input file should be tab-separated with two or three columns: source node (e.g. transcription factor), target node (e.g. the regulated gene), and, optionally, edge score.
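
As a minimal sketch of preparing inputs in this format, the base R snippet below writes a toy three-column network and a one-column reference set to temporary files. The node names and scores are placeholders rather than real gene identifiers, and writing the files without a header row is an assumption, not something the documentation states.

```r
# Toy network: source node, target node, edge score (placeholder values).
toy_network <- data.frame(
  source = c("TF_A", "TF_A", "TF_B", "TF_B"),
  target = c("gene_1", "gene_2", "gene_2", "gene_3"),
  score  = c(0.9, 0.7, 0.8, 0.4)
)

# Write the network as a tab-separated file (no header row: an assumption).
network_path <- tempfile(fileext = ".tsv")
write.table(toy_network, network_path, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Reference set: a .txt file with one gene per line. Here it simply lists
# every gene that appears in the toy network.
reference_path <- tempfile(fileext = ".txt")
reference_genes <- unique(c(toy_network$source, toy_network$target))
writeLines(reference_genes, reference_path)

# Read the network back to confirm the layout.
read_back <- read.delim(network_path, header = FALSE,
                        stringsAsFactors = FALSE)
```

The two paths could then be passed as the network and reference_set arguments; in practice the identifiers would need to match the chosen gene_id type.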