
get_metrics computes the specified metrics for every network subset, and each of its permutations, that has a subdirectory in the provided path. It is designed to be run on the output of the webgestalt_network function to prepare summary metrics for plotting with the plot_metrics function. The 'directory' path should contain only directories created by webgestalt_network.

Usage

get_metrics(
  directory,
  organism = "hsapiens",
  database = "geneontology_Biological_Process_noRedundant",
  gene_id = "ensembl_gene_id",
  get_sum = TRUE,
  get_percent = FALSE,
  get_mean = FALSE,
  get_median = FALSE,
  get_annotation_overlap = FALSE,
  get_size = TRUE,
  penalty = 3,
  fdr_threshold = 0.05,
  parallel = FALSE
)

Arguments

directory

a directory containing only the directories of ORA summaries created by webgestalt_network for all networks of interest

organism

a string specifying the organism that the data is from, e.g. "hsapiens" or "scerevisiae". Only required if get_annotation_overlap = TRUE.

database

the gene set database to search for enrichment - see options with WebGestaltR::listGeneSet(). Must be a Gene Ontology "biological process" database if get_annotation_overlap = TRUE.

gene_id

the naming system used for the input genes - see options with WebGestaltR::listIdType() and see webgestalt.org for examples of each type. Only required if get_annotation_overlap = TRUE.

get_sum

boolean whether to get the 'sum' metric, which is the sum over all source nodes of the negative log base 10 of the p-value for each node's top term, minus 'penalty' times the total number of source nodes.
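The 'sum' metric described above can be sketched in a few lines of base R. This is only an illustration of the arithmetic as worded here, not the package's own code; the p-values and the variable names top_p and sum_metric are hypothetical.

```r
# Hypothetical p-values for the top term of each of four source nodes
top_p <- c(1e-8, 1e-4, 0.02, 0.5)

# Default penalty from the usage signature
penalty <- 3

# Sum of -log10(p) over source nodes, minus penalty times the number of nodes
sum_metric <- sum(-log10(top_p)) - penalty * length(top_p)
print(sum_metric)
```

With these made-up values the -log10 terms add up to 14 and the penalty removes 3 for each of the four nodes, so source nodes whose top term has a p-value above 0.001 pull the metric down.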

get_percent

boolean whether to get the 'percent' metric, which is the percent of source nodes with at least one term with an FDR below the 'fdr_threshold'
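A minimal sketch of the 'percent' calculation, assuming the FDR values of each source node's terms are available as one numeric vector per node. The list fdr_by_node and its values are hypothetical, and reporting the result on a 0 to 100 scale is an assumption rather than something stated above.

```r
# Hypothetical FDR values for the enriched terms of each source node
fdr_by_node <- list(
  TF1 = c(0.001, 0.20),
  TF2 = c(0.04, 0.30, 0.90),
  TF3 = c(0.06, 0.50),
  TF4 = c(0.70)
)

# Default threshold from the usage signature
fdr_threshold <- 0.05

# TRUE for a node if at least one of its terms falls below the threshold
has_hit <- vapply(fdr_by_node, function(fdr) any(fdr < fdr_threshold), logical(1))

# Percent of source nodes with at least one significant term (0-100 scale assumed)
percent_metric <- 100 * mean(has_hit)
print(percent_metric)
```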

get_mean

boolean whether to get the 'mean' metric, which is the mean negative log base 10 of the p-value for the top term of each source node regardless of significance

get_median

boolean whether to get the 'median' metric, which is the median negative log base 10 of the p-value for the top term of each source node regardless of significance
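The 'mean' and 'median' metrics differ from 'sum' in that no penalty is applied and every source node contributes, whether or not its top term is significant. A small sketch with hypothetical p-values (the names top_p and scores are illustrative only):

```r
# Hypothetical p-values for the top term of each source node
top_p <- c(1e-8, 1e-4, 1e-2, 1e-1)

# Per-node score: negative log base 10 of the top-term p-value
scores <- -log10(top_p)

# Both summaries include every node regardless of significance
mean_metric <- mean(scores)
median_metric <- median(scores)
print(c(mean = mean_metric, median = median_metric))
```

In this example one very strongly enriched node raises the mean above the median, which is why it can be useful to look at both.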

get_annotation_overlap

boolean whether to get the 'annotation_overlap' metric, which is the percent of source nodes that are annotated to at least one of the 16 GO terms for which their target genes are most enriched
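One way to picture the 'annotation_overlap' metric is as a set intersection per source node. The sketch below assumes two hypothetical lists, own_terms (GO terms each source node is itself annotated to) and top_terms (the terms its targets are most enriched for, ordered by enrichment); how the package actually obtains these is not described here, and the 0 to 100 scale is an assumption.

```r
# Hypothetical GO annotations of the source nodes themselves
own_terms <- list(
  TF1 = c("GO:0006355", "GO:0045944"),
  TF2 = c("GO:0006412")
)

# Hypothetical top enriched terms of each node's target genes, best first
top_terms <- list(
  TF1 = c("GO:0006355", "GO:0006096"),
  TF2 = c("GO:0006096", "GO:0045944")
)

# A node counts if it shares at least one term with its (up to) 16 top terms
overlaps <- mapply(
  function(own, top) length(intersect(own, head(top, 16))) > 0,
  own_terms, top_terms
)

# Percent of source nodes with an overlap (0-100 scale assumed)
annotation_overlap <- 100 * mean(overlaps)
print(annotation_overlap)
```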

get_size

boolean whether to get the 'size' metric, which is the number of source nodes in the network subset that have more than one target gene with annotations. This number is used in the calculation of all other metrics.
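As a rough illustration of the 'size' definition, the sketch below counts source nodes with more than one annotated target in a toy edge list. The data.frame edges, the vector annotated, and their contents are hypothetical and are not the format produced by webgestalt_network.

```r
# Hypothetical network subset as a source-target edge list
edges <- data.frame(
  source = c("TF1", "TF1", "TF1", "TF2", "TF2", "TF3"),
  target = c("g1", "g2", "g3", "g1", "g4", "g2"),
  stringsAsFactors = FALSE
)

# Hypothetical set of target genes that have annotations in the database
annotated <- c("g1", "g2", "g3")

# Keep only edges whose target is annotated, then count targets per source
annotated_edges <- edges[edges$target %in% annotated, ]
n_annotated <- table(annotated_edges$source)

# Number of source nodes with more than one annotated target
size_metric <- sum(n_annotated > 1)
print(size_metric)
```

Here only TF1 qualifies: TF2 has two targets but just one of them is annotated.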

penalty

the penalty subtracted from the 'sum' metric for each TF (source node) in the network

fdr_threshold

the FDR threshold for a gene set term to be considered significantly over-represented for the purposes of calculating the 'percent' metric

parallel

boolean whether to get the metrics for each network in the directory in parallel - use with caution, as this has not been adequately tested

Value

a list of data.frames, each containing the values of one metric. The columns of a data.frame represent the different subset sizes, and the rows represent the different network permutations. The first row is from the unpermuted networks.
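To make the layout of the return value concrete, the sketch below builds a stand-in list with the shape described above and separates the unpermuted row from the permuted ones. All values, the element name sum, and the column names "500" and "1000" are hypothetical; the real names depend on the metrics requested and the subsets present in 'directory'.

```r
# Stand-in for the returned list: one data.frame per metric,
# rows = permutations (first row unpermuted), columns = subset sizes
metrics <- list(
  sum = data.frame(
    `500`  = c(12.1, 3.2, 2.8, 4.0),
    `1000` = c(20.5, 6.1, 5.4, 7.3),
    check.names = FALSE
  )
)

# First row holds the unpermuted networks, the rest are permutations
observed <- metrics$sum[1, , drop = FALSE]
permuted <- metrics$sum[-1, , drop = FALSE]

# Difference between the observed value and the permutation mean, per subset size
excess <- unlist(observed) - colMeans(permuted)
print(excess)
```

A comparison of this kind is one simple way to read the table before passing it to plot_metrics.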