Get summary metrics of a network's ORA results
get_metrics creates a data.frame that contains the specified metrics for all network subsets and their permutations that have a subdirectory in the provided path. It is designed to be run on the output of the webgestalt_network function to prepare summary metrics for plotting with the plot_metrics function.
The 'directory' path should contain only directories created by webgestalt_network.
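Because the 'directory' path is expected to hold nothing but subdirectories, it can be worth checking its contents before calling get_metrics. The following is a minimal base R sketch of such a check. The helper name contains_only_directories and the demo folder names are hypothetical and not part of the package, and the check only confirms that every entry is a directory, not that webgestalt_network created it.

```r
# Hypothetical helper (not part of the package): returns TRUE when `directory`
# has at least one entry and every entry is itself a directory.
contains_only_directories <- function(directory) {
  # Non-recursive listings include subdirectory names alongside files
  entries <- list.files(directory, full.names = TRUE)
  length(entries) > 0 && all(dir.exists(entries))
}

# Toy demonstration in a fresh temporary location
root <- tempfile("ora_results_demo_")
dir.create(file.path(root, "network_a"), recursive = TRUE)
dir.create(file.path(root, "network_b"))
ok_before <- contains_only_directories(root)

# A stray file in the top level makes the check fail
file.create(file.path(root, "notes.txt"))
ok_after <- contains_only_directories(root)
```

Here ok_before is computed while the folder holds only subdirectories, and ok_after once a loose file has been added.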
Usage
get_metrics(
directory,
organism = "hsapiens",
database = "geneontology_Biological_Process_noRedundant",
gene_id = "ensembl_gene_id",
get_sum = TRUE,
get_percent = FALSE,
get_mean = FALSE,
get_median = FALSE,
get_annotation_overlap = FALSE,
get_size = TRUE,
penalty = 3,
fdr_threshold = 0.05,
parallel = FALSE
)
Arguments
- directory
a directory containing only the directories of ORA summaries created by webgestalt_network for all networks of interest
- organism
a string specifying the organism that the data is from, e.g. "hsapiens" or "scerevisiae". Only required if get_annotation_overlap = TRUE.
- database
the gene set database to search for enrichment - see options with WebGestaltR::listGeneSet(). Must be a Gene Ontology "biological process" database if get_annotation_overlap = TRUE.
- gene_id
the naming system used for the input genes - see options with WebGestaltR::listIdType() and see webgestalt.org for examples of each type. Only required if get_annotation_overlap = TRUE.
- get_sum
boolean whether to get the 'sum' metric, which is the sum of the negative log base 10 of the p-value for the top term of each source node minus 'penalty' times the total number of source nodes.
- get_percent
boolean whether to get the 'percent' metric, which is the percent of source nodes with at least one term with an FDR below the 'fdr_threshold'
- get_mean
boolean whether to get the 'mean' metric, which is the mean negative log base 10 of the p-value for the top term of each source node regardless of significance
- get_median
boolean whether to get the 'median' metric, which is the median negative log base 10 of the p-value for the top term of each source node regardless of significance
- get_annotation_overlap
boolean whether to get the 'annotation_overlap' metric, which is the percent of source nodes that are annotated to at least one of the 16 GO terms for which their target genes are most enriched
- get_size
boolean whether to get the 'size' metric, which is the number of source nodes in the network subset that have more than one target gene with annotations. This number is used in the calculation of all other metrics.
- penalty
the penalty subtracted from the 'sum' metric for each source node (TF) in the network
- fdr_threshold
the FDR threshold for a gene set term to be considered significantly over-represented for the purposes of calculating the 'percent' metric
- parallel
boolean whether to get the metrics for each network in the directory in parallel - use with caution, as this has not been adequately tested
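
The metric definitions above can be illustrated on toy data. The sketch below is not the package's implementation. It assumes one row per source node holding the p-value and FDR of that node's top term, with values invented for illustration. It also assumes the top term carries the node's lowest FDR, so that "at least one term below the threshold" reduces to a check on the top term, and it reports 'percent' on a 0 to 100 scale, which the text above does not confirm. The column names are hypothetical.

```r
# Toy input: one row per source node with the p-value and FDR of its top term.
# All values are invented for illustration.
top_terms <- data.frame(
  source_node = c("TF1", "TF2", "TF3", "TF4"),
  p_value     = c(1e-5, 1e-2, 1e-3, 1e-1),
  fdr         = c(1e-3, 0.20, 0.04, 0.90)
)

penalty <- 3          # same default as in the Usage section
fdr_threshold <- 0.05 # same default as in the Usage section

# Negative log base 10 of the top-term p-value for each source node
neg_log_p <- -log10(top_terms$p_value)

# 'size': number of source nodes contributing to the other metrics
size <- nrow(top_terms)

toy_metrics <- data.frame(
  size    = size,
  # 'sum': total of -log10(p) minus 'penalty' times the number of source nodes
  sum     = sum(neg_log_p) - penalty * size,
  # 'percent': share of source nodes whose top term passes the FDR threshold
  percent = 100 * mean(top_terms$fdr < fdr_threshold),
  # 'mean' and 'median': computed over all nodes regardless of significance
  mean    = mean(neg_log_p),
  median  = median(neg_log_p)
)

print(toy_metrics)
```

With the default penalty of 3, a source node only raises the 'sum' metric if its top term has a p-value below 0.001, which is why the toy network above ends up with a negative sum despite two nodes passing the FDR threshold.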