Coverage and log2cpm are both calculated over the annotated CDS.

Usage

novoalignPipelineQC(
  meta_df,
  pipeline_output_dirpath,
  annote_obj_path,
  markers = c("NAT", "G418"),
  bam_suffix = "_sorted_aligned_reads_with_annote.bam",
  novolog_suffix = "_novoalign.log",
  exon_counts_suffix = "_read_count.tsv",
  cds_counts_suffix = "_read_count_cds.tsv",
  num_nodes = 10
)

Arguments

meta_df

metadata for the samples you'd like to QC. Note that output for these samples must be present in pipeline_output_dirpath

pipeline_output_dirpath

path to the directory that stores the subdirectories align, count and logs, e.g. /mnt/scratch/rnaseq_pipeline/pipeline_out/run_5500

annote_obj_path

path to an annotation file parsed by rtracklayer::import

markers

a character vector of marker names. Each marker must be present in both the counts and the genome annotations. Default is c("NAT", "G418")

bam_suffix

suffix appended to the bam files. default is "_sorted_aligned_reads_with_annote.bam"

novolog_suffix

suffix appended to log files. default is "_novoalign.log"

exon_counts_suffix

suffix appended to exon count files. default is '_read_count.tsv'

cds_counts_suffix

suffix appended to CDS count files. default is "_read_count_cds.tsv"

num_nodes

number of CPUs (by the Slurm definition) or threads (on your local machine) to use. The corresponding argument in the parallel function is nnodes, hence the name of this argument. Default is 10
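
The suffix arguments above can be read as pieces of a file name. The following is a minimal sketch, not the function's actual lookup logic: it assumes each file is named as a sample name followed by the suffix, and that BAM files sit in align, count files in count, and novoalign logs in logs. The sample name `sample_01` and the helper `build_path` are hypothetical.

```r
# Sketch only: the sample name and the file-to-subdirectory mapping are assumptions.
pipeline_output_dirpath <- "/mnt/scratch/rnaseq_pipeline/pipeline_out/run_5500"
sample_name <- "sample_01"  # hypothetical sample identifier

# Hypothetical helper: join the run directory, a subdirectory,
# and the sample name with its suffix appended.
build_path <- function(subdir, suffix) {
  file.path(pipeline_output_dirpath, subdir, paste0(sample_name, suffix))
}

# Suffixes are the documented defaults from the Usage section.
expected_files <- c(
  bam         = build_path("align", "_sorted_aligned_reads_with_annote.bam"),
  novolog     = build_path("logs",  "_novoalign.log"),
  exon_counts = build_path("count", "_read_count.tsv"),
  cds_counts  = build_path("count", "_read_count_cds.tsv")
)

print(expected_files)
```

If your pipeline names its files differently, passing the matching suffixes is presumably what lets the function find them.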

Value

a data frame in long format, with the columns fastqFileNumber, perturbed locus coverage and log2cpm, marker coverage and log2cpm, and the library quality metrics
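
For readers unfamiliar with the log2cpm columns, the sketch below shows one common way to compute log2 counts per million. It is an illustration under assumptions, not necessarily the calculation this function performs: the pseudocount of 1, the function name `log2_cpm`, and the toy counts are all made up here.

```r
# Sketch only: one common log2 CPM definition; the pseudocount of 1 is an assumption.
log2_cpm <- function(counts, pseudocount = 1) {
  # Scale each count to counts per million of the library total.
  cpm <- counts / sum(counts) * 1e6
  # Add a pseudocount so zero counts do not produce -Inf.
  log2(cpm + pseudocount)
}

toy_counts <- c(geneA = 10, geneB = 90)  # made-up counts for illustration
toy_log2cpm <- log2_cpm(toy_counts)

print(toy_log2cpm)
```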