QC a crypto run output by the novoalign+htseq pipeline — novoalignPipelineQC • brentlabRnaSeqTools

Coverage and log2cpm are both calculated over the annotated CDS.

Usage

novoalignPipelineQC(
  meta_df,
  pipeline_output_dirpath,
  annote_obj_path,
  markers = c("NAT", "G418"),
  bam_suffix = "_sorted_aligned_reads_with_annote.bam",
  novolog_suffix = "_novoalign.log",
  exon_counts_suffix = "_read_count.tsv",
  cds_counts_suffix = "_read_count_cds.tsv",
  num_nodes = 10
)

Arguments

meta_df: metadata for the samples you'd like to QC. Note that these samples must be present in pipeline_output_dirpath
pipeline_output_dirpath: path to the directory which stores the subdirectories align, count and logs, e.g. /mnt/scratch/rnaseq_pipeline/pipeline_out/run_5500
annote_obj_path: path to an annotation file parsed by rtracklayer::import
markers: a character vector of marker names. Each marker must be present in both the counts and the genome annotations. default is c("NAT", "G418")
bam_suffix: suffix appended to the bam files. default is "_sorted_aligned_reads_with_annote.bam"
novolog_suffix: suffix appended to log files. default is "_novoalign.log"
exon_counts_suffix: suffix appended to exon count files. default is "_read_count.tsv"
cds_counts_suffix: suffix appended to cds count files. default is "_read_count_cds.tsv"
num_nodes: number of CPUs (by the Slurm definition) or threads (on your local machine). The corresponding argument in the parallel function is nnodes, hence the name of this argument. default is 10
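
As a rough illustration of how the suffix arguments relate to files on disk, the sketch below builds the file names one sample might have. The sample basename is hypothetical. The placement of each file directly under align, count or logs is an assumption, not confirmed behavior of novoalignPipelineQC. Check the paths against your own pipeline output.

```r
# Hypothetical sample basename; real names come from your own run.
sample_basename <- "sample_01"
pipeline_output_dirpath <- "/mnt/scratch/rnaseq_pipeline/pipeline_out/run_5500"

# ASSUMPTION: files sit directly under align/, count/ and logs/.
# The actual layout used by the pipeline may differ (e.g. per-sample subfolders).
expected_files <- c(
  bam = file.path(pipeline_output_dirpath, "align",
                  paste0(sample_basename, "_sorted_aligned_reads_with_annote.bam")),
  novolog = file.path(pipeline_output_dirpath, "logs",
                      paste0(sample_basename, "_novoalign.log")),
  exon_counts = file.path(pipeline_output_dirpath, "count",
                          paste0(sample_basename, "_read_count.tsv")),
  cds_counts = file.path(pipeline_output_dirpath, "count",
                         paste0(sample_basename, "_read_count_cds.tsv"))
)

# Report which of the expected files are present before running the QC.
files_present <- file.exists(expected_files)
print(data.frame(path = unname(expected_files), present = unname(files_present)))
```

A check like this can catch a mismatched suffix before the QC run starts.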

Value

a long-format data frame with columns fastqFileNumber, perturbed locus coverage/log2cpm, marker coverage/log2cpm, and the library quality metrics
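
Examples

A minimal sketch of a call, not a tested recipe. The metadata CSV path and the annotation path are hypothetical placeholders, and reading the metadata with read.csv is an assumption about how meta_df is obtained. The call runs only if the inputs and the package are actually available.

```r
# Hypothetical inputs: replace every path below with your own.
pipeline_output_dirpath <- "/mnt/scratch/rnaseq_pipeline/pipeline_out/run_5500"
annote_obj_path <- "path/to/annotation_file"   # hypothetical placeholder
meta_path <- "path/to/run_metadata.csv"        # hypothetical placeholder

# Use the documented default of 10 workers, capped at the detected core count.
available_cores <- parallel::detectCores()
if (is.na(available_cores)) available_cores <- 1L
num_nodes <- max(1L, min(10L, available_cores))

inputs_exist <- dir.exists(pipeline_output_dirpath) &&
  file.exists(annote_obj_path) &&
  file.exists(meta_path)

if (inputs_exist && requireNamespace("brentlabRnaSeqTools", quietly = TRUE)) {
  # ASSUMPTION: the sample metadata is stored as a CSV file.
  meta_df <- read.csv(meta_path, stringsAsFactors = FALSE)

  qc_df <- brentlabRnaSeqTools::novoalignPipelineQC(
    meta_df = meta_df,
    pipeline_output_dirpath = pipeline_output_dirpath,
    annote_obj_path = annote_obj_path,
    markers = c("NAT", "G418"),
    num_nodes = num_nodes
  )

  # Long-format QC table; see Value for the expected columns.
  print(head(qc_df))
}
```

The suffix arguments are left at their defaults here. Pass them explicitly if your pipeline writes files under different names.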