6. RNA-Seq Analysis Pipeline

Posted: 2017-06-18 21:49:23

Created by Dhivya Arasappan, last modified by Dennis C Wylie on Nov 08, 2015

This pipeline uses an annotated genome to identify differentially expressed genes/transcripts. 10-hour minimum ($470 internal, $600 external) per project.

1. Quality Assessment

Data quality is assessed with FastQC; the results of the quality assessment are evaluated before any downstream analysis.

  • Deliverables:
    • reports generated by FastQC
  • Tools used:
    • FastQC: (Andrews 2010) used to generate quality summaries of data:
      • Per base sequence quality report: useful for deciding whether trimming is necessary.
      • Sequence duplication levels: evaluation of library complexity. Higher levels of sequence duplication may be expected for high coverage RNAseq data.
      • Overrepresented sequences: evaluation of adapter contamination.
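
As a rough illustration of this step, the sketch below builds a FastQC command line from Python. The helper name `fastqc_command`, the file names, and the report directory are hypothetical; only the `-o` (output directory) and `-t` (threads) options belong to FastQC itself. It assumes `fastqc` is on the PATH and that the output directory already exists.

```python
from typing import List, Sequence


def fastqc_command(fastq_files: Sequence[str], out_dir: str, threads: int = 1) -> List[str]:
    """Build the argv list for one FastQC run over one or more FASTQ files."""
    if not fastq_files:
        raise ValueError("at least one FASTQ file is required")
    # -o: directory that receives the HTML/zip reports (must already exist)
    # -t: number of files FastQC processes in parallel
    return ["fastqc", "-o", out_dir, "-t", str(threads), *fastq_files]


if __name__ == "__main__":
    # Hypothetical file names, shown only to illustrate the call.
    cmd = fastqc_command(["sample_R1.fastq.gz", "sample_R2.fastq.gz"], "fastqc_reports", threads=2)
    print(" ".join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Returning an argv list rather than a shell string keeps file names with spaces safe when the list is later passed to `subprocess.run`.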

2. Fastq Preprocessing

The quality assessment is used to decide whether any preprocessing of the raw data is required; if so, the preprocessing is performed.

  • Deliverables
    • Trimmed/filtered fastq files.
  • Tools Used:
    • Fastx-toolkit: Used to preprocess fastq files.
      • Fastq quality trimmer: Trimming reads based on quality.
      • Fastq quality filter: Filtering reads based on quality.
    • Cutadapt: Used to remove adapter sequences from reads.
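
The preprocessing tools listed above could be driven along these lines. This is only a sketch: the helper names, the thresholds (quality 20, minimum length 30, 80 percent of bases), and the adapter sequence in the usage example are illustrative assumptions, not settings taken from the pipeline. The `-Q33` quality offset is an assumption about the input encoding and can be switched off by passing `None`.

```python
from typing import List, Optional


def _offset_flag(phred_offset: Optional[int]) -> List[str]:
    # FASTX tools take the quality-score offset as -Q<N>, e.g. -Q33.
    return [f"-Q{phred_offset}"] if phred_offset is not None else []


def quality_trim_command(src: str, dst: str, min_quality: int = 20,
                         min_length: int = 30, phred_offset: Optional[int] = 33) -> List[str]:
    """fastq_quality_trimmer: trim low-quality read ends, drop reads that become too short."""
    return ["fastq_quality_trimmer", *_offset_flag(phred_offset),
            "-t", str(min_quality), "-l", str(min_length), "-i", src, "-o", dst]


def quality_filter_command(src: str, dst: str, min_quality: int = 20,
                           min_percent: int = 80, phred_offset: Optional[int] = 33) -> List[str]:
    """fastq_quality_filter: keep reads where at least min_percent of bases reach min_quality."""
    return ["fastq_quality_filter", *_offset_flag(phred_offset),
            "-q", str(min_quality), "-p", str(min_percent), "-i", src, "-o", dst]


def cutadapt_command(src: str, dst: str, adapter: str) -> List[str]:
    """cutadapt: remove a 3' adapter sequence (-a) and write the result to dst (-o)."""
    return ["cutadapt", "-a", adapter, "-o", dst, src]


if __name__ == "__main__":
    # Placeholder adapter and file names; substitute the ones for the actual library prep.
    steps = [
        cutadapt_command("raw.fastq", "noadapter.fastq", "AGATCGGAAGAGC"),
        quality_trim_command("noadapter.fastq", "trimmed.fastq"),
        quality_filter_command("trimmed.fastq", "filtered.fastq"),
    ]
    for step in steps:
        print(" ".join(step))
```

Which of these steps is run, and in what order, depends on what the FastQC reports show; the order in the usage example is one plausible choice.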
 

3. Mapping

Reads are mapped to the reference genome using BWA-mem or Tophat.

  • Deliverables
    • Mapping results as BAM files, plus mapping statistics.
  • Tools Used:
    • BWA-mem: (Li 2013) primary aligner used to generate read alignments.
    • Tophat: (Kim 2011) aligner used to generate read alignments in a splice-aware manner and identify novel junctions.
    • Samtools: (Li 2009) used to generate mapping statistics.
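
To make the mapping step more concrete, here is a minimal sketch of how the aligner commands might be assembled and how a mapping rate could be read from `samtools flagstat` output. The helper names and default values are hypothetical. The parser assumes the usual flagstat layout, where one line ends in `in total (...)` and another reads `N + M mapped (...)`; older or newer samtools releases may format the percentages differently, which the parser ignores.

```python
import re
from typing import Dict, List, Optional, Sequence


def bwa_mem_command(reference: str, reads: Sequence[str], threads: int = 1) -> List[str]:
    """bwa mem writes SAM to stdout; reads holds one file (single-end) or two (paired-end)."""
    return ["bwa", "mem", "-t", str(threads), reference, *reads]


def tophat_command(bowtie_index: str, reads: Sequence[str], out_dir: str,
                   threads: int = 1, gtf: Optional[str] = None) -> List[str]:
    """tophat: -p threads, -o output directory, optional -G annotation to guide alignment."""
    cmd = ["tophat", "-p", str(threads), "-o", out_dir]
    if gtf is not None:
        cmd += ["-G", gtf]
    return cmd + [bowtie_index, *reads]


_FLAGSTAT_LINE = re.compile(r"^(\d+) \+ (\d+) (in total|mapped) ")


def parse_flagstat(text: str) -> Dict[str, float]:
    """Extract QC-passed total and mapped record counts from `samtools flagstat` output."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        match = _FLAGSTAT_LINE.match(line.strip())
        if match:
            counts[match.group(3)] = int(match.group(1))
    total = counts["in total"]
    mapped = counts["mapped"]
    fraction = mapped / total if total else 0.0
    return {"total": total, "mapped": mapped, "mapped_fraction": fraction}


if __name__ == "__main__":
    example = (
        "1000 + 0 in total (QC-passed reads + QC-failed reads)\n"
        "0 + 0 secondary\n"
        "950 + 0 mapped (95.00% : N/A)\n"
    )
    print(parse_flagstat(example))
```

Note that flagstat counts alignment records, so secondary and supplementary alignments are included in both the total and the mapped figure.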

4. Gene/Transcript Counting

Reads mapping to annotated intervals are counted to estimate the abundance of genes/transcripts.

  • Deliverables
    • Raw gene/transcript counts
  • Tools Used:
    • HTSeq-count: (Anders 2014) used to count reads overlapping gene intervals.
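
A sketch of the counting step, under some assumptions: the BAM file is position-sorted (`-r pos`), the library is unstranded (`-s no`), and counts are taken over `exon` features grouped by `gene_id`. None of these settings come from the pipeline description. The parser assumes tab-separated `htseq-count` output in which the summary rows (such as `__no_feature`) start with two underscores, as in recent HTSeq releases. All helper names are hypothetical.

```python
from typing import Dict, List


def htseq_count_command(bam: str, gtf: str, stranded: str = "no",
                        feature_type: str = "exon", id_attribute: str = "gene_id") -> List[str]:
    """htseq-count prints one `feature<TAB>count` line per feature to stdout."""
    return ["htseq-count", "-f", "bam", "-r", "pos", "-s", stranded,
            "-t", feature_type, "-i", id_attribute, bam, gtf]


def parse_htseq_counts(text: str) -> Dict[str, int]:
    """Read htseq-count output, skipping the `__` summary rows at the end."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        feature_id, value = line.split("\t")
        if feature_id.startswith("__"):
            continue
        counts[feature_id] = int(value)
    return counts


def merge_counts(per_sample: Dict[str, Dict[str, int]]) -> Dict[str, List[int]]:
    """Combine per-sample counts into gene -> [count per sample], in sample insertion order."""
    samples = list(per_sample)
    genes = sorted(set().union(*(c.keys() for c in per_sample.values())))
    return {gene: [per_sample[s].get(gene, 0) for s in samples] for gene in genes}


if __name__ == "__main__":
    ctrl = parse_htseq_counts("geneA\t10\ngeneB\t0\n__no_feature\t5\n")
    treat = parse_htseq_counts("geneA\t25\ngeneB\t3\n__no_feature\t2\n")
    print(merge_counts({"ctrl": ctrl, "treat": treat}))
```

The merged table of raw counts is the kind of input that a downstream differential expression tool expects; counts should stay unnormalized at this stage.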

5. DEG Identification

Normalization and statistical testing to identify differentially expressed genes.

  • Deliverables
    • DEG summary and master file containing fold changes and p-values for every gene; MA plots.
  • Tools Used:
    • DESeq2: (Love 2014) used to perform normalization and test for differential expression using the negative binomial distribution.
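
DESeq2 itself is an R package, so the following Python sketch is not its API. It only illustrates the idea behind the normalization step, the median-of-ratios size factor: each sample's factor is the median, over genes, of that sample's count divided by the gene's geometric mean across samples. Genes with a zero count in any sample are skipped because their geometric mean is zero. Function names and the toy counts are hypothetical, and the negative binomial testing is not reproduced here.

```python
import math
from statistics import median
from typing import Dict, List


def size_factors(counts: Dict[str, List[int]]) -> List[float]:
    """Median-of-ratios size factors for a gene -> per-sample raw count table."""
    n_samples = len(next(iter(counts.values())))
    log_ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for gene_counts in counts.values():
        if any(c <= 0 for c in gene_counts):
            continue  # geometric mean would be zero; gene is uninformative here
        log_geo_mean = sum(math.log(c) for c in gene_counts) / n_samples
        for j, c in enumerate(gene_counts):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no gene has non-zero counts in every sample")
    return [math.exp(median(ratios)) for ratios in log_ratios]


def normalize(counts: Dict[str, List[int]], factors: List[float]) -> Dict[str, List[float]]:
    """Divide each raw count by the size factor of its sample."""
    return {gene: [c / f for c, f in zip(row, factors)] for gene, row in counts.items()}


if __name__ == "__main__":
    # Toy table: the second sample was sequenced roughly twice as deeply as the first.
    toy = {"geneA": [10, 20], "geneB": [100, 200], "geneC": [5, 10], "geneD": [0, 7]}
    factors = size_factors(toy)
    print(factors)
    print(normalize(toy, factors))
```

Using the median rather than the total read count keeps a handful of very highly expressed genes from dominating the scaling.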

Original source: http://www.cnblogs.com/renping/p/7045333.html
