Request to validate my RNAseq workflow
0
0
Entering edit mode
8.8 years ago
umn_bist ▴ 390

The samples are tumor/match_normal PE RNA-seq samples from TCGA. I need to call/annotate somatic mutation

1. preprocessing QC: trim adapter using cutadapt (may use BBMap), remove low quality (<10) reads

2. align: TopHat2, add RG with bowtie2 (is this common practice?)

  • I am interested in using STAR instead of TopHat2 but my advisor warned that the STAR/Tophat does not share similar format for removing multimapped/unmapped reads (Tophat is part of the workflow right now)
for f in *_nodupe.bam
do fname=`basename $f .bam`

#Remove Create Unique and Multi/Unmapped Bams
    samtools view -h -F 1024 $f | awk 'substr($0,1,1)=="@"||match($0, /NH:i:1\t/) {print $0}' | samtools view -hSb - > ${fname}_unique.bam
    samtools view -F 1024 -h $f | awk 'substr($0,1,1)=="@"||!/NH:i:1\t/  {print $0}' | samtools view -hSb - > ${fname}_multimapped.bam

    #Unique Reads Aligning to hg19 Repeatitive Sequence

        samtools view ${fname}_unique.bam | wc -l > ${fname}.count;
        bamToBed -i ${fname}_unique.bam -bed12 |
        intersectBed -a packages/tracks/hg19/hg19_rmsk_FLL1.bed -b stdin -split -sorted -c > ${fname}_MappedRE.bed;
        awk -F '\t' '{print $4"\t"$7}' ${fname}_MappedRE.bed > ${fname}.names;
done

3. postprocessing QC: dedupe (mark duplicates)

4. variant calling/annotation: muTect2, snpSift (select against low MQ, AF, QUAL variants)

5. annotate: wAnnovar (may use snpEff)

RNA-Seq NGS • 1.6k views
ADD COMMENT

Login before adding your answer.

Traffic: 1544 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6