Pipeline for >100 RNA-Seq PE samples
Posted 8.8 years ago by umn_bist

Although I am familiar with the RNA-seq workflow for small studies (n < 20), this is my first time handling a large RNA-seq dataset. The samples are tumors with matched normals.

Are there automated pipelines commonly used in the field for pre-alignment QC, alignment, and post-alignment QC of a large RNA-seq dataset?
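Whatever pipeline ends up being used, running >100 samples usually comes down to discovering the read pairs and applying the same per-sample steps to each one. The sketch below assumes the `<sample>_1.fastq` / `<sample>_2.fastq` naming implied by the cutadapt command further down; `list_samples` and `process_sample` are hypothetical helper names, and `process_sample` only echoes where the real trimming, alignment and QC calls would go.

```shell
#!/usr/bin/env bash
# Sketch of a batch driver. ASSUMPTIONS: paired files are named
# <sample>_1.fastq / <sample>_2.fastq in one directory; list_samples and
# process_sample are hypothetical helpers, not part of any published pipeline.

# Print one sample name per line for each <sample>_1.fastq with a matching _2 mate.
list_samples() {
  local dir="$1" r1 sample
  for r1 in "$dir"/*_1.fastq; do
    [ -e "$r1" ] || continue                 # glob matched nothing
    sample="$(basename "${r1%_1.fastq}")"
    if [ -e "$dir/${sample}_2.fastq" ]; then
      printf '%s\n' "$sample"
    else
      printf 'WARNING: no _2 mate for %s\n' "$r1" >&2
    fi
  done
}

# Placeholder per-sample step: replace the echo with trimming, alignment and QC.
process_sample() {
  local dir="$1" sample="$2"
  echo "would process ${dir}/${sample}_1.fastq ${dir}/${sample}_2.fastq"
}
```

To run samples in parallel, one option is `export -f process_sample` followed by `list_samples data | xargs -P 8 -I{} bash -c 'process_sample data "$1"' _ {}`, where `data` and `8` are placeholders for your FASTQ directory and core count.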

Two areas that I am having difficulty automating are:

  • [cutadapt] Providing adapter lists for both reads of each pair (R1 and R2). I have a single list, but I do not know whether cutadapt automatically reverse-complements the adapter sequences. I also need to choose a value for --overlap=LENGTH (-O). I may use BBMap in place of cutadapt.

    cutadapt -q 10,10 -a "${adapter}" -A "${adapter}" -o "${file1%_1.fastq}_1_trimmed.fastq" -p "${file2%_2.fastq}_2_trimmed.fastq" "${file1}" "${file2}"
    
  • [TopHat2] Providing --mate-inner-dist and --mate-std-dev, since these vary from sample to sample.

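    # placeholders: inner_dist/std_dev (per sample), genome_index (Bowtie2 index prefix), tx_index (transcriptome index prefix)
    tophat -p 10 --mate-inner-dist "${inner_dist}" --mate-std-dev "${std_dev}" --no-coverage-search --output-dir "${file1%_1.fastq}_tophat" --transcriptome-index "${tx_index}" "${genome_index}" "${file1}" "${file2}"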
    
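For the cutadapt bullet: as far as I know, cutadapt searches for each adapter exactly as written and does not reverse-complement it, so one option is to keep two lists, each holding the adapters in the orientation in which they appear at the 3' end of that mate, and pass them with cutadapt's `file:` prefix. The sketch below only builds the command (set `DRY_RUN=0` to execute it); `adapters_R1.fa` and `adapters_R2.fa` are hypothetical file names. It also includes a rough aid for choosing `-O`/`--overlap`: assuming uniformly random bases, the chance that a read's last k bases match the first k adapter bases by coincidence is about 0.25^k.

```shell
#!/usr/bin/env bash
# Sketch only. ASSUMPTIONS: adapters_R1.fa / adapters_R2.fa are hypothetical
# FASTA files with each mate's 3' adapters written as they appear in that read.
cutadapt_cmd() {
  local r1="$1" r2="$2" overlap="${3:-3}"   # 3 is cutadapt's default minimum overlap
  local -a cmd
  cmd=(cutadapt -q 10,10 -O "$overlap"
       -a file:adapters_R1.fa -A file:adapters_R2.fa
       -o "${r1%_1.fastq}_1_trimmed.fastq"
       -p "${r2%_2.fastq}_2_trimmed.fastq"
       "$r1" "$r2")
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "${cmd[*]}"        # print the command instead of running it
  else
    "${cmd[@]}"
  fi
}

# Approximate chance that the last k bases of a read match the first k adapter
# bases purely by chance, assuming each base is A/C/G/T with probability 0.25.
random_overlap_prob() {
  awk -v k="$1" 'BEGIN { printf "%.6g\n", 0.25 ^ k }'
}
```

For example, `random_overlap_prob 3` prints 0.015625 (about 1 read in 64); multiplying by the read count gives a rough idea of how many reads would lose a few 3' bases by coincidence at a given `-O`.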
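For the TopHat2 bullet, one common approach is to estimate the two values per sample from a pilot alignment of a subsample of read pairs. TopHat2 defines the inner distance as fragment length minus both read lengths (e.g. 300 bp fragments with 2 × 50 bp reads give 200). The awk sketch below assumes SAM text on stdin (for example from `samtools view pilot.bam`), a fixed read length passed as the hypothetical `read_len` argument, and that unspliced proper pairs are representative; spliced alignments (an `N` in the CIGAR) are skipped because their TLEN also spans the intron.

```shell
#!/usr/bin/env bash
# Sketch: estimate --mate-inner-dist and --mate-std-dev from SAM records on stdin.
# ASSUMPTIONS: all reads have length read_len (hypothetical argument); only unspliced,
# primary, properly paired alignments are used; TLEN ($9) is the fragment length.
inner_dist_stats() {
  local read_len="$1"
  awk -F '\t' -v rl="$read_len" '
    /^@/ { next }                                      # skip header lines
    {
      flag = $2 + 0
      proper  = int(flag / 2) % 2 == 1                 # 0x2: properly paired
      primary = int(flag / 256) % 2 == 0 && int(flag / 2048) % 2 == 0  # not 0x100/0x800
      if (proper && primary && $9 > 0 && $6 !~ /N/) {  # count each pair once, skip spliced
        d = $9 - 2 * rl                                # inner distance = fragment - 2 reads
        n++; s += d; ss += d * d
      }
    }
    END {
      if (n < 2) { print "too few usable pairs" > "/dev/stderr"; exit 1 }
      mean = s / n
      sd = sqrt((ss - n * mean * mean) / (n - 1))     # sample standard deviation
      printf "%.0f %.0f\n", mean, sd                   # prints: mean sd
    }'
}
```

The two numbers could then be captured with `read -r inner_dist std_dev < <(samtools view pilot.bam | inner_dist_stats 100)` and passed to `--mate-inner-dist` and `--mate-std-dev`; the read length of 100 here is only an example.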

Also, with cutadapt I am quality-trimming read ends at a cutoff of 10 (-q 10,10). Is this acceptable if I later filter out variants with MQ < 30 and QUAL < 20 using SnpSift?
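Base qualities (what `-q` trims on), mapping quality (MQ, derived from read MAPQ values) and variant QUAL are all Phred-scaled, Q = -10 × log10(P), but P means something different in each: the chance that a base call is wrong, that a read is placed at the wrong position, or that the variant call is wrong. A small helper makes the cutoffs concrete; `phred_to_error` is a hypothetical name used only for illustration.

```shell
#!/usr/bin/env bash
# Convert a Phred-scaled score Q to its error probability P = 10^(-Q/10).
phred_to_error() {
  awk -v q="$1" 'BEGIN { printf "%g\n", 10 ^ (-q / 10) }'
}

# Example: the three cutoffs mentioned in the question.
for q in 10 20 30; do
  echo "Q$q -> $(phred_to_error "$q")"
done
```

Running it prints `Q10 -> 0.1`, `Q20 -> 0.01` and `Q30 -> 0.001`.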
