Although I am familiar with RNA-seq workflows for n<20 samples, this is my first time handling a large RNA-seq data set. The samples are tumors with matched normals.
Are there any automated pipelines commonly used in the field for pre-alignment QC, alignment, and post-alignment QC of a large RNA-seq data set?
Two areas that I am having difficulty automating are:
[cutadapt] Providing an adapter list for both the forward and reverse reads of each paired-end (PE) sample. I have a single list, but I do not know whether cutadapt will automatically reverse the adapter sequences. I also need to determine a value for --overlap=LENGTH. I may use BBMap in place of cutadapt.
    cutadapt -q 10,10 -a "${adapter}" -A "${adapter}" -o "${file1%_1.fastq}_1_trimmed.fastq" -p "${file2%_2.fastq}_2_trimmed.fastq" "${file1}" "${file2}"
[TopHat2] Providing --mate-inner-dist and --mate-std-dev, as these will vary from sample to sample.
    tophat -p 10 --mate-inner-dist {} --mate-std-dev {} --no-coverage-search --output-dir "${file}" --transcriptome-index
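To make the TopHat2 item concrete, here is a minimal sketch of how the two values could be derived per sample. It assumes I already have, for each sample, an estimate of the mean fragment length and its standard deviation (for example from a pilot alignment of a subset of reads) plus the read length; where those numbers come from is not shown. The helper name mate_opts is hypothetical. The arithmetic follows the TopHat manual's example, where 300 bp fragments with 50 bp reads give an inner distance of 200.

```shell
#!/usr/bin/env bash
# Sketch only: turn per-sample fragment statistics into TopHat mate options.
# mate_opts is a hypothetical helper, not part of TopHat.
# Arguments: mean fragment length, fragment length std dev, read length.
mate_opts() {
    awk -v mean="$1" -v sd="$2" -v len="$3" 'BEGIN {
        # inner distance = fragment length minus the two reads;
        # it can be negative when the mates overlap
        printf "--mate-inner-dist %.0f --mate-std-dev %.0f\n", mean - 2 * len, sd
    }'
}

# Example from the TopHat manual: 300 bp fragments, 50 bp reads.
mate_opts 300 20 50   # prints: --mate-inner-dist 200 --mate-std-dev 20
```

The printed string could then replace the two {} placeholders in the tophat command for that sample. Whether my fragment-length estimate includes adapter sequence is something I would still need to check.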
Also, using cutadapt I am trimming bases with a quality score <10. Is this acceptable if I am going to filter out variants with MQ <30 and QUAL <20 using SnpSift?
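For the cutadapt step, this is roughly how I would loop the command over all pairs. It is a dry-run sketch that only prints the commands, assuming files are named <sample>_1.fastq and <sample>_2.fastq. The helper cutadapt_cmd and the variables ADAPTER_R1, ADAPTER_R2 and QCUT are hypothetical names, and the adapter values are placeholders rather than real sequences. In cutadapt, -a applies to the first read and -A to the second, so the sketch keeps them as two separate variables instead of passing one adapter to both.

```shell
#!/usr/bin/env bash
# Sketch only: print one cutadapt command per read pair; nothing is executed.
# ADAPTER_R1 / ADAPTER_R2 are placeholders, not real adapter sequences.
ADAPTER_R1="${ADAPTER_R1:-ADAPTER_FWD}"
ADAPTER_R2="${ADAPTER_R2:-ADAPTER_REV}"
QCUT="${QCUT:-10,10}"   # quality cutoff for the 5' and 3' ends, as in my command

# Print the cutadapt command for one forward-read file named <sample>_1.fastq.
cutadapt_cmd() {
    local file1="$1"
    local file2="${file1%_1.fastq}_2.fastq"   # mate file inferred from the name
    printf 'cutadapt -q %s -a %s -A %s -o %s -p %s %s %s\n' \
        "$QCUT" "$ADAPTER_R1" "$ADAPTER_R2" \
        "${file1%_1.fastq}_1_trimmed.fastq" \
        "${file2%_2.fastq}_2_trimmed.fastq" \
        "$file1" "$file2"
}

# Dry run over every forward-read file in the current directory.
for f in *_1.fastq; do
    [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
    cutadapt_cmd "$f"
done
```

Keeping the cutoff in QCUT means the trimming threshold can be changed in one place if <10 turns out to be too lenient or too strict for the downstream variant filters.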