Dear BioStars Community,
I have a problem with an alignment to the transcriptome.
I have 8 RNA-Seq libraries sequenced on an Illumina HiScanSQ system, one lane per sequencing run (2x100 bp, paired-end). The 8 libraries (1 pool) were sequenced in two runs to obtain a decent number of reads. After demultiplexing (with bcl2fastq-1.8.4), the reads were trimmed with TrimGalore and aligned with Bowtie2-2.2.6 to a previously assembled transcriptome, because there is no reference genome for the organism I am trying to analyze (Pinus sylvestris). For 7 of the libraries there was almost no difference in alignment efficiency between the two runs (~85-95% overall, with ~60-85% uniquely mapped reads), but for one library something strange happened:
Run1:
11109859 reads; of these:
  11109859 (100.00%) were paired; of these:
    1701658 (15.32%) aligned concordantly 0 times
    6666961 (60.01%) aligned concordantly exactly 1 time
    2741240 (24.67%) aligned concordantly >1 times
    ----
    1701658 pairs aligned concordantly 0 times; of these:
      11078 (0.65%) aligned discordantly 1 time
    ----
    1690580 pairs aligned 0 times concordantly or discordantly; of these:
      3381160 mates make up the pairs; of these:
        3218192 (95.18%) aligned 0 times
        86430 (2.56%) aligned exactly 1 time
        76538 (2.26%) aligned >1 times
85.52% overall alignment rate
Run2:
14719563 reads; of these:
  14719563 (100.00%) were paired; of these:
    7641835 (51.92%) aligned concordantly 0 times
    4991995 (33.91%) aligned concordantly exactly 1 time
    2085733 (14.17%) aligned concordantly >1 times
    ----
    7641835 pairs aligned concordantly 0 times; of these:
      7874 (0.10%) aligned discordantly 1 time
    ----
    7633961 pairs aligned 0 times concordantly or discordantly; of these:
      15267922 mates make up the pairs; of these:
        15039673 (98.51%) aligned 0 times
        94443 (0.62%) aligned exactly 1 time
        133806 (0.88%) aligned >1 times
48.91% overall alignment rate
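For anyone comparing the two reports, here is a minimal Python sketch of how the "overall alignment rate" can be recomputed from the counts above: every mate of a concordantly or discordantly aligned pair counts as aligned, plus the mates that aligned on their own, divided by the total number of mates. The function name `overall_alignment_rate` and the regular expressions are my own assumptions for illustration; they are not part of Bowtie2, and they only cover the paired-end summary layout shown here.

```python
import re


def overall_alignment_rate(summary: str) -> float:
    """Recompute the overall alignment rate (%) from a paired-end Bowtie2 summary."""

    def count(pattern: str) -> int:
        # Pull the first integer captured by `pattern` out of the summary text.
        match = re.search(pattern, summary)
        if match is None:
            raise ValueError(f"pattern not found in summary: {pattern}")
        return int(match.group(1))

    pairs = count(r"(\d+) reads; of these:")
    conc_once = count(r"(\d+) \([\d.]+%\) aligned concordantly exactly 1 time")
    conc_multi = count(r"(\d+) \([\d.]+%\) aligned concordantly >1 times")
    disc_once = count(r"(\d+) \([\d.]+%\) aligned discordantly 1 time")
    mate_once = count(r"(\d+) \([\d.]+%\) aligned exactly 1 time")
    mate_multi = count(r"(\d+) \([\d.]+%\) aligned >1 times")

    # Each aligned pair contributes two mates; singly aligned mates count once.
    aligned_mates = 2 * (conc_once + conc_multi + disc_once) + mate_once + mate_multi
    return 100.0 * aligned_mates / (2 * pairs)


# Counts copied from the Run1 and Run2 reports in this post.
RUN1_SUMMARY = (
    "11109859 reads; of these: 11109859 (100.00%) were paired; of these: "
    "1701658 (15.32%) aligned concordantly 0 times "
    "6666961 (60.01%) aligned concordantly exactly 1 time "
    "2741240 (24.67%) aligned concordantly >1 times "
    "11078 (0.65%) aligned discordantly 1 time "
    "3218192 (95.18%) aligned 0 times "
    "86430 (2.56%) aligned exactly 1 time "
    "76538 (2.26%) aligned >1 times"
)
RUN2_SUMMARY = (
    "14719563 reads; of these: 14719563 (100.00%) were paired; of these: "
    "7641835 (51.92%) aligned concordantly 0 times "
    "4991995 (33.91%) aligned concordantly exactly 1 time "
    "2085733 (14.17%) aligned concordantly >1 times "
    "7874 (0.10%) aligned discordantly 1 time "
    "15039673 (98.51%) aligned 0 times "
    "94443 (0.62%) aligned exactly 1 time "
    "133806 (0.88%) aligned >1 times"
)

print(f"Run1: {overall_alignment_rate(RUN1_SUMMARY):.2f}%")  # Run1: 85.52%
print(f"Run2: {overall_alignment_rate(RUN2_SUMMARY):.2f}%")  # Run2: 48.91%
```

Both values match the rates reported by Bowtie2, so the drop in Run2 comes almost entirely from pairs in which neither mate aligned (98.51% of the mates from non-aligned pairs).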
So my question is: what should I do to find out what went wrong? I have ruled out human error (maybe too soon...), because the same pool, taken from one Eppendorf tube, was used for both runs.
I also ran FastQC on the demultiplexed and the trimmed reads. Links for the library with the low alignment efficiency are provided below:
Run1 (the good one), demultiplexed: http://twrzes.wtvk.pl/run1_R1_fastqc.html and http://twrzes.wtvk.pl/run1_R2_fastqc.html
Run1, after trimming: http://twrzes.wtvk.pl/run1_R1_trimmed_fastqc.html and http://twrzes.wtvk.pl/run1_R2_trimmed_fastqc.html
Run2 (the bad one), demultiplexed: http://twrzes.wtvk.pl/run2_R1_fastqc.html and http://twrzes.wtvk.pl/run2_R2_fastqc.html
Run2, after trimming: http://twrzes.wtvk.pl/run2_R1_trimmed_fastqc.html and http://twrzes.wtvk.pl/run2_R2_trimmed_fastqc.html
The commands I used for each step:
1) Demultiplexing:
/path/to/configureBclToFastq.pl --input-dir /path/to/folder/with/BCLs/Data/Intensities/BaseCalls --output-dir /path/to/folder/with/BCLs/Unaligned --sample-sheet /path/to/folder/with/BCLs/sample-sheet.csv --fastq-cluster-count 0 --mismatches 1 --with-failed-reads
2) Trimming (TrimGalore-0.4.0, a wrapper for cutadapt-1.8.3):
trimgalore --paired --quality 20 --illumina --stringency 1 -e 0.2 --length 40 -o /path/to/trimmed/fastq --trim1 run1_R1.fastq run1_R2.fastq
3) Alignment (Bowtie2-2.2.6):
bowtie2 -p 12 -I 0 -X 2000 --dovetail --very-sensitive-local -N 1 -x /path/to/index/index -1 run1_R1_trimmed.fastq -2 run1_R2_trimmed.fastq -S /path/to/aligned/sam/run1.sam
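Since only the run1 command is shown, here is a small Python sketch that builds the same Bowtie2 argument list for both runs, which makes it easy to confirm that the two alignments differ only in their input and output files. The helper name `bowtie2_command` is hypothetical, and the run2 file names are an assumption that simply follows the run1 naming pattern; the paths are the placeholders from the command above.

```python
import shlex


def bowtie2_command(run: str,
                    index: str = "/path/to/index/index",
                    out_dir: str = "/path/to/aligned/sam",
                    threads: int = 12) -> list:
    """Return the Bowtie2 argument list from the post for one run (e.g. 'run1')."""
    return [
        "bowtie2", "-p", str(threads),
        "-I", "0", "-X", "2000", "--dovetail",
        "--very-sensitive-local", "-N", "1",
        "-x", index,
        # Assumed naming pattern: <run>_R1_trimmed.fastq / <run>_R2_trimmed.fastq
        "-1", f"{run}_R1_trimmed.fastq",
        "-2", f"{run}_R2_trimmed.fastq",
        "-S", f"{out_dir}/{run}.sam",
    ]


# Print the commands rather than running them; they could be passed to
# subprocess.run(...) once the placeholder paths are replaced with real ones.
for run in ("run1", "run2"):
    print(shlex.join(bowtie2_command(run)))
```

Printing the commands side by side is only a sanity check on the parameters; it does not by itself explain the lower alignment rate in Run2.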
If you need any additional info, I would be more than happy to provide it.
Thank you very much for your help in solving this problem.
Kind regards,
Tomasz Wrzesinski
--
Tomasz Wrzesinski, MSc
PhD Student
Laboratory of High Throughput Technologies
Institute of Molecular Biology and Biotechnology
Faculty of Biology
Adam Mickiewicz University in Poznan
Umultowska 89/1.117
61-614 Poznan, Poland
tel. +48 61 829 5833
e-mail: twrzes@amu.edu.pl
You mixed up your links: "run1, after trimming" actually points to raw run2, and "run2, demultiplexed" points to run1 after trimming.
I am very sorry about that. I have edited my post, so everything should be OK now.
Thank you for pointing out my mistake.
Kind regards,
Tomasz Wrzesinski
Did you run FastQC before and after read cleaning of both runs?
Yes, I did; the links are provided in my post (below the Bowtie2 reports).
Kind regards,
Tomasz Wrzesinski