7.5 years ago
ThePresident
I am dealing with Illumina paired-end, 150 bp read-length datasets.
Here's what my fastq files look like:
@SRR3405394.1 1 length=151
NCGACTGAGGTAATTACAGTTCTTCGGTTCCAGCCAGCTGTCTCAGTTTATGGACCAGAACAACCCGCTGTCTGAGATTACGCACAAACGTCGTATCTCCGCACTCGGCCCAGGCGGTCTGACCCGTGAACGTGCAGGGTACAGATCGGAA
+SRR3405394.1 1 length=151
#<GGGGAGGIGGGIGGIIIIIGIIIIIIIIIIGIIIIIIIGGIIGIGGIIIIIGIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIGGIGGIIGGIIIIIGGGIIIIIGGIIIIGIIIGIGII<GGGGAGIGGAGG
I use this command to align with bwa:
bwa aln -t8 index 1.fastq > 1.sai
bwa aln -t8 index 2.fastq > 2.sai
bwa sampe index 1.sai 2.sai 1.fastq 2.fastq > aln_bwa.sam
When I look at my SAM file, I see this:
@SQ SN:selected LN:4641665
@PG ID:bwa PN:bwa VN:0.7.13-r1126 CL:bwa sampe index 1.sai 2.sai 1.fastq 2.fastq
SRR3405394.1 77 * 0 0 * * 0 0 NCGACTGAGGTAATTACAGTTCTTCGGTTCCAGCCAGCTGTCTCAGTTTATGGACCAGAACAACCCGCTGTCTGAGATTACGCACAAACGTCGTATCTCCGCACTCGGCCCAGGCGGTCTGACCCGTGAACGTGCAGGGTACAGATCGGAA #<GGGGAGGIGGGIGGIIIIIGIIIIIIIIIIGIIIIIIIGGIIGIGGIIIIIGIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIGGIGGIIGGIIIIIGGGIIIIIGGIIIIGIIIGIGII<GGGGAGIGGAGG
SRR3405394.1 141 * 0 0 * * 0 0 GTACCCTGCACGTTCACGGGTCAGACCGCCTGGGCCGAGTGCGGAGATACGACGTTTGTGCGTAATCTCAGACAGCGGGTTGTTCTGGTCCATAAACTGAGACAGCTGGCTGGAACCGAAGAACTGTAATTACCTCAGTCGTAGATCGGAA GGAAGIIGIIIGIGIIGGGIIGGGGIGGIIAGGGIIGGGGGIGIGGGGIIGGGIGGGGGIIIIIGGIIIIIIIIIIIIGIIIGIIIIGIIIGGIIIGIIIIAGGIIIGGAGIGGGGGGI<GGGGIIIIIGGGIGIIIGIIIIIAGGIIIIG
All my reads are flagged 77 / 141, i.e. they're not aligned. What's the problem? I have a feeling that the quality scores are not recognized, but that's just a wild guess based on some quick reading.
Thank you in advance, TP
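For anyone reading along who wants to check what flags 77 and 141 mean, here is a minimal Python sketch that decodes a SAM FLAG value into its individual bits. The bit values follow the SAM specification; the short names and the `decode_flag` helper are my own, not part of bwa or samtools.

```python
# Bit values of the SAM FLAG field; the short names are illustrative labels.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


for flag in (77, 141):
    print(flag, decode_flag(flag))
```

Both values have the paired, unmapped and mate_unmapped bits set; 77 additionally marks the first read of the pair and 141 the second, which matches the "not aligned" reading above.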
Do you have a reason to use bwa aln rather than bwa mem?
I am interested in extracting aberrant pairs (same-orientation fwd-fwd and bkw-bkw combinations). I know that bwa aln is able to mark them properly, but I am not sure about bwa mem. Do you happen to know if bwa mem handles these types of pairs the same way as bwa aln does?
Just out of curiosity: why are you running aln and index at the same time? From what I remember, aln is used to align and should create a sai output, while index is used to create an index of a fasta reference file.
I guess index is a placeholder for the path to his index file?
Correct, index is a placeholder for my index file.
Is it possible that your quality scores are in Illumina 1.3+ or 1.5+ encoding while (I think) bwa now expects Illumina 1.8+? Try taking a look at these posts:
What Is The Default Quality Encoding Expected By Bwa?
Convert Illumina 1.3 To Illumina 1.5
https://en.wikipedia.org/wiki/FASTQ_format
I did this a while ago, but I believe I verified that my scores are in Illumina 1.8+ encoding. If I remember correctly, FastQC can detect which encoding is used.
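If FastQC is not at hand, a rough check can be done by looking at the range of characters in the quality lines. The sketch below is only a heuristic under the usual assumptions (Phred+33 data contains characters below ASCII 59, Phred+64 data contains characters above `J`); `guess_phred_offset` is a made-up helper name, and the sample string is just the first 16 quality characters of the read shown in the question.

```python
def guess_phred_offset(quality_strings):
    """Guess the Phred offset (33 or 64) from FASTQ quality strings.

    Returns None when the observed character range fits both encodings.
    """
    codes = [ord(c) for q in quality_strings for c in q]
    if not codes:
        raise ValueError("no quality characters given")
    if min(codes) < 59:
        # Characters this low do not occur in any +64 encoding.
        return 33
    if max(codes) > ord("J"):
        # Heuristic: characters above 'J' are typical of +64 data.
        return 64
    return None


# First 16 quality characters of SRR3405394.1 from the question.
sample = "#<GGGGAGGIGGGIGG"
print(guess_phred_offset([sample]))
```

For the read in the question this reports an offset of 33, since `#` is ASCII 35. On real data it is safer to pass in the quality lines of many reads rather than one.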
Mmm, yes, I noticed you have # in the quality, which cannot be Illumina 1.3+ nor 1.5+.
1) Why do you think it is something in the quality? Did you receive some warning?
2) Have you tried building a small reference identical to one of your reads and aligning against it?
1) Googling here and there led me to think that quality scores might be a problem, but now I know it shouldn't be.
2) Nope, but that's a good idea. I'll try it.
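One way to set up the small-reference test suggested above is to turn the first FASTQ record into a one-entry FASTA file. This is a minimal sketch, not a tested pipeline: `first_record_to_fasta` is a hypothetical helper, and the example record (`read1`, `ACGTACGT`) is made up for illustration.

```python
def first_record_to_fasta(fastq_lines):
    """Convert the first FASTQ record from an iterable of lines into FASTA text."""
    lines = iter(fastq_lines)
    header = next(lines).strip()
    sequence = next(lines).strip()
    if not header.startswith("@") or not sequence:
        raise ValueError("input does not start with a FASTQ record")
    # Keep only the read name, dropping anything after the first whitespace.
    name = header[1:].split()[0]
    return f">{name}\n{sequence}\n"


# Made-up record for illustration; an open FASTQ file handle works the same way.
example = ["@read1 1 length=8\n", "ACGTACGT\n", "+\n", "IIIIIIII\n"]
print(first_record_to_fasta(example), end="")
```

Writing that text to a file, indexing it with bwa index and rerunning the same bwa aln / bwa sampe commands against the new index should show whether a read that matches the reference exactly gets aligned. If it does, the problem is more likely in the original index or reference than in the reads.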