Question About Paired-End Reads That No Longer Match for BWA
11.2 years ago
Tonyzeng ▴ 310

My original paired-end data consist of Read1 and Read2 files whose reads correspond to each other. After quality filtering, the reads in the Read1 and Read2 files no longer correspond to each other. Does that mean my data are no longer paired-end data? Is this why BWA failed when I supplied Read1 and Read2 as paired-end reads? If so, do I need to treat them as single-end reads and run BWA "aln" on each file separately? My last question: should I then merge the Read1 BAM file and the Read2 BAM file using BWA "sampe"?

Thank you very much for the answer!!
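For context, here is a minimal sketch of the usual `bwa aln` / `bwa sampe` sequence, written as a small Python helper that only builds the shell commands. The file names (`ref.fa`, `read1.fq`, `read2.fq`) and the `sample` prefix are placeholders, and it assumes the reference has already been indexed with `bwa index`. Note that `aln` is run once per FASTQ file even for paired-end data, and `sampe` then takes both `.sai` files together with both FASTQ files, which it expects to list the mates in the same order.

```python
import shlex


def bwa_aln_sampe_commands(reference, reads1, reads2, prefix):
    """Return the shell commands for a paired-end bwa aln/sampe run.

    All paths are placeholders chosen by the caller; nothing is executed here.
    """
    ref, fq1, fq2 = (shlex.quote(p) for p in (reference, reads1, reads2))
    sai1 = shlex.quote(prefix + "_1.sai")
    sai2 = shlex.quote(prefix + "_2.sai")
    sam = shlex.quote(prefix + ".sam")
    return [
        # One "aln" run per FASTQ file, each producing its own .sai file.
        f"bwa aln {ref} {fq1} > {sai1}",
        f"bwa aln {ref} {fq2} > {sai2}",
        # "sampe" combines both .sai files with both FASTQ files into one SAM.
        f"bwa sampe {ref} {sai1} {sai2} {fq1} {fq2} > {sam}",
    ]


for command in bwa_aln_sampe_commands("ref.fa", "read1.fq", "read2.fq", "sample"):
    print(command)
```

The printed lines can be pasted into a shell once the two FASTQ files are back in sync.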

bwa • 8.6k views

I have just found this very nice script from Eric Normandeau, hosted here: https://github.com/enormandeau/Scripts/blob/master/fastqCombinePairedEnd.py (via the post "Combining the paired reads from Illumina run"). It basically does the job: you get two files of forward and reverse reads that pair up, plus one extra file for orphans.
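To illustrate the general idea of re-pairing by read name, here is a minimal sketch. It is not the linked script, just one possible approach under some assumptions: plain 4-line FASTQ records, mates that share the same name up to an optional `/1` or `/2` suffix, and a read2 file small enough to hold in memory. The function names are made up for this example.

```python
def mate_free_name(header):
    """Return the read name from a FASTQ header line without the mate suffix."""
    tokens = header.strip().lstrip("@").split()
    name = tokens[0] if tokens else ""
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def parse_fastq(path):
    """Yield each FASTQ record as a list of its four lines."""
    with open(path) as handle:
        while True:
            record = [handle.readline() for _ in range(4)]
            if not record[0]:
                break
            yield record


def repair_pairs(path1, path2, out1, out2, orphans):
    """Write synced pairs to out1/out2 and unmatched reads to orphans.

    Returns (number of pairs, number of orphans).
    """
    # Index every read2 record by name (assumes the file fits in memory).
    mates = {mate_free_name(rec[0]): rec for rec in parse_fastq(path2)}
    n_pairs = n_orphans = 0
    with open(out1, "w") as o1, open(out2, "w") as o2, open(orphans, "w") as orph:
        for rec in parse_fastq(path1):
            mate = mates.pop(mate_free_name(rec[0]), None)
            if mate is None:
                orph.writelines(rec)
                n_orphans += 1
            else:
                o1.writelines(rec)
                o2.writelines(mate)
                n_pairs += 1
        # Whatever is left in the index never found a read1 partner.
        for rec in mates.values():
            orph.writelines(rec)
            n_orphans += 1
    return n_pairs, n_orphans
```

The paired output follows the order of the read1 file, so the two output files stay in sync regardless of how the inputs were ordered.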


This script runs fine on my laptop and gives perfect results there. But when I run it on the server, it generates blank files, and I don't know why. The server's operating system is Debian. Do you have any idea how I can fix this?


How did you do the "quality filtering"? Did you remove some reads from one FASTQ file but not from the other (mate) file?
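One quick way to answer that is to compare the read names of the two files record by record. The sketch below assumes plain 4-line FASTQ records and mates that share the same name up to an optional `/1` or `/2` suffix; the function names are made up for this example.

```python
import itertools


def mate_free_name(header):
    """Return the read name from a FASTQ header line without the mate suffix."""
    tokens = header.strip().lstrip("@").split()
    name = tokens[0] if tokens else ""
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def first_mismatch(path1, path2):
    """Return the 1-based index of the first out-of-sync record, or None."""
    with open(path1) as f1, open(path2) as f2:
        # Every fourth line, starting at the first, is a header line.
        headers1 = itertools.islice(f1, 0, None, 4)
        headers2 = itertools.islice(f2, 0, None, 4)
        pairs = itertools.zip_longest(headers1, headers2)
        for index, (h1, h2) in enumerate(pairs, start=1):
            # A missing header means one file is shorter than the other.
            if h1 is None or h2 is None:
                return index
            if mate_free_name(h1) != mate_free_name(h2):
                return index
    return None
```

If `first_mismatch("read1.fq", "read2.fq")` returns a number rather than `None`, the files are no longer in sync from that record onward.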

11.2 years ago

You should start everything from scratch. By scratch I mean: start with the original FASTQ files and use a filtering tool that preserves the order of the forward and reverse reads. Trimmomatic is the one I use for filtering. Link: http://www.usadellab.org/cms/?page=trimmomatic
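As a rough sketch, a Trimmomatic paired-end run could be assembled from Python like this. The jar path, the file names, and the two trimming steps (`SLIDINGWINDOW:4:20`, `MINLEN:36`) are only illustrative assumptions; pick the steps and thresholds that suit your data. The helper only builds the argument list and does not run anything.

```python
def trimmomatic_pe_command(jar, reads1, reads2, prefix,
                           steps=("SLIDINGWINDOW:4:20", "MINLEN:36")):
    """Build the argument list for a Trimmomatic paired-end (PE) run.

    PE mode takes two inputs and four outputs, in this order:
    forward paired, forward unpaired, reverse paired, reverse unpaired.
    """
    return [
        "java", "-jar", jar, "PE",
        reads1, reads2,
        f"{prefix}_1P.fq", f"{prefix}_1U.fq",
        f"{prefix}_2P.fq", f"{prefix}_2U.fq",
        *steps,
    ]


# Hypothetical paths, for illustration only.
command = trimmomatic_pe_command("trimmomatic.jar", "read1.fq", "read2.fq", "filtered")
print(" ".join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)` on a machine where Java and the Trimmomatic jar are available.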


ashutoshmits, thank you for your suggestion. I filtered the reads with the FASTX-Toolkit, removing any read in which more than 75% of the bases have a quality score below 20, and I did this for both the read1 and read2 files. For example, the 5th read in the read1 file was filtered out because more than 75% of its bases had a quality score below 20, but the 5th read in the read2 file was kept because it is good. In this situation, the 5th reads in the read1 and read2 files no longer correspond to each other.

If I use Trimmomatic to preserve the FASTQ order of read1 and read2, I am afraid it will not work in this case, right?


It will create 4 different files. The first two files will contain read1 and read2 in the same order. In other words, if the 5th read of the read1 file is filtered out, then the 5th read of the read2 file is removed from these two files as well. Don't worry much about losing some reads; they will not affect your analysis in any way. The other two files will contain the reads from the read1 and read2 files whose mate was discarded. In your case, read 5 of the read2 file will end up in one of these files. Read the Trimmomatic documentation and you will understand why it is good to use. The filtering step still happens, but the order of the reads in the two files is maintained too.
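The four-file idea can be sketched in a few lines of Python. This is not Trimmomatic's algorithm, only an illustration of pair-aware filtering: the quality rule used here (keep a read when at least 75% of its bases have Phred quality 20 or higher, Phred+33 encoding) is an assumption for the example, and reads are represented as simple `(name, sequence, quality)` tuples.

```python
def passes(quality, min_q=20, min_fraction=0.75, offset=33):
    """True if at least min_fraction of the bases have quality >= min_q."""
    if not quality:
        return False
    good = sum(1 for ch in quality if ord(ch) - offset >= min_q)
    return good / len(quality) >= min_fraction


def filter_pairs(reads1, reads2):
    """Filter two synced read lists together.

    Returns (paired1, unpaired1, paired2, unpaired2). The paired lists
    stay in the same order; a survivor whose mate failed goes to the
    matching unpaired list.
    """
    paired1, unpaired1, paired2, unpaired2 = [], [], [], []
    for r1, r2 in zip(reads1, reads2):
        ok1, ok2 = passes(r1[2]), passes(r2[2])
        if ok1 and ok2:
            paired1.append(r1)
            paired2.append(r2)
        elif ok1:
            unpaired1.append(r1)
        elif ok2:
            unpaired2.append(r2)
        # If both mates fail, the whole pair is dropped.
    return paired1, unpaired1, paired2, unpaired2


# Pair 1 is good on both sides; in pair 2 only the read2 mate is good.
reads1 = [("pair1", "ACGT", "IIII"), ("pair2", "ACGT", "####")]
reads2 = [("pair1", "TTGA", "IIII"), ("pair2", "TTGA", "IIII")]
paired1, unpaired1, paired2, unpaired2 = filter_pairs(reads1, reads2)
print(len(paired1), len(unpaired1), len(paired2), len(unpaired2))
```

Because the decision is made per pair, the two paired outputs can be passed straight to a paired-end aligner, while the unpaired reads can be aligned as single-end reads or set aside.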

11.2 years ago

You do not necessarily need to start your analysis from scratch. Use the FASTQ joiner from the Galaxy toolkit online: http://main.g2.bx.psu.edu/

From the manual:

This tool joins paired end FASTQ reads from two separate files into a single read in one file. The join is performed using sequence identifiers, allowing the two files to contain differing ordering. If a sequence identifier does not appear in both files, it is excluded from the output.

So, if a sequence identifier does not appear in both files (which often happens after quality filtering), you can still get consistent data afterwards. This tool puts both mates of each pair into the same file, so you will have to split them again (with the added benefit of no longer having solo reads without a partner).


Trimmomatic is pretty fast. Uploading FASTQ files to Galaxy, running the analyses, and downloading the results may take a while. I do have a script that does the equivalent of the FASTQ joiner, but it's in Python and will take more time than rerunning the analysis from scratch.


You are right, but rongzeng might not want to use another (new) software for whatever reason.
