Question

Questions About Paired-End Reads That No Longer Match For BWA

2

11.2 years ago

Tonyzeng ▴ 310

My original paired-end data consist of Read1 and Read2 files whose reads correspond to each other. After quality filtering, the reads in Read1 and Read2 no longer correspond to each other. Does that mean my data are not paired-end data anymore? Is this the reason BWA failed when I supplied Read1 and Read2 as paired-end reads? If so, do I need to treat them as single-end reads and run BWA "aln" on each file separately? My last question: should I then merge the Read1 BAM file and the Read2 BAM file using BWA "sampe"?

Thank you very much for the answer!!

bwa • 8.6k views

ADD COMMENT • link updated 11.2 years ago by Biomonika (Noolean) 3.2k • written 11.2 years ago by Tonyzeng ▴ 310

2

I have just found this very nice script from Eric Normandeau, hosted here: https://github.com/enormandeau/Scripts/blob/master/fastqCombinePairedEnd.py (from the post "Combining the paired reads from Illumina run"). It basically does the job: you get two files of forward and reverse reads that do pair, plus one extra file for orphans.
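To illustrate the idea of re-pairing by identifier, here is a minimal sketch. It is not Eric Normandeau's script, only an example of the same approach under stated assumptions: each record is exactly four lines, mates share an identifier that differs at most by a trailing `/1` or `/2` or by text after the first space, and the whole read2 file fits in memory. The function names `repair_pairs`, `parse_fastq` and `pair_key` are made up for this example.

```python
def pair_key(header):
    """Reduce a FASTQ header to the part shared by both mates (assumed naming)."""
    name = header.split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def parse_fastq(path):
    """Yield (key, record_lines) for each 4-line FASTQ record in the file."""
    with open(path) as handle:
        while True:
            record = [handle.readline() for _ in range(4)]
            if not record[0].strip():  # end of file (or trailing blank line)
                break
            yield pair_key(record[0]), record


def repair_pairs(read1_path, read2_path, out1_path, out2_path, orphan_path):
    """Write reads that have a mate to out1/out2 and all others to orphan_path."""
    # Assumption: all read2 records fit in memory, keyed by identifier.
    read2_records = dict(parse_fastq(read2_path))
    with open(out1_path, "w") as out1, \
         open(out2_path, "w") as out2, \
         open(orphan_path, "w") as orphans:
        for key, record in parse_fastq(read1_path):
            mate = read2_records.pop(key, None)
            if mate is None:
                orphans.writelines(record)  # read1 without a read2 partner
            else:
                out1.writelines(record)     # same order in both outputs
                out2.writelines(mate)
        # Whatever is left in the dictionary never found a read1 partner.
        for record in read2_records.values():
            orphans.writelines(record)
```

The paired outputs follow the order of the read1 file, so the two files stay in sync and can be given to an aligner as a pair.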

ADD REPLY • link 11.2 years ago by Biomonika (Noolean) 3.2k

0

This script runs fine on my laptop and gives perfect results there. But when I try to run it on the server, it generates blank files, and I don't know why. The operating system of my server is Debian. Do you have any idea how I can fix this?

ADD REPLY • link 8.1 years ago by s.singh ▴ 70

0

How have you done the "quality filtering"? Did you remove some reads from one fastq but not from the other (mate)?
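One quick way to answer that question is to compare the read identifiers of the two files record by record. The following is only a sketch, assuming 4-line FASTQ records and mate names that differ at most by a trailing `/1` or `/2` or by text after the first space; `first_mismatch`, `read_ids` and `normalize_id` are names invented for this example.

```python
from itertools import zip_longest


def normalize_id(header):
    """Strip '@', any description after whitespace, and a trailing /1 or /2."""
    name = header.strip().split()[0].lstrip("@")
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def read_ids(path):
    """Yield the normalized identifier of every 4-line FASTQ record."""
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            # The header is the first line of each 4-line record.
            if line_number % 4 == 0 and line.strip():
                yield normalize_id(line)


def first_mismatch(read1_path, read2_path):
    """Return the 1-based index of the first record whose IDs differ, else None."""
    pairs = zip_longest(read_ids(read1_path), read_ids(read2_path))
    for index, (id1, id2) in enumerate(pairs, start=1):
        if id1 != id2:  # also triggers when one file is shorter
            return index
    return None
```

If `first_mismatch` returns `None`, the two files are still in the same order; any number means the files went out of sync at that record.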

ADD REPLY • link 11.2 years ago by Pierre Lindenbaum 164k

score 1 · Answer 1 · 2013-09-19

1

11.2 years ago

Ashutosh Pandey 12k

You should start everything from scratch. By scratch I mean start with the original fastq files and use a filtering tool that preserves the order of the forward and reverse reads. Trimmomatic is the one I use for filtering. Link: http://www.usadellab.org/cms/?page=trimmomatic

ADD COMMENT • link 11.2 years ago by Ashutosh Pandey 12k

0

ashutoshmits, thank you for your suggestion. I filter both the read1 and read2 files with Fastx-toolkit, removing any read in which bases with quality under 20 account for more than 75% of the sequence. For example, the 5th read in the read1 file has been filtered out because more than 75% of its sequence has a quality score under 20. However, the 5th read in the read2 file stays because it is good. In this situation, the 5th reads in the read1 and read2 files no longer correspond to each other.

If I use Trimmomatic to preserve the fastq order of read1 and read2, I am afraid that it will not work, right?

ADD REPLY • link 11.2 years ago by Tonyzeng ▴ 310

0

It will create 4 different files. The first two files will have read1 and read2 in the same order. In other words, if the 5th read of the read1 file is filtered, then the 5th read of the read2 file is also filtered. Don't worry much if you are losing some reads. They will not affect your analysis in any way. The other two files will hold the reads from the read1 and read2 files whose mate was discarded. In your case, read 5 of the read2 file will be present in one of these files. Read the Trimmomatic documentation and you will understand why it is good to use. The filtering step still happens, but the order of the reads in the two files is maintained too.
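The four-file bookkeeping can be sketched as follows. This is not Trimmomatic's implementation, only an illustration of pair-aware filtering under stated assumptions: the two input files are still in the same order, records are four lines each, qualities are Phred+33 encoded, and a read passes when at least 75% of its bases have quality 20 or higher (thresholds chosen to mirror the ones mentioned in this thread). The names `filter_pairs`, `passes` and `records` are made up for this example.

```python
def passes(quality_line, min_quality=20, min_fraction=0.75, offset=33):
    """True if at least min_fraction of the bases have quality >= min_quality."""
    scores = [ord(char) - offset for char in quality_line.strip()]
    if not scores:
        return False
    good = sum(score >= min_quality for score in scores)
    return good / len(scores) >= min_fraction


def records(handle):
    """Yield 4-line FASTQ records from an open file handle."""
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0].strip():  # end of file
            return
        yield record


def filter_pairs(in1, in2, paired1, paired2, unpaired1, unpaired2):
    """Filter two synchronized FASTQ files into two paired and two unpaired files."""
    with open(in1) as r1, open(in2) as r2, \
         open(paired1, "w") as p1, open(paired2, "w") as p2, \
         open(unpaired1, "w") as u1, open(unpaired2, "w") as u2:
        for rec1, rec2 in zip(records(r1), records(r2)):
            ok1, ok2 = passes(rec1[3]), passes(rec2[3])
            if ok1 and ok2:
                # Both mates survive: written together, so order is preserved.
                p1.writelines(rec1)
                p2.writelines(rec2)
            elif ok1:
                u1.writelines(rec1)  # mate in read2 was discarded
            elif ok2:
                u2.writelines(rec2)  # mate in read1 was discarded
```

Because the decision is made per pair rather than per file, the two paired outputs always contain the same reads in the same order.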

ADD REPLY • link 11.2 years ago by Ashutosh Pandey 12k

score 1 · Answer 2 · 2013-09-19

1

11.2 years ago

Biomonika (Noolean) 3.2k

You do not necessarily need to start your analysis from scratch. Use the FASTQ joiner from the Galaxy toolkit online: http://main.g2.bx.psu.edu/

From the manual:

This tool joins paired end FASTQ reads from two separate files into a single read in one file. The join is performed using sequence identifiers, allowing the two files to contain differing ordering. If a sequence identifier does not appear in both files, it is excluded from the output.

So, if a sequence identifier does not appear in both files (which often happens after quality filtering), you can still get consistent data afterwards. This tool will put both reads of each pair into the same file, so you will have to split them again (with the added benefit of having no solo reads without a partner).

ADD COMMENT • link 11.2 years ago by Biomonika (Noolean) 3.2k

0

Trimmomatic is pretty fast. Uploading fastq files to Galaxy, running the analyses and downloading the results may take a while. I do have a script that does the equivalent of the FASTQ joiner, but it's in Python and will take more time than running the analysis from scratch.

ADD REPLY • link 11.2 years ago by Ashutosh Pandey 12k

0

You are right, but rongzeng might not want to use another (new) piece of software, for whatever reason.

ADD REPLY • link 11.2 years ago by Biomonika (Noolean) 3.2k