Mapping FASTQ files that contain paired reads and reads whose mate is missing
10.0 years ago
vigprasud ▴ 60

I have FASTQ files that contain paired reads as well as reads whose mate is absent.

For example:

R1.fq
@HWI:xxxxxxxxxxxxx/1
@HWI:xxxxxxxxxxxxy

R2.fq
@HWI:xxxxxxxxxxxxx/2
@HWI:xxxxxxxxxxxyx

When I try to align them with bwa, it throws an error saying that a member of the mate pair is absent.

How do I align these FASTQ files? Note that they were created from previously aligned BAM files using Picard tools.

alignment mapped-reads bwa • 4.3k views
10.0 years ago
SES 8.6k

You can use Pairfq to fix your paired-end reads, specifically the pairfq makepairs command (more info on the wiki). I have tried the other solution mentioned, and this one is more efficient and more general: it accepts multi-line FASTA/Q as input, the files can be compressed, and it is not restrictive about the read name. To be clear, I developed this tool, and I did so because I got tired of repeating awk commands, cleaning up intermediate files, fixing the file names after the commands, modifying the commands for different inputs, etc. There is also a script included that has no dependencies, so you should be able to use it anywhere you can use awk.

EDIT: To be clear on the last part, this is all you need:

curl -L git.io/pairfq_lite > pairfq_lite
chmod +x pairfq_lite
./pairfq_lite -h

The last command will just print the usage menu. All the documentation is available at the command line (with ./pairfq_lite -m) and there is more information on the wiki. I hope that makes it easier to use.
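For anyone curious what this kind of re-pairing involves, here is a minimal Python sketch of the general idea. It is not Pairfq's code. It assumes plain four-line FASTQ records, read names that differ only by a trailing /1 or /2, and enough memory to hold all R2 records at once. The function names are made up for illustration.

```python
import io

def read_fastq(handle):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual

def base_name(header):
    """Drop the leading '@', any comment after whitespace, and a trailing /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name

def make_pairs(r1_handle, r2_handle):
    """Return (paired_r1, paired_r2, orphans_r1, orphans_r2); pairs follow R1 order."""
    # Index every R2 record by its base name (held in memory: an assumption).
    r2_by_name = {base_name(rec[0]): rec for rec in read_fastq(r2_handle)}
    paired_r1, paired_r2, orphans_r1 = [], [], []
    for rec in read_fastq(r1_handle):
        mate = r2_by_name.pop(base_name(rec[0]), None)
        if mate is None:
            orphans_r1.append(rec)
        else:
            paired_r1.append(rec)
            paired_r2.append(mate)
    # Whatever is left in the index never found an R1 mate.
    orphans_r2 = list(r2_by_name.values())
    return paired_r1, paired_r2, orphans_r1, orphans_r2

# Tiny in-memory example with hypothetical reads.
r1 = io.StringIO("@ReadA/1\nACGT\n+\nIIII\n@ReadB/1\nACGT\n+\nIIII\n@ReadD\nACGT\n+\nIIII\n")
r2 = io.StringIO("@ReadB/2\nTTTT\n+\nIIII\n@ReadA/2\nGGGG\n+\nIIII\n@ReadE\nCCCC\n+\nIIII\n")
p1, p2, o1, o2 = make_pairs(r1, r2)
```

Because the paired R2 records are emitted in the order their mates appear in R1, the two paired outputs stay in sync, which is what the aligner expects.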


Thanks. The tool created a merged R1_R2 paired file and another file with orphan reads. Have you written any other script that splits the merged file back into R1 and R2? The alignment needs two separate files, one for R1 and one for R2.


The pairfq makepairs command creates separate files that are in order, and the pairfq joinpairs command interleaves the pairs. The command you want for splitting the pairs from an interleaved file is pairfq splitpairs (see the wiki page for that command for more info). For reference, you can type pairfq and it will list all the commands, and a description of the basic usage can be found on the wiki home page. Feel free to ask me questions, or post them online under the "issues" tab.
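To make the idea of splitting an interleaved file concrete, here is a rough Python sketch. It is not how pairfq splitpairs is implemented. It assumes four-line FASTQ records that strictly alternate R1, R2, R1, R2 with no orphans mixed in, and the function name is hypothetical.

```python
def split_interleaved(lines):
    """Split interleaved FASTQ lines (R1 record, R2 record, ...) into two lists.

    Assumes four lines per record and strictly alternating mates.
    """
    if len(lines) % 8 != 0:
        raise ValueError("expected complete R1/R2 record pairs (8 lines per pair)")
    r1_lines, r2_lines = [], []
    for i in range(0, len(lines), 8):
        r1_lines.extend(lines[i:i + 4])      # first record of the pair
        r2_lines.extend(lines[i + 4:i + 8])  # its mate
    return r1_lines, r2_lines

# Hypothetical interleaved input with two pairs.
interleaved = [
    "@ReadA/1", "ACGT", "+", "IIII",
    "@ReadA/2", "TTTT", "+", "IIII",
    "@ReadB/1", "GGGG", "+", "IIII",
    "@ReadB/2", "CCCC", "+", "IIII",
]
r1_out, r2_out = split_interleaved(interleaved)
```

Each returned list can then be written to its own file, one line per element.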


Thanks. Are the singletons combined, or are they written to separate files as well?


If you are referring to the pairfq makepairs command, the singleton reads from each pair are written to separate files (explained here).


I tried the script. I got the pairs into separate files, but I could not get them in the same order.

R1.fq has

ReadA/1
ReadB/1
ReadC/1

while R2.fq has

ReadC/2
ReadA/2
ReadB/2

Is there a way to make this program sort them?

10.0 years ago

See this post: Combining The Paired Reads From Illumina Run

It will help you create a pair of ordered FASTQ files. You can then align the ordered FASTQ files as paired-end reads, separately from the unpaired reads (also known as orphan reads), which need to be aligned as single-end reads.
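As a rough sketch of those separate runs, assuming bwa mem and a reference that has already been indexed with bwa index (the question does not say which bwa subcommand was used), the commands could be assembled like this in Python. All file names and function names are hypothetical.

```python
import subprocess

def bwa_mem_commands(reference, paired_r1, paired_r2, orphan_files):
    """Build one paired-end bwa mem command plus one single-end command per orphan file."""
    # Paired-end run: both ordered mate files are given after the reference.
    commands = [["bwa", "mem", reference, paired_r1, paired_r2]]
    # Single-end runs: one read file each.
    for orphans in orphan_files:
        commands.append(["bwa", "mem", reference, orphans])
    return commands

def run_to_sam(command, sam_path):
    """Run one command and write its standard output (SAM) to sam_path."""
    with open(sam_path, "w") as out:
        subprocess.run(command, stdout=out, check=True)

# Hypothetical file names; nothing is executed here.
commands = bwa_mem_commands(
    "ref.fa", "paired_R1.fq", "paired_R2.fq", ["orphans_R1.fq", "orphans_R2.fq"]
)
```

Calling run_to_sam(commands[0], "paired.sam") would then run the paired-end alignment, provided bwa is installed and on the PATH.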


Thanks. For single-end reads, doesn't the aligner need to know whether a read is an R1 read or an R2 read?

10.0 years ago

You can also try this script.

https://www.dropbox.com/s/4apg7uykv35koto/cmpfastq_pe.pl?dl=0

It takes two FASTQ files from Illumina, compares them, and writes out two files, one for R1 and one for R2, containing the common reads in the same order. It also writes the reads that are unique to R1 and to R2 into two separate files.

If your read names follow a different pattern, you need to edit the regex string to make it work for your files.
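To illustrate the kind of pattern involved, here is a small Python sketch. It is not the regex from the script above. It assumes the two common Illumina conventions: a trailing /1 or /2 on the read name, or the mate number in a comment after whitespace. The names READ_NAME and pair_key are made up for this example.

```python
import re

# Capture the read name without the '@', an optional /1 or /2 suffix,
# and an optional whitespace-separated comment.
READ_NAME = re.compile(r"^@(?P<name>\S+?)(?:/[12])?(?:\s.*)?$")

def pair_key(header):
    """Return the part of a FASTQ header that both mates of a pair share."""
    match = READ_NAME.match(header)
    if match is None:
        raise ValueError(f"not a FASTQ header: {header!r}")
    return match.group("name")

# Hypothetical headers in both naming styles.
old_style = pair_key("@HWI:xxxxxxxxxxxxx/2")
new_style = pair_key("@HWI:xxxxxxxxxxxxx 2:N:0:ATCACG")
```

If your headers use yet another layout, the pattern is the only part that should need changing.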
