How To Generate A Paired End File Suitable For Bwa
12.1 years ago
GPR ▴ 390

A question about BWA and paired-end reads. In TopHat, one inputs the paired-end fastq file twice and the tool understands you have paired-end reads. I see that in the case of BWA one must align each pair separately. Can somebody give me a hint on how to generate these two matching files from my paired-end one? Thanks. G.

bwa • 6.0k views

I don't understand your question. Do you have one FASTQ file with mixed pairs that you want to split for BWA, or do you not know how to run BWA with two paired-end FASTQ files?


The first: I have one paired-end FASTQ file and want to split the pairs.


A simple script can help you. Can you show us the input (just to be sure of the format)?


I have *.dat files.


*.dat? Do you have FASTQ (fastq/fq) or SAM/BAM? As far as I know, mixed pairs in a FASTQ file can come in two formats:

1) each read is reported independently:

  @read_1
  ACATTCATTCATCTAT
  +
  BBBBBBBBBBBBBBBB
  @read_2
  TGCATGCAGCATGGCC
  +
  BBBBBBBBBBBBBBBB

2) both mates are fused into a single record:

@read_12
ACATTCATTCATCTATTGCATGCAGCATGGCC
+
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

but 2) can be tricky, depending on whether the pairs are forward-forward (f-f) or forward-reverse (f-r).
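
For format 2), a minimal sketch of cutting a fused record back into two mates could look like the following. It assumes both mates have the same length unless `len1` is given, and it does not reverse-complement anything, so the orientation question stays open. The function name and the `/1` and `/2` name suffixes are illustrative choices, not something the instrument or BWA prescribes.

```python
def split_fused_record(name, seq, qual, len1=None):
    """Split one fused FASTQ record into two (name, seq, qual) mate tuples.

    len1 is the length of the first mate. If it is omitted, the read is cut
    in half, which assumes both mates have equal length.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    if len1 is None:
        if len(seq) % 2:
            raise ValueError("odd read length; pass len1 explicitly")
        len1 = len(seq) // 2
    # No reverse-complementing here: orientation (f-f vs f-r) is left as-is.
    mate1 = (name + "/1", seq[:len1], qual[:len1])
    mate2 = (name + "/2", seq[len1:], qual[len1:])
    return mate1, mate2


# The fused example record from above.
mate1, mate2 = split_fused_record(
    "read_12", "ACATTCATTCATCTATTGCATGCAGCATGGCC", "B" * 32
)
```

Whether the second mate then needs to be reverse-complemented depends on how the fused reads were produced, which cannot be decided from the file alone.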


I am not aware of this format. If you are using Unix, can you run something like "head -10 yourdatfile" and paste the output here?


OK, sorry. It's FASTQ straight off the Illumina instrument. I said *.dat because that is how the file names end, but they are, as I say, FASTQ.

As JC mentioned above, if the files look something like this:

@read_1
ACATTCATTCATCTAT
+
BBBBBBBBBBBBBBBB

@read_2
TGCATGCAGCATGGCC
+
BBBBBBBBBBBBBBBB

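then you can use grep on Unix. For example,

grep --no-group-separator -A3 '^@.*_1$' Filewithmixedreads.txt > Filewithreadsfrom_1end.fastq

should work, assuming GNU grep, exactly four lines per record, and no quality line that itself starts with @ and ends in _1.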

Similarly you should be able to get reads ending with _2 in another file. 
Hope this helps otherwise please print some lines from the file for us to see. 

Thanks.
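
If grep turns out to be fragile (quality lines may themselves begin with "@"), a small script that reads four lines per record is a safer route. The sketch below assumes read names end in `_1` or `_2` as in the example, and that each record is exactly four lines; `read_fastq` and `split_by_suffix` are names made up for this illustration.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield [header, sequence, plus, quality] for each 4-line FASTQ record."""
    # Blank lines between records (as in the example above) are skipped.
    lines = (line.rstrip() for line in handle if line.strip())
    while True:
        record = list(islice(lines, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record: %r" % record[0])
        yield record


def split_by_suffix(in_handle, out1, out2):
    """Write records whose name ends in _1 to out1 and in _2 to out2."""
    n1 = n2 = 0
    for record in read_fastq(in_handle):
        name = record[0].split()[0]  # read name without any description
        if name.endswith("_1"):
            out1.write("\n".join(record) + "\n")
            n1 += 1
        elif name.endswith("_2"):
            out2.write("\n".join(record) + "\n")
            n2 += 1
        else:
            raise ValueError("read name has no _1/_2 suffix: %r" % name)
    return n1, n2


# In-memory demo with the two example reads; real use would pass open files.
mixed = io.StringIO(
    "@read_1\nACATTCATTCATCTAT\n+\nBBBBBBBBBBBBBBBB\n"
    "\n"
    "@read_2\nTGCATGCAGCATGGCC\n+\nBBBBBBBBBBBBBBBB\n"
)
end1, end2 = io.StringIO(), io.StringIO()
counts = split_by_suffix(mixed, end1, end2)
```

Because the records are written in input order, the two output files keep the mates in matching order as long as every read in the input has its partner.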

My reads actually look like this:

+
CCCFFFFFHHHHHIHGGHFIJJIIJJIJIEHHJJJFGJIJIIIIJJJIGCGHIIJHEHIJJB=?DFBCCBEEEEDA
@HWI-ST974:67:C0545ACXX:2:1101:1628:32111 1:N:0:
GTTGGAGCAGGCCCGCAAGGCCGAAGAGGTGCAGGCCTGGGCGCAGCGCAAGGAGCGGGAAGTGCTGCAGCTGCAG
+
@BCFFFFFHHHGHJIJJJJJIJJJJJGJJFHIGJJIJJJJIGGHFFEDDDDDDDDDDDDDBBCDDEDDDDDDDDD>
@HWI-ST974:67:C0545ACXX:2:1101:1521:32121 1:N:0:
GTGACTGTCGTGTCCTCGTCGACCTCCTTCTCCTGTCGCTCCAGATCCGCCTCAATCTCCTTGAGCTCTTCCAGCT

Thanks!


That looks like single-end reads, or the first mate of a paired-end run (1:N:0). Are you sure that you have mixed paired-end reads? In your example, the first sequence maps to MRPS26 (chr20:3027308-3027383) and the second to SPC24 (chr19:11258704-11258779) in hg19, so the two are not mates of one fragment.
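
One quick way to check is to count the read number stored in the header. In Illumina headers of this style (CASAVA 1.8 and later), the part after the space starts with the read number, 1 or 2. The sketch below tallies that field; the helper names are invented for this example, and it assumes the headers have already been pulled out of the file.

```python
from collections import Counter


def mate_number(header):
    """Return the read number ('1' or '2') from the header, or None if absent."""
    parts = header.split()
    if len(parts) < 2:
        return None  # no description field, e.g. an older header style
    return parts[1].split(":")[0]


def count_mates(headers):
    """Tally how many headers belong to read 1, read 2, or neither."""
    return Counter(mate_number(h) for h in headers)


# The two headers posted above.
headers = [
    "@HWI-ST974:67:C0545ACXX:2:1101:1628:32111 1:N:0:",
    "@HWI-ST974:67:C0545ACXX:2:1101:1521:32121 1:N:0:",
]
mate_counts = count_mates(headers)
```

If a whole file yields only read number 1, it holds one end only, and the second end is most likely in a separate file from the same run.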


I am pretty sure these are paired-end reads; I have actually analysed this data with TopHat-Cufflinks. So I guess in my data each read was reported independently, and using grep is a good option. Thanks so much!

12.1 years ago

For BWA, you first align all the reads from one end of the pairs (one file containing all the forward, or _1, reads). You then repeat the same step with the reads from the other end (one file containing all the reverse, or _2, reads). Each step produces one .sai file, so two in total. bwa sampe then takes these two .sai files, together with the reference and the two FASTQ files, and produces a single SAM file.

1) bwa aln on the file containing reads from one end > 1.sai
2) bwa aln on the file containing reads from the other end > 2.sai
3) bwa sampe with 1.sai and 2.sai (plus the reference and both FASTQ files) > SAM file
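
The three steps can be sketched as a small Python wrapper around the bwa command line. It assumes bwa is on the PATH and that the reference has already been indexed with `bwa index`; the file names and the helper names `bwa_pe_commands` and `run_all` are placeholders for illustration.

```python
import subprocess


def bwa_pe_commands(ref, fq1, fq2, prefix):
    """Return (argv, output_path) pairs for the aln/aln/sampe workflow."""
    sai1 = prefix + "_1.sai"
    sai2 = prefix + "_2.sai"
    return [
        (["bwa", "aln", ref, fq1], sai1),  # step 1: align one end
        (["bwa", "aln", ref, fq2], sai2),  # step 2: align the other end
        # step 3: pair the two alignments; writes SAM to standard output
        (["bwa", "sampe", ref, sai1, sai2, fq1, fq2], prefix + ".sam"),
    ]


def run_all(steps):
    """Run each command in order, redirecting its stdout to the output file."""
    for argv, out_path in steps:
        with open(out_path, "wb") as out:
            subprocess.run(argv, stdout=out, check=True)


# Only builds the commands; call run_all(steps) to actually execute them.
steps = bwa_pe_commands("ref.fa", "reads_1.fq", "reads_2.fq", "sample")
```

Both FASTQ files must list the mates in the same order, since bwa sampe pairs the reads by position.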

Hope this helps you.


Thanks for the pointers. I actually understand this much. What I am trying to find out is how to split my paired-end file.
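
If the file really is interleaved, with each read 1 record directly followed by its read 2 record, one possible way to split it is sketched below. It assumes Illumina-style headers where the read number follows the space, and it stops with an error rather than guessing when two neighbouring records are not a 1/2 pair. The function names and the toy records are invented for this example.

```python
import io
from itertools import islice


def fastq_records(handle):
    """Yield [header, sequence, plus, quality] lists, four lines at a time."""
    lines = (line.rstrip() for line in handle if line.strip())
    while True:
        rec = list(islice(lines, 4))
        if not rec:
            return
        if len(rec) != 4 or not rec[0].startswith("@") or not rec[2].startswith("+"):
            raise ValueError("malformed FASTQ record near %r" % rec[0])
        yield rec


def deinterleave(in_handle, out1, out2):
    """Write alternating records to out1/out2 after checking they are mates."""
    records = fastq_records(in_handle)
    pairs = 0
    for first in records:
        second = next(records, None)
        if second is None:
            raise ValueError("odd number of records; last read has no mate")
        name1, _, tag1 = first[0].partition(" ")
        name2, _, tag2 = second[0].partition(" ")
        if name1 != name2 or not tag1.startswith("1:") or not tag2.startswith("2:"):
            raise ValueError("not an adjacent 1/2 pair: %r, %r" % (first[0], second[0]))
        out1.write("\n".join(first) + "\n")
        out2.write("\n".join(second) + "\n")
        pairs += 1
    return pairs


# Toy records (not real data); note the second quality line starts with '@'.
toy = io.StringIO(
    "@toy:1:FC:1:1:10:20 1:N:0:\nACGT\n+\nIIII\n"
    "@toy:1:FC:1:1:10:20 2:N:0:\nTTGA\n+\n@III\n"
)
out1, out2 = io.StringIO(), io.StringIO()
n_pairs = deinterleave(toy, out1, out2)
```

For real data the three handles would be ordinary files opened with `open(...)`, for example the *.dat input and two new .fastq outputs.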
