A question about BWA and paired-end reads. In TopHat, one inputs the paired-end FASTQ file twice and the tool understands you have paired-end reads. I see that in the case of BWA one must align each mate of the pair separately. Can somebody give me a hint on how to generate these two matching files from my paired-end one? Thanks. G.
I don't understand your question. Do you have one FASTQ file with mixed pairs that you want to split for BWA, or do you not know how to run BWA with two paired-end FASTQ files?
The first: I have one paired-end FASTQ file and want to split the pairs.
A simple script can help you. Can you show us the input (just to be sure of the format)?
I have *.dat files.
*.dat? Do you have FASTQ/fq or SAM/BAM? In a FASTQ file, mixed pairs can come in two formats (as far as I know):
1) each read is reported independently:
2) both mates are fused into a single record:
but 2) can be tricky, depending on whether the pairs are f-f or f-r.
I am not aware of this format. If you are using Unix, can you run something like "head -10 yourdatfile" and paste the output here?
OK, sorry. It's FASTQ straight off the Illumina instrument. I said *.dat because that's how the file names end. As I say, they are FASTQ.
My reads actually look like this:
+
CCCFFFFFHHHHHIHGGHFIJJIIJJIJIEHHJJJFGJIJIIIIJJJIGCGHIIJHEHIJJB=?DFBCCBEEEEDA
@HWI-ST974:67:C0545ACXX:2:1101:1628:32111 1:N:0:
GTTGGAGCAGGCCCGCAAGGCCGAAGAGGTGCAGGCCTGGGCGCAGCGCAAGGAGCGGGAAGTGCTGCAGCTGCAG
+
@BCFFFFFHHHGHJIJJJJJIJJJJJGJJFHIGJJIJJJJIGGHFFEDDDDDDDDDDDDDBBCDDEDDDDDDDDD>
@HWI-ST974:67:C0545ACXX:2:1101:1521:32121 1:N:0:
GTGACTGTCGTGTCCTCGTCGACCTCCTTCTCCTGTCGCTCCAGATCCGCCTCAATCTCCTTGAGCTCTTCCAGCT
Thanks!
That looks like single-end reads, or the first mate of a paired-end run (1:N:0). Are you sure that you have mixed paired-end reads? Actually, from your example, your first sequence maps to MRPS26 (chr20:3027308-3027383) and the second to SPC24 (chr19:11258704-11258779) in hg19, without spanning.
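One way to check whether a file really contains both mates is to count the read-number field in the headers. The sketch below is only an illustration: it assumes four-line FASTQ records and either the newer Illumina header style shown above (a space followed by "1:" or "2:") or the older "/1" and "/2" suffixes. The function name count_read_numbers is made up for this example.

```python
from collections import Counter
from itertools import islice


def count_read_numbers(lines):
    """Count FASTQ records flagged as read 1, read 2, or unknown.

    Assumes strict four-line records; an incomplete trailing record is ignored.
    """
    counts = Counter()
    it = iter(lines)
    while True:
        record = list(islice(it, 4))  # header, sequence, '+', quality
        if len(record) < 4:
            break
        header = record[0].rstrip("\n")
        parts = header.split(" ", 1)
        if len(parts) == 2 and parts[1][:2] in ("1:", "2:"):
            # Newer Illumina style: "@name 1:N:0:..." or "@name 2:N:0:..."
            counts[parts[1][0]] += 1
        elif header.endswith(("/1", "/2")):
            # Older style: "@name/1" or "@name/2"
            counts[header[-1]] += 1
        else:
            counts["unknown"] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (file name is a placeholder): python count_reads.py yourdatfile
    with open(sys.argv[1]) as handle:
        print(dict(count_read_numbers(handle)))
```

If only "1" shows up, the file holds first mates only and the second mates are probably in a separate file.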
I am pretty sure these are paired-end reads; I have actually analysed this data with TopHat-Cufflinks. So I guess in my data each read was reported independently, and using grep is a good option. Thanks so much!
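A word of caution on grep: a quality line can itself start with "@" (the second read pasted above has one), so matching on "^@" alone can pick up the wrong lines. A small script that walks the file four lines at a time avoids that. The following is a rough sketch, not a tested pipeline: it assumes strict four-line records and headers carrying "1:"/"2:" after a space (or "/1" and "/2" suffixes), and all function and file names are placeholders.

```python
from itertools import islice


def read_number(header):
    """Return '1' or '2' for a FASTQ header line, or None if it cannot be told."""
    fields = header.rstrip("\n").split(" ", 1)
    if len(fields) == 2 and fields[1][:2] in ("1:", "2:"):
        return fields[1][0]      # e.g. "@name 1:N:0:"
    if fields[0].endswith(("/1", "/2")):
        return fields[0][-1]     # e.g. "@name/2"
    return None


def iter_tagged_records(lines):
    """Yield (read_number, four_line_record) for each complete FASTQ record."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            return
        yield read_number(record[0]), record


def split_fastq(in_path, out1_path, out2_path):
    """Stream a mixed FASTQ file into separate read-1 and read-2 files."""
    with open(in_path) as fin, \
            open(out1_path, "w") as out1, \
            open(out2_path, "w") as out2:
        for which, record in iter_tagged_records(fin):
            if which == "1":
                out1.writelines(record)
            elif which == "2":
                out2.writelines(record)
            else:
                raise ValueError("cannot tell read number from: " + record[0])


if __name__ == "__main__":
    import sys

    # Usage (names are placeholders): python split_pairs.py mixed.fastq r1.fastq r2.fastq
    split_fastq(sys.argv[1], sys.argv[2], sys.argv[3])
```

The records are written in the order they are read, so the two output files only line up mate for mate if the mates appear in a consistent order in the input. It is worth comparing the read names and line counts of the two files before aligning.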