Hi, I used tophat2 to align paired-end reads to the Arabidopsis genome, and I want to use the unmapped.bam file to obtain the unmapped reads so I can align them to a transgene. The output gives the read names without the paired-end information, so I tried several scripts to extract the reads listed in unmapped.bam from the fastq files, but they did not work. I also tried HISAT2 and bowtie2, but the statistics are completely different: they report no concordant alignments, whereas tophat2 gives me an alignment rate above 95% and more than 90% concordant alignments. I want to go back to tophat2, but I need the unmapped reads.
For me the best option would be to keep using tophat and have a way to filter the fastq files with the unmapped.bam. Does anyone have an idea of how to do that?
Here is the code I used for the alignments:
tophat2
tophat -o Control/Control_5 --library-type fr-firststrand -r 60 --mate-std-dev 50 index/TAIR10 CNT-5P_R1.fastq CNT-5P_R2.fastq
hisat2
hisat2 --no-mixed -X 310 -x sat_index/hisat_tair -1 CNT-5P_R1.fastq -2 CNT-5P_R2.fastq -S Control/Control_5.sam --summary-file Control/Control_5.txt
bowtie2
bowtie2 --no-mixed -I 210 -X 310 -x index/TAIR10 -1 CNT-5P_R1.fastq -2 CNT-5P_R2.fastq -S Control/Control_5.sam
Hello,
First of all, do not use tophat; it was released in 2008 and HISAT2 does a better job.
Then,
You mean the bam file? From which software?
Please share what you tried; that will help us understand what you want.
I don't get the point. Could you add an example?
Why did you include these two parameters?
I only started hearing about HISAT2 a few months ago, so I wasn't sure I wanted to try it. Since I was talking about tophat, the file in question is the unmapped.bam written by tophat.
I tried this script from http://seqanswers.com/forums/showthread.php?t=6847&referrerid=2547
I tried another script in R, but I can't find it anymore. In any case, it returned the whole fastq, not a filtered one.
About the statistics: Tophat
Hisat2
As for -I and -X, they describe the fragment length. In tophat I specified a mate distance of 60 (-r) and a standard deviation of 50 (--mate-std-dev). hisat2 asks for the minimum and maximum fragment length instead, and since my reads are 100 bp, the lengths are min 210 and max 310.
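The arithmetic behind 210 and 310 can be written out as a small shell sketch. It assumes that -r in tophat is the inner distance between the two mates, that the fragment therefore spans both reads plus that gap, and that the bounds are taken as the mean plus or minus one standard deviation. The function name fragment_bounds is only illustrative.

```shell
# Hypothetical helper: derive -I/-X style bounds from tophat-style settings.
# Assumption: fragment length = 2 * read length + inner distance,
# and the bounds are mean -/+ one standard deviation.
fragment_bounds() {
  read_len=$1   # read length in bp
  inner=$2      # inner distance between mates (tophat -r)
  sd=$3         # standard deviation (tophat --mate-std-dev)
  frag=$((2 * read_len + inner))
  echo "-I $((frag - sd)) -X $((frag + sd))"
}

fragment_bounds 100 60 50
```

With 100 bp reads, a distance of 60 and a standard deviation of 50, this prints "-I 210 -X 310", which matches the values in the commands above.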
Please try to use the "add reply" button to answer a comment.
These two parameters do not stand for fragment length but for insert size, which is the distance between a read and its mate.
Remove these two parameters from your command line and try again.
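Coming back to the original question of filtering the fastq files with the read names from unmapped.bam, here is a minimal sketch using awk. It is not a tested pipeline. It assumes standard 4-line fastq records, a non-empty plain-text file with one read name per line, and that the fastq headers may carry a /1 or /2 suffix that the names from the bam lack. The function name filter_fastq and the file names names.txt and unmapped_R1.fastq are illustrative.

```shell
# Hypothetical helper: keep only the fastq records whose read name is listed.
#   $1 = text file with one read name per line (must not be empty)
#   $2 = fastq file with 4 lines per record
# The names file could be produced with something like:
#   samtools view Control/Control_5/unmapped.bam | cut -f1 | sort -u > names.txt
filter_fastq() {
  awk '
    # First file: remember every read name.
    NR == FNR { keep[$1] = 1; next }
    # Header line of each fastq record: drop the leading "@" and any
    # trailing /1 or /2, then decide whether to print this record.
    FNR % 4 == 1 {
      name = substr($1, 2)
      sub(/\/[12]$/, "", name)
      p = (name in keep)
    }
    # Print all four lines of a record that was selected.
    p
  ' "$1" "$2"
}

# Example usage (run once per mate file so the pairs stay in sync):
#   filter_fastq names.txt CNT-5P_R1.fastq > unmapped_R1.fastq
#   filter_fastq names.txt CNT-5P_R2.fastq > unmapped_R2.fastq
```

Because the same name list is applied to both mate files, a pair is kept whenever either mate appears in unmapped.bam, so the two output files stay in the same order and can be passed to an aligner as a pair.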