9.9 years ago
Leandro de Mattos
Hi, I would like a suggestion, please.
I have mapped 100 bp paired-end data from an Illumina machine. I used TopHat for the mapping, but only 5% of the reads mapped. Is there any parameter in TopHat that would give a higher percentage of mapped reads? Could there be some other problem as well?
What settings did you use? Did you adapter/quality trim? What species were the reads from and what species did you map against?
How could it be?
Can you share your TopHat command here?
Have you tried aligning with another program such as STAR? It's possible your data is contaminated or in some other way faulty.
Hi all, thanks for the help.
Initially I used the commands:
Later, I removed the adapters and applied quality filters, using the software tools Trimmomatic or FASTX.
I obtained only 11.5 percent mapped reads, both before and after adapter trimming and quality filtering were applied:
Commands:
The following command keeps reads that have a quality score above 20 in at least 50% of their bases.
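As a rough illustration of what such a filter does (a Python sketch, not the FASTX command that was actually run): the function name `passes_quality_filter`, the Phred+33 encoding, and treating the threshold of 20 as inclusive are all assumptions made for the example.

```python
def passes_quality_filter(quality, min_score=20, min_percent=50, offset=33):
    """Return True if enough bases in a FASTQ quality string reach min_score.

    Assumptions: Phred+33 encoding (offset=33) and an inclusive threshold,
    i.e. a base counts as good when its score is >= min_score.
    """
    scores = [ord(ch) - offset for ch in quality]
    if not scores:
        return False  # an empty read cannot pass the filter
    good = sum(1 for s in scores if s >= min_score)
    return 100.0 * good / len(scores) >= min_percent


# 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
print(passes_quality_filter("IIII##"))  # 4 of 6 bases are good
print(passes_quality_filter("I#####"))  # 1 of 6 bases is good
```

A filter like this drops whole reads; it never shortens them, so it does not change the read length distribution.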
The following operation removes nucleotides having quality scores lower than 20 from the ends of reads. Furthermore, any trimmed reads having lengths less than 50 nucleotides are discarded altogether:
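The trimming logic described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the name `trim_low_quality_ends` is hypothetical, Phred+33 encoding is assumed, and the sketch trims both ends of the read, which may differ from what the tool that was actually used does.

```python
def trim_low_quality_ends(seq, quality, min_score=20, min_length=50, offset=33):
    """Trim bases with score < min_score from both ends of a read.

    Returns (trimmed_seq, trimmed_quality), or None when the trimmed read
    is shorter than min_length. Assumes Phred+33 encoded quality strings.
    """
    scores = [ord(ch) - offset for ch in quality]
    start, end = 0, len(scores)
    # Walk inwards from the 3' end while the bases are below the threshold.
    while end > start and scores[end - 1] < min_score:
        end -= 1
    # Walk inwards from the 5' end in the same way.
    while start < end and scores[start] < min_score:
        start += 1
    if end - start < min_length:
        return None  # read is discarded altogether
    return seq[start:end], quality[start:end]


read = "A" * 60
qual = "#" * 5 + "I" * 52 + "#" * 3  # low-quality bases at both ends
trimmed = trim_low_quality_ends(read, qual)
print(len(trimmed[0]))
```

Note that low-quality bases in the middle of a read are left alone; only the ends are shortened.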
To remove the biased per-base sequence content and GC content at the ends of reads, the following command was used. It removes 15 nucleotides from the end of each read.
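For clarity, a fixed-length trim of this kind amounts to the following (a minimal Python sketch; `trim_tail` is a hypothetical name, not part of any toolkit):

```python
def trim_tail(seq, quality, n_trim=15):
    """Remove n_trim nucleotides from the 3' end of a read and its qualities."""
    keep = max(len(seq) - n_trim, 0)  # never go below an empty read
    return seq[:keep], quality[:keep]


seq, qual = trim_tail("A" * 100, "I" * 100)
print(len(seq), len(qual))
```

Unlike quality trimming, this shortens every read by the same amount regardless of its quality scores.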
After this step, the read length distribution changed minimally, with the majority of reads retaining their full length. In addition around 25% of the reads were discarded completely.
In order to remove identical sequences, the fastx_collapser tool was used:
The above tool removes a few million reads from each file, keeps a count of how many reads each unique sequence represented, and gives its output in FASTA format.
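Collapsing identical reads while keeping their counts can be sketched like this in Python. It is an illustration only: `collapse_reads` is a hypothetical helper, and the `>rank-count` header layout is modelled on the style of fastx_collapser output rather than copied from this run.

```python
from collections import Counter


def collapse_reads(sequences):
    """Collapse identical sequences into FASTA records that carry counts.

    Each record header has the form '>rank-count', where rank orders the
    unique sequences from most to least abundant.
    """
    counts = Counter(sequences)
    lines = []
    for rank, (seq, count) in enumerate(counts.most_common(), start=1):
        lines.append(f">{rank}-{count}")
        lines.append(seq)
    return "\n".join(lines)


print(collapse_reads(["ACGT", "ACGT", "TTTT"]))
```

Because the output is FASTA, the per-base quality scores are lost at this step, and paired-end mates are no longer kept in sync between the two files.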
As suggested here: http://seqanswers.com/forums/showthread.php?t=32970&page=2
La
Thanks Devon! :D