Low percentage of mapped reads in Tophat: is there any setting to improve it?
9.9 years ago

Hi, I would like a suggestion, please.

I have mapped 100 bp paired-end data from an Illumina machine. I used Tophat for mapping, but only about 5% of the reads mapped. Is there any parameter in Tophat to get a higher percentage of mapped reads? Could there be some other problem too?

Tophat • 4.3k views

What settings did you use? Did you adapter/quality trim? What species were the reads from and what species did you map against?


How could that be?

Can you share your Tophat command here?


Have you tried aligning with another program such as STAR? It's possible your data is contaminated or in some other way faulty.
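
For example, a minimal STAR run could look like the sketch below; genome.fa, annotation.gtf and star_index are just placeholder names, and --sjdbOverhang is usually read length minus 1 (99 for 100 bp reads):

# build a STAR index for the reference (placeholder file names)
STAR --runMode genomeGenerate --runThreadN 3 --genomeDir star_index \
     --genomeFastaFiles genome.fa --sjdbGTFfile annotation.gtf --sjdbOverhang 99

# align the paired-end reads; the mapping rate is reported in A2_Log.final.out
STAR --runThreadN 3 --genomeDir star_index \
     --readFilesIn A2_1_noadapter.fastq A2_2_noadapter.fastq \
     --outSAMtype BAM SortedByCoordinate --outFileNamePrefix A2_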


Hi all, thanks for the help.

Initially I used the command:

tophat2 -p 3 -o noadapter__thout --library-type=fr-unstranded genome A2_1_noadapter.fastq A2_2_noadapter.fastq
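
For reference, the overall alignment rate can be checked like this, assuming Tophat's default output names in the noadapter__thout directory:

# number of read pairs that went in (four FASTQ lines per read)
echo $(( $(wc -l < A2_1_noadapter.fastq) / 4 ))

# overall read mapping rate as reported by Tophat
cat noadapter__thout/align_summary.txt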

Later, I removed the adapters and applied quality filters using the software tools Trimmomatic and Fastx.

I obtained only 11.5 percent mapped reads, both before and after applying the adapter trimming and quality filtering.

Commands:

The following command keeps reads that have a quality score above 20 in at least 50% of their bases.

fastq_quality_filter -Q33 -q20 -p 50 -i <SAMPLE_NAME>.fastq -o <SAMPLE_NAME>.quality_filter.fastq

The following operation removes nucleotides with quality scores lower than 20 from the ends of reads. Furthermore, any reads shorter than 50 nucleotides after trimming are discarded altogether:

fastq_quality_trimmer -Q33 -t 20 -l 50 -i <SAMPLE_NAME>.fastq -o <SAMPLE_NAME>.clean.fastq

To remove biased base sequence content and GC content from the ends of reads, the following command was used. It removes 15 nucleotides from the end of each read:

fastx_trimmer -Q33 -f 1 -l 335 -i <SAMPLE_NAME>.clean.fastq -o <SAMPLE_NAME>.fastx_trimmer.fastq

After this step, the read length distribution changed minimally, with the majority of reads retaining their full length. In addition, around 25% of the reads were discarded completely.
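
For reference, a read length distribution like this can be checked with a one-liner of the following kind (assuming standard four-line FASTQ records; the file name is just the one produced above):

# print the length of each sequence line and count how often each length occurs
awk 'NR % 4 == 2 { print length($0) }' <SAMPLE_NAME>.fastx_trimmer.fastq | sort -n | uniq -c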

In order to remove identical sequences, the fastx_collapser tool was used:

fastx_collapser -v -i <SAMPLE_NAME>.fastq -o <SAMPLE_NAME>_collapsed.fasta

The above tool removes a few million reads from each file while keeping track of the read counts, and it gives output in FASTA format.

As suggested here: http://seqanswers.com/forums/showthread.php?t=32970&page=2


  1. Do not use the fastx toolkit with paired-end datasets. Trimmomatic is fine (see the paired-end sketch below), but any tool that processes the two fastq files for each sample separately should never be used.
  2. There is absolutely no reason to use fastx_collapser unless you plan on using the resulting collapsed reads for assembly. Aligners will be perfectly happy with multiple essentially identical read pairs, and the downstream statistics are more annoying to do if you collapse the reads.
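
A minimal paired-end Trimmomatic sketch that keeps both files in sync; the jar path, input file names, adapter file and thresholds are placeholders to adjust for your data:

# trim adapters and low-quality bases from both mates at once
java -jar trimmomatic-0.36.jar PE -phred33 \
     A2_1.fastq A2_2.fastq \
     A2_1.paired.fastq A2_1.unpaired.fastq \
     A2_2.paired.fastq A2_2.unpaired.fastq \
     ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:50

The two *.paired.fastq files can then be given to tophat2 together, so the mates stay matched.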

Thanks Devon! :D
