Pei (8.4 years ago):
Hi all:
In a TopHat run there is a step, "Searching for junctions via segment mapping", which is carried out by segment_juncs.
However, when processing some mouse data, I found that this step finished in about 2 minutes, whereas it takes much longer on other datasets. It looked as if the step had been skipped.
Do you know what happened?
In any case, the TopHat job finished without reporting any errors, but the resulting mapping rate was low, ~45%.
Thanks in advance! Best
Can you show the commands that you used for these datasets?
Thank you.
My command was:
The data I used was downloaded from NCBI: GSE30352.
Can you check the tophat_out folder and look for the junctions.bed file? If it is empty, then that step was skipped; if it is not, then the step was not skipped.
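For example, a quick check along these lines (a minimal sketch in Python, assuming the default output directory name tophat_out) should tell you whether the file is empty and how many lines it has:

    # Minimal sketch: report whether tophat_out/junctions.bed is missing or empty
    # and, if present, how many lines (junction records) it contains.
    from pathlib import Path

    bed = Path("tophat_out/junctions.bed")  # assumed default TopHat output location
    if not bed.exists() or bed.stat().st_size == 0:
        print("junctions.bed is missing or empty - the junction search step may have been skipped")
    else:
        n_lines = sum(1 for _ in bed.open())
        print(f"junctions.bed has {n_lines} lines")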
Not empty. That file had 141909 lines, and the "Searching for junctions via segment mapping" step took about 5 minutes. However, in another dataset, SRR1915443, junctions.bed had 143339 lines, and that step took more than 1 hour.
Both were mapped against mm10. SRR306769 had 45613332 reads, while SRR1915443 had 62575245 reads.
I just looked at the samples. They differ in read length (50 bp vs 76 bp) and in quality (SRR1915443 has good read quality), so the number of reads that pass QC will differ between the two samples. If you used the default settings (segment length 25), the two samples will also differ in the number of segments per read (2 vs 3). These factors may have contributed to the difference in the time it took to map the junction reads.
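To make the segment arithmetic concrete (a rough sketch, assuming TopHat's default --segment-length of 25 bp and ignoring how the leftover bases at the end of a read are handled):

    # Rough sketch: approximate number of segments TopHat splits a read into,
    # assuming the default --segment-length of 25 bp.
    def segments_per_read(read_length, segment_length=25):
        return read_length // segment_length

    print(segments_per_read(50))  # 2 segments for the 50 bp reads (SRR306769)
    print(segments_per_read(76))  # 3 segments for the 76 bp reads (SRR1915443)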