In TopHat there is an option, --coverage-search. When it is on, the run takes a long time and a lot of memory, but I couldn't find any documentation explaining what "coverage-based search" means. Could anyone who knows about this explain it, or point me to the right place? Thank you in advance.
Thanks for the answer. I understand that TopHat tries to identify coverage islands and then looks for possible junctions. However, if this is what --coverage-search refers to, how should I understand the fact that you can turn the --coverage-search option off? That is why I was confused: I thought TopHat uses coverage search to identify junctions. If we can turn it off, then this step is not necessary for identifying junctions. So the question is, what does TopHat use to identify junctions if I specify --no-coverage-search? Could you comment on that? Thanks.
The identification of new regions from the coverage is relevant only if you want to detect new splice sites in alternative transcripts, or even new genes. If you only want the expression profile of the annotated genes, you can skip this step for speed.
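To make the "coverage islands" idea discussed above a bit more concrete, here is a minimal Python sketch. It is not TopHat's actual implementation: the function names `coverage_islands` and `candidate_introns`, the `min_depth` and `max_intron` parameters, and the toy coverage vector are all assumptions made for illustration. It only shows the general principle of finding stretches of piled-up reads and treating the gap between neighbouring stretches as a candidate intron.

```python
from typing import List, Tuple


def coverage_islands(coverage: List[int], min_depth: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals where depth >= min_depth.

    `coverage` is a hypothetical per-base read depth along one reference sequence.
    """
    islands = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth >= min_depth and start is None:
            start = pos                      # an island opens here
        elif depth < min_depth and start is not None:
            islands.append((start, pos))     # the island closed just before pos
            start = None
    if start is not None:                    # island runs to the end of the sequence
        islands.append((start, len(coverage)))
    return islands


def candidate_introns(islands: List[Tuple[int, int]],
                      max_intron: int = 500_000) -> List[Tuple[int, int]]:
    """Pair each island with its right-hand neighbour.

    The uncovered gap between the two is reported as a candidate intron,
    provided it is no longer than `max_intron` (an illustrative cutoff).
    """
    candidates = []
    for (_, left_end), (right_start, _) in zip(islands, islands[1:]):
        if right_start - left_end <= max_intron:
            candidates.append((left_end, right_start))
    return candidates


# Toy example: two covered regions separated by a three-base gap.
coverage = [0, 3, 5, 4, 0, 0, 0, 2, 6, 0]
islands = coverage_islands(coverage)
print(islands)                     # [(1, 4), (7, 9)]
print(candidate_introns(islands))  # [(4, 7)]
```

A real aligner would still have to check that reads can actually be aligned across each candidate gap; the sketch stops at proposing the gaps.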
Hi! The identification of new splice sites in different genes/transcripts is still possible without coverage search!
According to the manual, coverage search is only useful when you have very short reads, because in that case the probability that a read "hits" a splice junction exactly may be very low for relatively lowly expressed transcripts. You then need another way of detecting splice sites, which is where coverage search comes in. To make things easier for the algorithm, coverage search allows only the most canonical splice junctions, GT-AG. That means that if you have longer reads, enabling this option loses the other types of splice junctions, GC-AG and AT-AC, so enabling it for long reads or deep libraries doesn't make sense, as outlined in the manual.
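For readers unsure what the motif names mean, here is a small Python sketch of how an intron can be labelled by its donor and acceptor dinucleotides. This is only an illustration, not code from TopHat: the helper names `intron_motif` and `reported_by_coverage_search` and the toy sequences are made up, only forward-strand introns are handled, and the GT-AG restriction simply encodes the claim made in the answer above.

```python
# The three intron types mentioned in this thread.
KNOWN_MOTIFS = {"GT-AG", "GC-AG", "AT-AC"}


def intron_motif(genome: str, intron_start: int, intron_end: int) -> str:
    """Return the donor-acceptor dinucleotide pair, e.g. "GT-AG".

    The intron is genome[intron_start:intron_end] (half-open, forward strand).
    """
    intron = genome[intron_start:intron_end].upper()
    if len(intron) < 4:
        raise ValueError("intron too short to carry both splice-site dinucleotides")
    return f"{intron[:2]}-{intron[-2:]}"


def reported_by_coverage_search(motif: str) -> bool:
    """Assumption taken from the answer above: coverage search keeps only GT-AG."""
    return motif == "GT-AG"


# Toy sequences: 4 bases of exon, a 12-base intron, 4 bases of exon.
canonical = "AAAC" + "GTAAGTTTTCAG" + "GGTT"
minor = "AAAC" + "GCAAGTTTTCAG" + "GGTT"

for name, seq in (("canonical", canonical), ("minor", minor)):
    motif = intron_motif(seq, 4, 16)
    print(name, motif, motif in KNOWN_MOTIFS, reported_by_coverage_search(motif))
```

The second toy intron carries a GC-AG motif, so under the assumption above it is a junction that a coverage-based search would not report.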