Can somebody suggest me how to obtain read coverage of a particular transcript in a transcript assembly done by TopHat/Cufflinks?
Can somebody suggest me how to obtain read coverage of a particular transcript in a transcript assembly done by TopHat/Cufflinks?
Here, you need to find the number of reads that are mapped to the given transcripts. For this purpose you need mapped BAM file, the transcript chromosome number and it's start-end co-ordinates. If you have this information, you can find read coverage using samtools as follows
samtools view BAM_file Chromosome:start-end
It will give you all reads that mapped to given transcript region. To count the read coverage, you can pipe wc -l
command
samtools view BAM_file Chromosome:start-end | wc -l
Alongside samtools, bedtools offers ways of reading Tophat/cufflinks output too http://bedtools.readthedocs.io/en/latest/index.html
In addition to the tools mentioned earlier, commonly used tools for counting reads in a genomic interval (gene, exon,...) are featureCounts and htseq-counts. I would recommend featureCounts, because it's very fast, has convenient options, but htseq-counts also works fine. Both are nicely documented.
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Correct me if I'm wrong, but are you sure this is an assembly? TopHat is an aligner, not an assembler.
@WouterDeCoster, TopHat is an aligner and Cufflinks is used for assembly.
Right, but it doesn't modify the alignment, does it? You want to assess the coverage in the alignment. An assembly doesn't have a coverage.
Yes I want to assess the coverage in alignment. Actually in my question, I wanted to make it clear that what software I used
Alright very well. Excuse me for my pedantic nitpicking here, but correct terminology is quite important for quickly getting the right answers.