Question

Tophat: # reads mapping to CDS, rRNA, intergenic?

1

Entering edit mode

9.7 years ago

Adrian Pelin ★ 2.6k

Hello,

How can I get # of reads mapping to CDS/rRNA features or intergenic? I did my alignment with tophat and I have .gtf .gff and .bed annotations of my features.

Thank you,

Adrian

RNA-Seq NGS Tophat • 4.4k views

ADD COMMENT • link updated 2.5 years ago by Ram 44k • written 9.7 years ago by Adrian Pelin ★ 2.6k

0

Entering edit mode

You may try https://www.broadinstitute.org/cancer/cga/rna-seqc or http://rseqc.sourceforge.net/

ADD REPLY • link updated 2.5 years ago by Ram 44k • written 9.7 years ago by Ashutosh Pandey 12k

Ram · Answer 1 · 2015-04-14

0

Entering edit mode

9.7 years ago

Evgeniia Golovina ★ 1.3k

Use HTSeq tool.

python -m HTSeq.scripts.count [options] <alignment_file> <gff_file>

Example command line to count reads mapped to the CDS:

python -m HTSeq.scripts.count --type=CDS --idattr=gene_id --mode=union --stranded=yes <sam_file> <gff_file>

ADD COMMENT • link updated 2.5 years ago by Ram 44k • written 9.7 years ago by Evgeniia Golovina ★ 1.3k

0

Entering edit mode

Thanks for the useful tool, but it doesn't seem to work with tophat alignemnts:

7584 GFF lines processed.
Warning: No features of type 'exon' found.
Error occured when reading beginning of SAM/BAM file.
  ('SAM line does not contain at least 11 tab-delimited fields.', 'line 1 of file con1/accepted_hits.bam')
  [Exception type: ValueError, raised in _HTSeq.pyx:1276]

Hmm, looks like I need to specify -f bam and install a bam reader.

ADD REPLY • link updated 2.5 years ago by Ram 44k • written 9.7 years ago by Adrian Pelin ★ 2.6k

1

Entering edit mode

I can confirm that yes, all you need to do is specify "-f bam" and this error should go away. The error is misleading because it says "SAM/BAM". Many other tools perform this detection behind-the-scenes with no problems, so it surprising that HTSeq does not.

ADD REPLY • link 9.2 years ago by Keith Callenberg ▴ 960

0

Entering edit mode

I am not sure (this is just a guess) but the problem may be because you are providing bam file to HTseq as an input. It requires a SAM file. You can make HTseq read from the standard input ("-") using the following command.

samtools view accepted_hits.bam | python -m HTSeq.scripts.count --type=CDS --idattr=gene_id --mode=union --stranded=yes - <gff_file>

Or you can convert your bam file to sam file

samtools view -h input.bam > output.sam and give it to HTSeq.

If this is not the case, would you mind pasting the first few lines from your bam/sam.

ADD REPLY • link 9.7 years ago by Ashutosh Pandey 12k

0

Entering edit mode

Though you can see this in the example command that is provided here, I'd like to point out that to get the full distribution of CDS vs UTR vs Introns vs Intergenic, you will have to run at least that many commands with htseq-count, changing the --type= argument for each feature type.

ADD REPLY • link 9.2 years ago by Keith Callenberg ▴ 960