Question

How to get gene names from tophat results

10.1 years ago

sebastianzeki0 ▴ 240

I am new to RNA-Seq data.

Currently I am looking for repeats in RNA-Seq data. I am simply looking for the presence of repeats in an individual sample (not caring where they come from). I do this using the method described here: Aligning Rna-Seq To Repetitive Line-1 Elements

This basically tells TopHat to align first to the reference I've given it (which it builds from the GTF file of repeats) and, failing that, to align to the human genome.

I would like to get the names of the repeats it aligns to, but the output is a BAM file. I then convert this to a BED file (bamtobed, which is part of bedtools rather than samtools) and then run bedtools closest against a BED file of repeats to get the names (keeping hits with distance = 0).

This all seems a bit long-winded. Is there an easier way to get the names of repeats (or of any genes, for the benefit of others) without the samtools-bedtools bit?

RNA next-gen tophat RNA-Seq • 2.2k views

Ram · Answer 1 · 2015-04-22

You can run RepeatMasker on the reads directly. You will find that this is pretty slow, so you might need to limit your analysis to a subset of about 1 million reads.
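One way to pull out such a subset is to subsample the FASTQ file at random rather than take the first million reads, which may not be representative. The following is only a minimal sketch using reservoir sampling; it assumes plain 4-line FASTQ records (no wrapped sequence lines), and the function names `read_fastq` and `subsample_fastq` are made up for illustration.

```python
import random
from itertools import islice


def read_fastq(handle):
    """Yield FASTQ records as tuples of 4 lines (assumes unwrapped records)."""
    while True:
        record = tuple(islice(handle, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record at end of input")
        yield record


def subsample_fastq(handle, n, seed=0):
    """Return up to n records chosen uniformly at random (reservoir sampling)."""
    rng = random.Random(seed)
    sample = []
    for i, record in enumerate(read_fastq(handle)):
        if i < n:
            # Fill the reservoir with the first n records.
            sample.append(record)
        else:
            # Keep the new record with probability n / (i + 1).
            j = rng.randint(0, i)
            if j < n:
                sample[j] = record
    return sample
```

For example, `subsample_fastq(open("reads.fastq"), 1_000_000)` would return a list of records that can be written back out with `"".join(record)`; the file name is a placeholder. The whole sample is held in memory, which should be fine for a million short reads.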

An alternative method is to take the repeat library from RepeatMasker and then use BWA or Bowtie2 to map the reads to the "repeatome". This can be done in a few minutes once you have the repeat library (info here).
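A side benefit of mapping to the repeat library is that the reference name in each alignment is already the repeat name, so no BED conversion or bedtools step is needed. Here is a rough sketch that tallies reads per repeat from SAM text. It assumes the sequence names in your library FASTA are the repeat names you want; the function name `count_repeat_hits` is hypothetical.

```python
from collections import Counter


def count_repeat_hits(sam_lines):
    """Count primary mapped alignments per reference (repeat) name."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment line
        flag = int(fields[1])
        rname = fields[2]
        if flag & 0x4 or rname == "*":
            continue  # read is unmapped
        if flag & 0x900:
            continue  # secondary (0x100) or supplementary (0x800) alignment
        counts[rname] += 1
    return counts
```

You could feed it the output of `samtools view` on the BAM file, for instance by passing `sys.stdin` to the function in a small script, and then print `counts.most_common()`. Counting only primary alignments means each read contributes to one repeat at most, which is a simplification for reads that map equally well to several repeat families.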

I did a blog post on this a while back.