How to get gene names from tophat results
9.6 years ago

I am new to RNASeq data.

Currently I am looking for repeats in RNASeq data. I am very simply looking for the presence of repeats from an individual sample (not caring where they come from). I do this using the method here: Aligning Rna-Seq To Repetitive Line-1 Elements

This tells TopHat to align reads first to the reference I've given it (which it builds from the GTF file of repeats) and, failing that, to the human genome.

I would like to get the names of the repeats the reads align to, but the output is a BAM file. I convert this to a BED file (bamtobed, which is part of bedtools rather than samtools) and then run bedtools closest against a BED file of repeats, keeping only hits with distance = 0, to get the names.
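To make that last step concrete, here is a rough Python sketch of what the distance = 0 filter amounts to: report a repeat name whenever a read interval overlaps a repeat interval on the same chromosome. It is only an illustration, not what bedtools does internally. The function names and the tuple layout are my own assumptions, and coordinates are taken to be BED-style (0-based, half-open).

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def name_reads(reads, repeats):
    """Pair each read with the names of the repeats it overlaps.

    reads:   iterable of (chrom, start, end, read_id)      (assumed layout)
    repeats: iterable of (chrom, start, end, repeat_name)  (assumed layout)
    Returns a list of (read_id, repeat_name) tuples in read order.
    """
    # Group repeats by chromosome so each read is only compared
    # against repeats on its own chromosome.
    by_chrom = {}
    for chrom, start, end, name in repeats:
        by_chrom.setdefault(chrom, []).append((start, end, name))

    hits = []
    for chrom, start, end, read_id in reads:
        for r_start, r_end, name in by_chrom.get(chrom, []):
            if overlaps(start, end, r_start, r_end):
                hits.append((read_id, name))
    return hits
```

This is a plain nested scan, so it is fine for a small test set but far too slow for a whole BAM file; for real data the bedtools route is the sensible one.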

This all seems a bit long-winded. Is there an easier way to get the names of the repeats (or of any genes, for the benefit of others) without the samtools/bedtools step?

RNA next-gen tophat RNA-Seq • 2.1k views
9.6 years ago
mark.ziemann ★ 1.9k

You can run RepeatMasker on the reads directly. You will find that this is pretty slow, so you might need to limit your analysis to about 1 million reads.

An alternative method is to take the repeat library from RepeatMasker and then use BWA or Bowtie2 to map the reads to the "repeatome". This can be done in a few minutes once you have the repeat library (info here).
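With this approach the repeat name is already in the alignment: each repeat in the library is its own reference sequence, so the name shows up in the RNAME column (column 3) of every SAM record. A minimal Python sketch for tallying reads per repeat from SAM text is below. The function name is my own, and it assumes you feed it standard tab-separated SAM lines, for example the output of samtools view on your BAM file.

```python
from collections import Counter


def count_repeat_hits(sam_lines):
    """Count primary mapped alignments per reference name (SAM column 3)."""
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and SAM header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        rname = fields[2]
        # Flag bit 0x4 marks an unmapped read; RNAME is '*' in that case.
        if flag & 0x4 or rname == "*":
            continue
        # Skip secondary (0x100) and supplementary (0x800) records so
        # that each read is counted at most once.
        if flag & 0x900:
            continue
        counts[rname] += 1
    return counts
```

Calling count_repeat_hits(open("aln.sam")) (the file name is just a placeholder) gives a Counter, and .most_common() on it lists the repeats with the most reads first. Whether to drop secondary alignments is a judgment call for repeats, since multi-mapping reads are common there.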

I did a blog post on this a while back.
