Question

Suggestions for correctly annotating predicted genes and discovering their isoforms, if any, with RNA-seq

10.0 years ago

mjoyraj ▴ 80

A few genes are poorly annotated in the database, and their status is marked as predicted. I want to verify the actual annotation of these genes and to check whether any of them have isoforms.

To address these points, I have generated RNA-seq data from the developing tissues where the genes are expressed. Let us call the paired-end read files:

1_R1, 1_R2
2_R1, 2_R2
3_R1, 3_R2
4_R1, 4_R2
5_R1, 5_R2

I have downloaded the genome file and the GTF file, say:

genome.fa, genes.gtf

I have manually incorporated the predicted annotations of the genes of interest into the GTF file in the proper format.

Next, I want to do mapping and assembly with TopHat and Cufflinks to address the points above.

My Tophat command will be:

tophat -p 40 -G genes.gtf -o <tophat_output_file> <indexed-genome.fa> 1_R1 1_R2

(Same for the other four)

My Cufflinks command will be:

cufflinks \
  -o <cufflink_output_file> \
  -p 12 \
  -g <genes.gtf> \
  -b <genome_file> \
  --max-bundle-frags 1000000000000 \
  --multi-read-correct <tophat_output_.file-accepted.bam>

I will use the IGV browser to visualize the mapped reads (tophat_output_.file-accepted.bam), and again to compare the original GTF file with the Cufflinks GTF file. I will use the TopHat junctions.bed file to visualize the exon-exon junctions. I hope that comparing the original GTF with the Cufflinks GTF will help me annotate the genes correctly, and that the junctions will give clues about isoforms of the genes, if any. Expression of the genes will be confirmed from the Cufflinks genes.fpkm_tracking file, and expression of the isoforms, if any, from the isoforms.fpkm_tracking file.

Are the TopHat and Cufflinks commands suitable for correctly annotating the genes and finding their isoforms? Any suggestions will be highly appreciated.

Assembly alignment next-gen RNA-Seq • 3.2k views

Ram · Answer 1 · 2014-12-01

10.0 years ago

Manvendra Singh ★ 2.2k

tophat -p 40 -G genes.gtf -o <tophat_output_file> <indexed-genome.fa> 1_R1 1_R2

Here, -o takes an output directory, not a file; TopHat creates its output files inside that directory.

Do not pass genome.fa itself; pass only the prefix of the index. TopHat will look for the matching .fa file on its own.
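As a rough sketch of the two corrections, something like the following could generate one TopHat command per sample. The index prefix `genome_index` and the output directory names `tophat_out_<sample>` are hypothetical placeholders, not names from the question; the read file names are kept as written there. The loop only prints the commands so they can be checked before anything is run.

```shell
# Hypothetical index prefix (assumption): use the prefix of your own index,
# i.e. the index file names without their extensions.
INDEX_PREFIX="genome_index"

# Print the TopHat command for one sample.
# -o names an output directory (hypothetical: tophat_out_<sample>),
# and the first positional argument is the index prefix, not genome.fa.
tophat_cmd() {
  printf 'tophat -p 40 -G genes.gtf -o tophat_out_%s %s %s_R1 %s_R2\n' \
    "$1" "$INDEX_PREFIX" "$1" "$1"
}

# Dry run: print the command for each of the five samples.
for sample in 1 2 3 4 5; do
  tophat_cmd "$sample"
done
```

Once the printed commands look right, they can be pasted into the terminal or piped to `sh`.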

Once Cufflinks has finished, run cuffcompare on all the transcripts.gtf files.

In the output, transcripts with class code "j" are your candidate isoforms; just apply a little filtering on the FPKM and length of the detected isoforms.
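One possible way to do that filtering is with awk on a cuffcompare .tmap file. This is only a sketch: it assumes the .tmap layout is tab-separated with a header line, class_code in column 3, FPKM in column 7 and len in column 11, so check the header of your own file first. The thresholds and the function name `filter_j_isoforms` are illustrative choices, not recommended values.

```shell
# Hypothetical thresholds (assumptions): tune them for your own data.
MIN_FPKM=1
MIN_LEN=200

# Keep the header line plus class code "j" rows that pass both thresholds.
# Assumed .tmap columns: 3 = class_code, 7 = FPKM, 11 = len.
filter_j_isoforms() {
  awk -F'\t' -v min_fpkm="$MIN_FPKM" -v min_len="$MIN_LEN" \
    'NR == 1 || ($3 == "j" && $7 + 0 >= min_fpkm + 0 && $11 + 0 >= min_len + 0)' "$1"
}
```

It would be called as `filter_j_isoforms <your_file>.tmap > j_isoforms.tmap`, once per sample.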

This is my suggestion.

Thanks. Do you have any idea how to filter out false junctions from the junctions.bed file produced by TopHat?

Reply written 10.0 years ago by mjoyraj ▴ 80