Question

transcriptome shotgun assembly annotation

8.0 years ago

najibveto ▴ 120

Hello, I am working on fish and doing research on the fathead minnow, but few of its genes have been deposited in the NCBI database. There is, however, a published transcriptome shotgun assembly without annotation: https://www.ncbi.nlm.nih.gov/nuccore/GCVQ00000000.1 Can someone tell me how to annotate it and get the CDS and UTR sequences based on the closest relative of this fish, which is zebrafish? Thanks a lot for your help.

Assembly transcriptome • 2.9k views

ADD COMMENT • link updated 8.0 years ago by Farbod ★ 3.4k • written 8.0 years ago by najibveto ▴ 120

score 0 · Answer 1 · 2016-12-08

8.0 years ago

EVR ▴ 610

Hi,

Use the TransDecoder tool to annotate the transcriptome; it predicts candidate coding regions (ORFs) within transcripts. It is quite useful and reliable.

ADD COMMENT • link 8.0 years ago by EVR ▴ 610

score 0 · Answer 2 · 2016-12-08

8.0 years ago

Farbod ★ 3.4k

Dear najibveto, Hi

You could begin by running BLASTX with your transcripts against SwissProt, or against the Danio rerio reference proteins.

~ Take Care

ADD COMMENT • link 8.0 years ago by Farbod ★ 3.4k

Thanks for your advice. However, I am a newbie to bioinformatics, so can you please point me to a tutorial on how to run BLASTX against zebrafish on Linux? Thanks a lot for your help.

ADD REPLY • link 8.0 years ago by najibveto ▴ 120

Hi Najib,

You can find some simple scripts for making a BLAST database and running BLASTX here (of course, you need to use a more relaxed e-value, e.g. 1e-6).

You can add other parameters to your BLAST run, of course.

You can use the "-outfmt 5" parameter to get the BLAST results in XML format and then load them into Blast2GO for annotation (which is very time-consuming for a huge number of transcripts).
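For anyone new to BLAST on Linux, the database-building and BLASTX steps can be sketched as a small Python helper that assembles the two command lines. The file names (`zebrafish_proteins.fasta`, `fathead_tsa.fasta`), the database name and the thread count are placeholders and not values from this thread; `makeblastdb` and `blastx` are assumed to be installed from NCBI BLAST+.

```python
import shlex
import subprocess


def blast_commands(proteins_fasta, transcripts_fasta, db_name="zebrafish_prot",
                   evalue="1e-6", outfmt=6, out_path="blastx_hits.tsv", threads=4):
    """Build (but do not run) the makeblastdb and blastx argument lists."""
    # Protein database built from the zebrafish reference proteins.
    make_db = ["makeblastdb", "-in", proteins_fasta,
               "-dbtype", "prot", "-out", db_name]
    # Translated search of the transcripts against that database.
    # outfmt 6 is tab-separated text; outfmt 5 is the XML that Blast2GO reads.
    blastx = ["blastx", "-query", transcripts_fasta, "-db", db_name,
              "-evalue", str(evalue), "-outfmt", str(outfmt),
              "-max_target_seqs", "1", "-num_threads", str(threads),
              "-out", out_path]
    return make_db, blastx


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder file names; print the commands instead of running them.
    for cmd in blast_commands("zebrafish_proteins.fasta", "fathead_tsa.fasta"):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to check them before calling `run_all`, which matters because a BLASTX run over a whole transcriptome can take many hours.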

As @Thamizh has said, you can also use TransDecoder to predict the ORFs of your transcripts and integrate the BLAST results into it for more accuracy.
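A minimal sketch of that TransDecoder workflow with BLAST support, again as a Python helper that only assembles the commands. It assumes the usual three steps (`TransDecoder.LongOrfs`, a `blastp` search of the candidate ORFs, then `TransDecoder.Predict --retain_blastp_hits`); the file names and database name are placeholders, and the location of `longest_orfs.pep` is an assumption that may differ between TransDecoder versions.

```python
import os


def transdecoder_commands(transcripts_fasta, blast_db,
                          hits_path="blastp.outfmt6", threads=4):
    """Build the three command lines for ORF prediction with BLAST support."""
    # Step 1: extract the long open reading frames from every transcript.
    long_orfs = ["TransDecoder.LongOrfs", "-t", transcripts_fasta]

    # Assumption: LongOrfs writes its peptides to
    # <transcripts file name>.transdecoder_dir/longest_orfs.pep
    pep = os.path.join(os.path.basename(transcripts_fasta) + ".transdecoder_dir",
                       "longest_orfs.pep")

    # Step 2: search the candidate peptides against the protein database.
    blastp = ["blastp", "-query", pep, "-db", blast_db,
              "-max_target_seqs", "1", "-outfmt", "6",
              "-evalue", "1e-5", "-num_threads", str(threads),
              "-out", hits_path]

    # Step 3: final prediction, keeping ORFs that have a BLAST hit.
    predict = ["TransDecoder.Predict", "-t", transcripts_fasta,
               "--retain_blastp_hits", hits_path]
    return long_orfs, blastp, predict


if __name__ == "__main__":
    # Placeholder names; the commands are printed, not executed.
    for cmd in transdecoder_commands("fathead_tsa.fasta", "zebrafish_prot"):
        print(" ".join(cmd))
```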

~ Best

NOTE: It would be good to check the original paper on the fathead minnow transcriptome to see which annotation strategy the authors followed.

ADD REPLY • link 8.0 years ago by Farbod ★ 3.4k

Thanks a lot for your help. I did as follows: I looked for the longest ORFs using TransDecoder, and after that I blasted them against the zebrafish reference proteins. Now I need the 3'UTR for each gene, to be used for miRNA target prediction.

ADD REPLY • link 8.0 years ago by najibveto ▴ 120

Nice to hear there is some progress ;-)

If you have a GFF3 file generated by TransDecoder for your transcriptome, then you could use grep "three_prime_UTR" your_gff3 > output.txt. This command outputs only the three_prime_UTR features (with their start and end coordinates) for the transcripts in your transcriptome.
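Going one step further than the coordinates, the sketch below slices the 3'UTR sequences out of the transcript FASTA. It assumes the GFF3 coordinates are 1-based and refer to the transcript itself (column 1 is the transcript ID), and that a "-" strand means the ORF lies on the reverse strand, so the slice is reverse-complemented. The function names are illustrative and not part of any tool.

```python
# Translation table for complementing DNA (N is left as N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(handle):
    """Read a FASTA file into {sequence ID: sequence}."""
    seqs, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            # The ID is the header text up to the first whitespace.
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def three_prime_utrs(gff3_handle, transcripts):
    """Yield (transcript ID, parent ID, start, end, strand, sequence)."""
    for line in gff3_handle:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "three_prime_UTR":
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        if seqid not in transcripts:
            continue
        # GFF3 is 1-based and inclusive; Python slices are 0-based, end-exclusive.
        seq = transcripts[seqid][start - 1:end]
        if strand == "-":
            seq = seq.translate(_COMPLEMENT)[::-1]
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        yield seqid, attrs.get("Parent", ""), start, end, strand, seq
```

A typical call would open both files, pass `read_fasta(fasta_handle)` as `transcripts`, and write each yielded sequence to a new FASTA file for the miRNA target prediction tool.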

ADD REPLY • link 8.0 years ago by Farbod ★ 3.4k

Thanks a lot for your help ^^ Now I have the 3'UTR start and end points and the BLAST results, and I have integrated them with the TransDecoder results. How is it possible to get the 3'UTR sequences with their respective gene IDs based on the BLAST results?

ADD REPLY • link 8.0 years ago by najibveto ▴ 120
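
One possible way to approach this last step, as a sketch only: read the tabular BLAST output (`-outfmt 6`), keep the best-scoring hit per query, and use it to label each 3'UTR sequence. This assumes the BLAST query IDs are the TransDecoder ORF IDs and that the same IDs appear as `Parent` of the `three_prime_UTR` features in the GFF3; if your IDs differ, the keys need to be mapped first. The function names and the header layout are illustrative.

```python
import csv


def best_hits(blast_handle):
    """Map each query ID to its best subject ID (highest bit score).

    Expects the default 12-column tabular format (-outfmt 6), where
    column 1 is the query, column 2 the subject and column 12 the bit score.
    """
    best = {}
    for row in csv.reader(blast_handle, delimiter="\t"):
        if len(row) < 12:
            continue
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def label_utrs(utr_records, hits):
    """Yield FASTA records for the UTRs whose ORF has a BLAST hit.

    utr_records is an iterable of (ORF ID, UTR sequence) pairs.
    """
    for orf_id, seq in utr_records:
        subject = hits.get(orf_id)
        if subject is None:
            continue  # no zebrafish hit for this ORF, so no gene label
        yield f">{orf_id} hit={subject}\n{seq}"
```

ORFs without a hit are skipped here; whether to keep them unlabeled instead is a design choice that depends on the target prediction tool.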