Hello, I am working on fish and doing research on the fathead minnow, but few of its genes have been deposited in the NCBI database. There is, however, a published transcriptome shotgun assembly without annotation (https://www.ncbi.nlm.nih.gov/nuccore/GCVQ00000000.1). Can someone tell me how to annotate it and get CDS and UTR sequences based on its closest well-annotated relative, zebrafish? Thanks a lot for your help.
Thanks for your advice. However, I am new to bioinformatics, so could you please point me to a tutorial on how to run BLASTX against zebrafish on Linux? Thanks a lot for your help.
Hi Najib,
You can find a simple script for making a BLAST database and running BLASTX here (of course, you need to use a more relaxed e-value, e.g. 1e-6).
You can, of course, add other parameters to your BLAST run.
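A minimal sketch of such a script, assuming BLAST+ is installed, the zebrafish proteins (e.g. downloaded from Ensembl or NCBI RefSeq) are saved as zebrafish_proteins.fa and the TSA transcripts as GCVQ01.fasta. These file names, the database name and the thread count are placeholders, not taken from the linked script. By default it only prints the commands so you can check them; run it with DRY_RUN=0 to actually execute them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder names: adjust to your own files.
PROTEINS="${PROTEINS:-zebrafish_proteins.fa}"   # zebrafish protein FASTA
TRANSCRIPTS="${TRANSCRIPTS:-GCVQ01.fasta}"       # fathead minnow TSA transcripts
DB="${DB:-zebrafish_prot}"                       # name for the BLAST database
EVALUE="${EVALUE:-1e-6}"
THREADS="${THREADS:-4}"

# Print each command; only execute it when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then
    "$@"
  fi
}

# 1) Build a protein database from the zebrafish proteome.
run makeblastdb -in "$PROTEINS" -dbtype prot -out "$DB"

# 2) Translated search: every transcript, in all six frames, against zebrafish proteins.
run blastx -query "$TRANSCRIPTS" -db "$DB" -evalue "$EVALUE" \
  -outfmt 6 -max_target_seqs 5 -num_threads "$THREADS" \
  -out blastx_vs_zebrafish.tsv
```

Tabular output (-outfmt 6) is easy to filter with standard command-line tools, and -max_target_seqs 5 keeps up to five hits per transcript so the best-scoring one can be chosen afterwards.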
You can use the "-outfmt 5" parameter to get BLAST results in XML format and then load them into Blast2GO for annotation (which is very time consuming for a large number of transcripts).
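A sketch of the XML variant, assuming a zebrafish protein database named zebrafish_prot (a placeholder) has already been built with makeblastdb. Because a run over a whole transcriptome can take a long time, the helper xml_is_complete checks that the XML holds one <Iteration> element per query sequence before you import it into Blast2GO. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder names: adjust to your own files.
DB="${DB:-zebrafish_prot}"
TRANSCRIPTS="${TRANSCRIPTS:-GCVQ01.fasta}"

# XML output (-outfmt 5) for Blast2GO; several hits per query give GO mapping more to work with.
blastx_xml() {
  blastx -query "$TRANSCRIPTS" -db "$DB" -evalue 1e-6 -outfmt 5 \
    -max_target_seqs 20 -num_threads 4 -out "$1"
}

# Succeeds only if the XML has one <Iteration> block per FASTA record.
xml_is_complete() {
  local xml=$1 fasta=$2 n_iter n_seq
  n_iter=$(grep -c '<Iteration>' "$xml" || true)
  n_seq=$(grep -c '^>' "$fasta" || true)
  echo "queries: $n_seq, iterations in XML: $n_iter"
  [ "$n_iter" -eq "$n_seq" ]
}
```

For example: blastx_xml blastx_vs_zebrafish.xml, then xml_is_complete blastx_vs_zebrafish.xml GCVQ01.fasta before loading the file into Blast2GO.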
As @Thamizh said, you can also use TransDecoder to predict the ORFs of your transcripts and integrate the BLAST results into it for more accurate predictions.
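A sketch of that TransDecoder workflow, following the steps described in the TransDecoder documentation: extract long ORFs, search their peptides against zebrafish proteins with blastp, then let TransDecoder.Predict retain ORFs that have a hit. GCVQ01.fasta and zebrafish_prot are placeholder names, and the .transdecoder_dir/longest_orfs.pep path is where recent TransDecoder versions write the ORF peptides, so check your own output directory. Commands are printed by default; set DRY_RUN=0 to run them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder names: adjust to your own files.
TRANSCRIPTS="${TRANSCRIPTS:-GCVQ01.fasta}"
DB="${DB:-zebrafish_prot}"   # zebrafish protein database built with makeblastdb -dbtype prot
PEP="${TRANSCRIPTS}.transdecoder_dir/longest_orfs.pep"

# Print each command; only execute it when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then
    "$@"
  fi
}

# 1) Find candidate ORFs (by default at least 100 amino acids long).
run TransDecoder.LongOrfs -t "$TRANSCRIPTS"

# 2) Search the candidate peptides against zebrafish proteins.
run blastp -query "$PEP" -db "$DB" -evalue 1e-6 -max_target_seqs 1 \
  -outfmt 6 -num_threads 4 -out blastp.outfmt6

# 3) Final ORF calls; ORFs with a zebrafish hit are retained.
run TransDecoder.Predict -t "$TRANSCRIPTS" --retain_blastp_hits blastp.outfmt6
```

TransDecoder.Predict then writes GCVQ01.fasta.transdecoder.cds (the CDS sequences), .pep, .bed and .gff3; the GFF3 file also records the five_prime_UTR and three_prime_UTR coordinates on each transcript.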
~ Best
NOTE: It would be good to check the original paper describing the fathead minnow assembly to see which annotation strategy the authors followed.
Thanks a lot for your help. Here is what I did: I looked for the longest ORFs using TransDecoder, then BLASTed them against zebrafish reference proteins. Now I need the 3'UTR of each gene for miRNA target prediction.
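If the BLAST output lists several zebrafish hits per ORF, a small helper can keep only the best one before gene IDs are assigned. This sketch assumes the default 12-column tabular format (-outfmt 6), in which column 1 is the query ID and column 12 the bit score; best_hits is a hypothetical name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# best_hits IN.tsv: print the highest-bit-score line for each query in BLAST -outfmt 6 output.
best_hits() {
  # Sort by query ID, then by bit score (column 12, descending), and keep the first line per query.
  sort -t $'\t' -k1,1 -k12,12gr "$1" | awk -F'\t' '!seen[$1]++'
}
```

For example: best_hits blastp.outfmt6 > best_hits.tsv.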
Nice to hear there is some progress ;-)
If TransDecoder generated a GFF3 file for your transcriptome, you could use:
grep "three_prime_UTR" your_gff3 > output.txt
This command outputs only the three_prime_UTR lines (each with its start and end coordinates) for every transcript in your transcriptome.

Thanks a lot for your help ^^ Now I have the 3'UTR start and end points, and I have the BLAST results, which I integrated with the TransDecoder results. How can I get the 3'UTR sequences with their respective gene IDs based on the BLAST results?
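One way to do this, sketched with awk under a few assumptions: the transcripts are in a FASTA file whose IDs (first word of each header) match column 1 of the TransDecoder GFF3; the BLAST hits are in tabular (-outfmt 6) format with the query ID in column 1 and the zebrafish subject ID in column 2, best hit first; and the BLAST query IDs are either the transcript IDs (blastx) or the ORF IDs that appear as Parent= in the three_prime_UTR lines (blastp on longest_orfs.pep). The function name extract_utr3 and the header layout are hypothetical. GFF3 coordinates are 1-based and inclusive, so the UTR is substr(seq, start, end - start + 1); UTRs of ORFs on the minus strand of a transcript are reverse-complemented.

```shell
#!/usr/bin/env bash
# extract_utr3 BLAST_TSV FASTA GFF3
# Prints a FASTA of 3'UTR sequences with headers of the form
#   >ORF_ID|ZEBRAFISH_HIT|TRANSCRIPT:START-END(STRAND)
extract_utr3() {
  awk -F'\t' '
    BEGIN { comp["A"] = "T"; comp["T"] = "A"; comp["G"] = "C"; comp["C"] = "G" }

    # Reverse complement; any base other than A/C/G/T becomes N.
    function revcomp(s,    i, c, r) {
      r = ""
      for (i = length(s); i >= 1; i--) {
        c = substr(s, i, 1)
        r = r ((c in comp) ? comp[c] : "N")
      }
      return r
    }

    { sub(/\r$/, "") }   # tolerate Windows line endings

    # File 1: BLAST tabular output; keep the first (best) hit per query.
    FILENAME == ARGV[1] { if (!($1 in hit)) hit[$1] = $2; next }

    # File 2: transcript FASTA; ID = first word of the header line.
    FILENAME == ARGV[2] {
      if (/^>/) { id = substr($0, 2); sub(/[ \t].*/, "", id) }
      else      { seq[id] = seq[id] toupper($0) }
      next
    }

    # File 3: GFF3; only three_prime_UTR features are used.
    $3 == "three_prime_UTR" {
      tid = $1; start = $4; stop = $5; strand = $7
      parent = tid
      if (match($9, /Parent=[^;]+/)) parent = substr($9, RSTART + 7, RLENGTH - 7)
      if (!(tid in seq)) next
      utr = substr(seq[tid], start, stop - start + 1)
      if (strand == "-") utr = revcomp(utr)
      gene = (parent in hit) ? hit[parent] : ((tid in hit) ? hit[tid] : "no_hit")
      printf(">%s|%s|%s:%d-%d(%s)\n%s\n", parent, gene, tid, start, stop, strand, utr)
    }
  ' "$1" "$2" "$3"
}
```

For example: extract_utr3 blastp.outfmt6 GCVQ01.fasta GCVQ01.fasta.transdecoder.gff3 > utr3_with_zebrafish_hits.fa. UTRs whose ORF and transcript have no BLAST hit are labelled no_hit in the header.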