How to extract full sequences for low-quality (predicted) protein sequences from whole-genome data
5.1 years ago

Hi,

Can anyone suggest methods for recovering the complete sequence of a low-quality predicted protein reported in RefSeq or the NCBI Protein database?

1) I have whole-genome data at more than 50X coverage. When I run a BLAST search (using the human ortholog as the query) against the SRA data, I get many hits because my gene of interest has 4 other similar protein sequences with approximately 40% sequence identity.

2) The available assembly has missing residues in the exon regions.
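For point 1, one possible way to separate hits to the gene of interest from hits to the ~40% identical relatives is to filter the BLAST hits by percent identity. The sketch below assumes the search was saved in the standard 12-column tabular format (`-outfmt 6`). The function names, the 70% cutoff, and the example rows are illustrative assumptions, not values from this dataset; a sensible cutoff depends on how diverged the two species are.

```python
def parse_hits(tabular_text):
    """Parse BLAST tabular output (-outfmt 6, default 12 columns)."""
    hits = []
    for line in tabular_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.split("\t")
        hits.append({
            "query": f[0],
            "subject": f[1],
            "pident": float(f[2]),   # percent identity
            "length": int(f[3]),     # alignment length
            "qstart": int(f[6]),     # start on the query protein
            "qend": int(f[7]),       # end on the query protein
            "evalue": float(f[10]),
            "bitscore": float(f[11]),
        })
    return hits


def likely_ortholog_hits(hits, min_pident=70.0):
    """Keep hits at or above an identity cutoff (the cutoff is an assumption)."""
    return [h for h in hits if h["pident"] >= min_pident]


# Made-up rows for illustration only: one high-identity hit, one ~40% hit.
example = "\n".join([
    "geneX\tread1\t92.5\t80\t6\t0\t1\t80\t240\t1\t1e-40\t160",
    "geneX\tread2\t41.0\t75\t44\t1\t10\t84\t1\t225\t1e-08\t52.0",
])
kept = likely_ortholog_hits(parse_hits(example))
print([h["subject"] for h in kept])  # ['read1']
```

The reads that pass the filter could then be pulled out and assembled locally around the gene, which may be easier than working from the whole-genome assembly.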

My aim is to find the cDNA sequence so I can clone the gene and characterize the protein experimentally.

Thank you for your help. Kumar

Assembly genome next-gen sequence • 860 views

You can use something like backtranseq from EMBOSS. Here is a link to the web interface for the tool. You can also run it from the command line by installing EMBOSS.


Thanks for your suggestion. I actually used tblastn to search for the sequences. The problem is missing residues in the sequence. I am 100% sure that my gene of interest is present in the other species. Out of 650 amino acids, I mostly get regions covering 600 amino acids, but this is not sufficient for generating the clone. What I don't understand is why, even at 50X coverage, there are still "NNNNNNNN" regions in the assemblies.
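Runs of N in an assembly generally mark gaps the assembler could not resolve, for example between contigs joined into a scaffold, and these can persist even when average coverage is high. To see whether the missing ~50 residues fall inside such a gap, it may help to list where the N runs sit on the scaffold and compare them with the tblastn hit coordinates. A minimal sketch, where `find_n_runs` and the toy scaffold string are hypothetical and only for illustration:

```python
import re


def find_n_runs(sequence, min_len=1):
    """Return (start, end) pairs for runs of N, 0-based and end-exclusive."""
    return [
        (m.start(), m.end())
        for m in re.finditer(r"[Nn]+", sequence)
        if m.end() - m.start() >= min_len
    ]


# Toy scaffold for illustration: 6 bases, a gap of 8 N, then 6 more bases.
scaffold = "ATGGCC" + "N" * 8 + "GGTTAA"
print(find_n_runs(scaffold))  # [(6, 14)]
```

If a gap overlaps the exon that is missing, the raw reads that flank the gap are the place to look for the unassembled sequence.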
