Question

How to extract full-length sequences for low-quality (predicted) protein sequences from whole-genome data

5.8 years ago

kkumarreddy

Hi,

Can anyone suggest methods for extracting the complete sequence of a low-quality predicted protein sequence reported in the RefSeq database or the NCBI Protein database?

1) I have whole-genome data at more than 50X coverage. When I run a BLAST search (using the human ortholog) against the SRA data, I get many hits, because my gene of interest has four other similar protein sequences with approximately 40% sequence identity.

2) The available assembly has missing residues in the exon regions.
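A minimal sketch of the filtering problem in point 1, assuming the hits come from tblastn run with tabular output (`-outfmt 6`, whose default 12 columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The function names and the 60% identity / 1e-5 E-value cutoffs are hypothetical. They assume that reads from the true ortholog align to the human query with clearly higher identity than reads from the ~40%-identical paralogs, which would need to be checked for the species in question.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Hit:
    qseqid: str
    sseqid: str
    pident: float
    length: int
    qstart: int
    qend: int
    evalue: float
    bitscore: float


def parse_outfmt6(lines: Iterable[str]) -> List[Hit]:
    """Parse BLAST tabular output (-outfmt 6, default 12 columns)."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.rstrip("\n").split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]),
                        int(f[6]), int(f[7]), float(f[10]), float(f[11])))
    return hits


def likely_ortholog_hits(hits: List[Hit], min_pident: float = 60.0,
                         max_evalue: float = 1e-5) -> List[Hit]:
    # Hypothetical cutoffs: assumes ortholog reads sit well above the
    # ~40% identity expected for the paralogs.
    return [h for h in hits if h.pident >= min_pident and h.evalue <= max_evalue]
```

Percent identity alone is a coarse separator; the cutoff is only meaningful if the identity distributions of ortholog and paralog hits do not overlap.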

My aim is to obtain the cDNA sequence so that I can clone the gene and characterize the protein experimentally.

Thank you for your help. Kumar

Tags: Assembly, genome, next-gen sequence

You can use something like backtranseq from EMBOSS. Here is a link to the web interface for the tool. You can also run it from the command line by installing EMBOSS locally.

5.8 years ago by GenoMax
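To illustrate what back-translation does, here is a simplified sketch. It is not backtranseq itself: EMBOSS picks codons from a codon usage table, whereas this hypothetical `back_translate` function uses one fixed, illustrative codon per amino acid (all valid under the standard genetic code) and writes `NNN` for unknown residues such as `X`.

```python
# Illustrative codon choice per residue (standard genetic code).
# This is NOT an EMBOSS codon usage table.
CODON_FOR = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
    "*": "TGA",
}


def back_translate(protein: str) -> str:
    """Back-translate a protein into one possible coding DNA sequence.

    Whitespace is ignored; residues not in CODON_FOR become 'NNN'.
    """
    return "".join(CODON_FOR.get(aa, "NNN")
                   for aa in protein.upper() if not aa.isspace())


if __name__ == "__main__":
    print(back_translate("MKW*"))  # ATGAAGTGGTGA
```

Because the codons are chosen rather than observed, the output is a synthetic coding sequence. It cannot restore residues that are missing from the predicted protein; those positions stay `NNN`.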

Thanks for your suggestion. I actually used tblastn to search for the sequences. The problem is the missing residues in the sequence. I am 100% sure that my gene of interest is present in the other species. Out of 650 amino acids, I mostly get regions covering 600 amino acids, but this is not sufficient for generating a full-length clone. What I don't understand about the assemblies is why, even at 50X coverage, there are still "NNNNNNNN" regions in them.

5.8 years ago by kkumarreddy
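A sketch for pinning down exactly which residues are missing, assuming you have the tblastn HSP query coordinates (qstart, qend) for the 650-residue query and the assembled scaffold as a plain string. The hypothetical `uncovered_ranges` merges the HSP intervals and reports query positions that no hit covers; the hypothetical `n_runs` locates stretches of N in the scaffold, so you can check whether the missing residues fall inside assembly gaps.

```python
import re
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent 1-based inclusive intervals."""
    merged: List[List[int]] = []
    # Normalise each interval so start <= end, then sweep in order.
    for start, end in sorted((min(s, e), max(s, e)) for s, e in intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def uncovered_ranges(query_len: int, intervals: Iterable[Interval]) -> List[Interval]:
    """Return 1-based inclusive query ranges not covered by any HSP."""
    gaps: List[Interval] = []
    pos = 1  # first query position not yet known to be covered
    for start, end in merge_intervals(intervals):
        if start > pos:
            gaps.append((pos, start - 1))
        pos = max(pos, end + 1)
    if pos <= query_len:
        gaps.append((pos, query_len))
    return gaps


def n_runs(seq: str, min_len: int = 1) -> List[Interval]:
    """Return 0-based, half-open (start, end) coordinates of N/n runs."""
    return [(m.start(), m.end())
            for m in re.finditer(r"[Nn]+", seq)
            if m.end() - m.start() >= min_len]
```

For example, `uncovered_ranges(650, [(1, 200), (180, 420), (471, 650)])` returns `[(421, 470)]`, a 50-residue stretch with no hit. Mapping that stretch onto the scaffold and comparing it with `n_runs` shows whether it sits in a gap of Ns or is simply absent from the hits.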