How to extract CDS regions and write them to a secondary text file in Perl
4.5 years ago
emi_14_ar • 0

I must design a program that extracts only the CDS entries of such a file, which are described in the FEATURES section, and adds to each one its corresponding nucleotide sequence, taken by selective extraction from the ORIGIN section. The result should be a new .txt file with a much simpler structure: in order, only the CDS descriptions, each followed by its nucleotide sequence.

Example of what I should get:

CDS 110679..111596
     /gene="ENSG00000176695.8"
     /protein_id="ENSP00000467301.1"
     /note="transcript_id=ENST00000585993.3"
     /db_xref="CCDS:CCDS32854"
     /db_xref="Uniprot/SWISSPROT:Q8NGA8"
     /db_xref="RefSeq_peptide:NP_001005240"
     /db_xref="RefSeq_mRNA:NM_001005240"
     /db_xref="Uniprot/SPTREMBL:A0A126GWN0"
     /db_xref="UCSC:ENST00000585993.3"
     /db_xref="EMBL:AB065917"
     /db_xref="EMBL:BC136848"
     /db_xref="EMBL:BC136867"
     /db_xref="EMBL:KP290649"
     /db_xref="GO:0004888"
     /db_xref="GO:0004930"
     /db_xref="GO:0004930"
     /db_xref="GO:0004930"
     /db_xref="GO:0004984"
     /db_xref="GO:0004984"
     /db_xref="GO:0005886"
     /db_xref="GO:0005886"
     /db_xref="GO:0007165"
     /db_xref="GO:0007186"
     /db_xref="GO:0007186"
     /db_xref="GO:0007186"
     /db_xref="GO:0007186"
     /db_xref="GO:0007608"
     /db_xref="GO:0016020"
     /db_xref="GO:0016021"
     /db_xref="GO:0016021"
     /db_xref="GO:0016021"
     /db_xref="GO:0050896"
     /db_xref="GO:0050911"
     /db_xref="HGNC_trans_name:OR4F17-202"
     /db_xref="protein_id:AAI36849"
     /db_xref="protein_id:AAI36868"
     /db_xref="protein_id:ALI87807"
     /db_xref="protein_id:BAC06132"
     /db_xref="Reactome:R-HSA-162582"
     /db_xref="Reactome:R-HSA-372790"
     /db_xref="Reactome:R-HSA-381753"
     /db_xref="Reactome:R-HSA-388396"
     /db_xref="Reactome:R-HSA-418555"
     /db_xref="UniParc:UPI0000041E2A"
     /translation="MVTEFIFLGLSDSQGLQTFLFMLFFVFYGGIVFGNLLIVITVVS
     DSHLHSPMYFLLANLSLIDLSLSSVTAPKMITDFFSQRKVISFKGCLVQIFLLHFFGG
     SEMVILIAMGFDRYIAICKPLHYTTIMCGNACVGIMAVAWGIGFLHSVSQLAFAVHLP
     FCGPNEVDSFYCDLPRVIKLACTDTYRLDIMVIANSGVLTVCSFVLLIISYTIILMTI
     QHRPLDKSSKALSTLTAHITVVLLFFGPCVFIYAWPFPIKSLDKFLAVFYSVITPLLN
     PIIYTLRNKDMKTAIRQLRKWDAHSSVKF"
    110679..111596
    ATGGTGACTGAATTCATTTTTCTGGGTCTCTCTGATTCTCAGGGACTCCAGACCTTCCTATTTATGTTGTTTTTTGTATTCTATGGAGGAAT CGTGTTTGGAAACCTTCTTATTGTCATAACAGTGGTATCTGACTCCCACCTTCACTCTCCCATGTACTTCCTGCTAGCCAACCTCTCACTCA
TTGATCTGTCTCTGTCTTCAGTCACAGCCCCCAAGATGATTACTGACTTTTTCAGCCAGCGCAAAGTCATCTCTTTCAAGGGCTGCCTTGT
TCAGATATTTCTCCTTCACTTCTTTGGTGGGAGTGAGATGGTGATCCTCATAGCCATGGGCTTTGACAGATATATAGCAATATGCAAACC  CCTACACTACACTACAATTATGTGTGGCAACGCATGTGTCGGCATTATGGCTGTCGCATGGGGAATTGGCTTTCTCCATTCGGTGAGCC
AGTTGGCCTTTGCCGTGCACTTACCCTTCTGTGGTCCCAATGAGGTCGATAGTTTTTATTGTGACCTTCCTAGGGTAATCAAACTTGCCTG   TACAGATACCTACAGGCTAGATATTATGGTCATTGCTAACAGTGGTGTGCTCACTGTGTGTTCTTTTGTTCTTCTAATCATCTCATACACT  ATCATCCTAATGACCATCCAGCATCGCCCTTTAGATAAGTCGTCCAAAGCTCTGTCCACTTTGACTGCTCACATTACAGTAGTTCTTTTGT TCTTTGGACCATGTGTCTTTATTTATGCCTGGCCATTCCCCATCAAGTCATTAGATAAATTCCTTGCTGTATTTTATTCTGTGATCACCCCT
CTCTTGAACCCAATTATATACACACTGAGGAACAAAGACATGAAGACGGCAATAAGACAGCTGAGAAAATGGGATGCACATTCTAGT TAAAGTTTTAG
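
One possible way to approach this in plain Perl is sketched below. It is a minimal sketch, not a definitive implementation, and it assumes a GenBank-style flat file: a `FEATURES` header line, feature keys indented by five spaces with qualifier lines indented further, and an `ORIGIN` block terminated by `//`. The sub names (`read_origin`, `read_cds`, `cds_sequence`, `extract_cds`) are made up for illustration. Locations of the form `start..end`, `join(...)` and `complement(...)` are handled; a `join` that mixes forward and `complement` parts is not.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect the ORIGIN sequence into one uppercase string.
sub read_origin {
    my @lines = @_;
    my ($in_origin, $seq) = (0, '');
    for my $line (@lines) {
        if ($line =~ /^ORIGIN/) { $in_origin = 1; next; }
        last if $in_origin && $line =~ m{^//};
        next unless $in_origin;
        (my $bases = $line) =~ s/[^A-Za-z]//g;   # drop position numbers and spaces
        $seq .= uc $bases;
    }
    return $seq;
}

# Collect every CDS feature: its location string and its qualifier lines.
sub read_cds {
    my @lines = @_;
    my (@cds, $current, $in_features);
    for my $line (@lines) {
        if ($line =~ /^FEATURES/) { $in_features = 1; next; }
        last if $line =~ /^ORIGIN/;
        next unless $in_features;
        if ($line =~ /^ {5}(\S+)\s+(\S+)/) {
            # A new feature starts here; keep it only if it is a CDS.
            $current = $1 eq 'CDS' ? { location => $2, qualifiers => [] } : undef;
            push @cds, $current if $current;
        }
        elsif ($current && $line =~ /^\s+(\S.*)$/) {
            (my $text = $1) =~ s/\s+$//;
            if (!@{ $current->{qualifiers} } && $text !~ m{^/}) {
                $current->{location} .= $text;   # location wrapped onto the next line
            }
            else {
                push @{ $current->{qualifiers} }, $text;
            }
        }
    }
    return @cds;
}

# Cut the CDS out of the full sequence (coordinates are 1-based, inclusive).
sub cds_sequence {
    my ($seq, $location) = @_;
    my $dna = '';
    while ($location =~ /(\d+)\.\.[<>]?(\d+)/g) {
        $dna .= substr($seq, $1 - 1, $2 - $1 + 1);
    }
    if ($location =~ /complement/) {
        # Assumption: the whole location is on the reverse strand.
        $dna = reverse $dna;
        $dna =~ tr/ACGT/TGCA/;
    }
    return $dna;
}

# Build the simplified report: description, location, then sequence.
sub extract_cds {
    my @lines = @_;
    chomp @lines;
    my $seq = read_origin(@lines);
    my $out = '';
    for my $cds (read_cds(@lines)) {
        $out .= "CDS $cds->{location}\n";
        $out .= "     $_\n" for @{ $cds->{qualifiers} };
        $out .= "$cds->{location}\n";
        $out .= cds_sequence($seq, $cds->{location}) . "\n\n";
    }
    return $out;
}

# Run only when a file name is given on the command line.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print extract_cds(<$fh>);
    close $fh;
}
```

Saved as, say, `extract_cds.pl` (a placeholder name), it could be run as `perl extract_cds.pl input.dat > cds.txt`. If BioPerl is available, its `Bio::SeqIO` GenBank parser is likely a more robust choice than hand-written regular expressions.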
genome sequence perl • 666 views