I have a file in the following format: http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord#FeaturesA
and I want to extract the information under the CDS regions. There can be many CDS regions, and they need not be consecutive:
CDS <1..206
/codon_start=3
/product="TCP1-beta"
/protein_id="AAA98665.1"
/db_xref="GI:1293614"
/translation="SSIYNGISTSGLDLNNGTIADMRQLGIVESYKLKRAVVSSASEA
AEVLLRVDNIIRARPRTANRQHM"
gene 687..3158
/gene="AXL2"
CDS 687..3158
/gene="AXL2"
/note="plasma membrane glycoprotein"
/codon_start=1
/function="required for axial budding pattern of S.
cerevisiae"
/product="Axl2p"
/protein_id="AAA98666.1"
/db_xref="GI:1293615"
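Once the lines of a single CDS region have been collected, the information in it can be pulled out into a dictionary. Below is a minimal sketch. It assumes each block is a list of stripped lines whose first line is the feature line (e.g. `CDS <1..206`), as in the sample above. The name `parse_cds_block` is hypothetical. Following the sample, continuation lines of `/translation` are joined without a space, because they are one amino-acid string, and other values are joined with a space.

```python
def parse_cds_block(block):
    """Split one CDS block into its location and a dict of qualifiers.

    `block` is a list of stripped lines: the feature line first
    (e.g. "CDS <1..206"), then "/name=value" lines and their continuations.
    """
    _key, location = block[0].split(None, 1)
    qualifiers = {}
    name = None
    for line in block[1:]:
        if line.startswith("/"):
            # New qualifier; flags such as "/pseudo" have no "=value" part.
            name, _, value = line[1:].partition("=")
            qualifiers[name] = value
        elif name is not None:
            # Continuation line: the translation is joined directly, text with a space.
            sep = "" if name == "translation" else " "
            qualifiers[name] += sep + line
    # Drop the double quotes that wrap free-text values.
    return location, {
        name: value[1:-1] if len(value) >= 2 and value[0] == value[-1] == '"' else value
        for name, value in qualifiers.items()
    }
```

The sketch keeps only the last value when a qualifier repeats, such as several `/db_xref` lines. Storing a list per name would keep them all.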
I have already read the file and kept each line in a list:

    l_cds = []  # list to hold the info under CDS
    for line in list_eachline:
        if line.startswith("CDS"):
            l_cds.append(line)
This extracts only the first line of each CDS region. How do I modify the code? Thanks. (I am not supposed to use BioPython.)
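The loop above appends only the line that matches `CDS`. One way to keep the whole region is to remember the block currently being collected and attach every following line to it until the next feature line appears. This is a minimal sketch that works on stripped lines like those in the question. The function name `extract_cds_blocks` is hypothetical.

The tricky part is telling continuation lines apart from feature lines, since lines such as `cerevisiae"` do not start with `/`. The sketch handles this by tracking whether a quoted value is still open. That rests on the assumption that an embedded quote inside a value is written as `""`. It also assumes each feature location fits on one line.

```python
def extract_cds_blocks(list_eachline):
    """Collect every CDS block: the CDS line plus all lines that belong to it."""
    blocks = []
    current = None      # the CDS block being collected, or None
    in_quote = False    # True while a quoted value continues onto later lines
    for raw in list_eachline:
        line = raw.strip()
        if not line:
            continue
        if not in_quote and not line.startswith("/"):
            # Outside a quoted value, a line that is not a "/qualifier" is a new
            # feature line such as "gene 687..3158"; it ends the previous block.
            if line.split()[0] == "CDS":
                current = [line]
                blocks.append(current)
            else:
                current = None
        elif current is not None:
            # A qualifier or a continuation line of the current CDS.
            current.append(line)
        # Embedded quotes are assumed to be doubled (""), so an odd count means
        # a quoted value opened (or closed) on this line.
        if line.count('"') % 2 == 1:
            in_quote = not in_quote
    return blocks
```

A location that wraps, such as a long `join(...)`, would be mistaken for a new feature line here. If the file keeps its original indentation, the column layout is a sturdier signal.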
Can we assume this to be a homework question, since you are "not supposed to use BioPython"?
It's just part of a project, and it was suggested that I not use BioPython because it would make the task too simple. I have coded a similar thing in Perl. I am not asking for ready-made code; I have searched Stack Overflow as well, and everywhere the answers use BioPython. Hence I asked here!
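When the file is read without stripping leading whitespace, the column layout of a GenBank feature table can mark where each feature starts. In that layout, feature keys begin in column 6, and qualifiers and continuation lines begin in column 22. This is a minimal sketch that assumes that standard layout. The name `cds_blocks_by_column` is hypothetical. Because any line with text in columns 1 to 21 ends the current block, wrapped locations and multi-line values need no special handling.

```python
def cds_blocks_by_column(lines):
    """Group raw GenBank feature-table lines into CDS blocks by column position.

    Assumes the standard layout: feature keys start in column 6, and
    qualifiers and continuation lines start in column 22.
    """
    blocks = []
    current = None  # the CDS block being collected, or None
    for raw in lines:
        line = raw.rstrip()
        if not line.strip():
            continue  # ignore blank lines
        if line[:21].strip():
            # Text in columns 1-21 marks a new feature key (column 6) or a new
            # section such as ORIGIN (column 1); either way the previous block ends.
            current = None
            if line.startswith(" " * 5) and line[5:21].strip() == "CDS":
                # Normalise "     CDS             <1..206" to "CDS <1..206".
                current = [" ".join(line.split())]
                blocks.append(current)
        elif current is not None:
            # Columns 1-21 are blank: a qualifier or a continuation line
            # belonging to the current CDS.
            current.append(line.strip())
    return blocks
```

An open file object can be passed directly, since iterating a file yields its lines, trailing newline included. `rstrip()` removes that newline.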