11.9 years ago
Pappu
I want to parse the noncoding DNA sequences around a protein (P08707) from an EMBL file: http://www.ebi.ac.uk/ena/data/view/U32222&display=txt&expanded=true
I could grep '^ CDS', work out the noncoding regions from the CDS locations, and compare them to the location of the target protein in Python. I am wondering if there is a smarter way of doing it. Thanks.
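For reference, the grep-and-compare idea can be sketched in plain Python as follows. This is only a rough sketch under some assumptions: the feature lines look like `FT   CDS   <location>` with the whole location on one line, locations are local to the entry (no remote accessions), and strand is ignored. The names `cds_intervals` and `flanking_noncoding` are made up for this illustration and do not come from any library.

```python
import re

# Matches EMBL feature-table lines such as:
#   FT   CDS             complement(22433..23011)
CDS_LINE = re.compile(r"^FT\s+CDS\s+(\S+)")


def cds_intervals(embl_text):
    """Return sorted (start, end) spans, 1-based inclusive, of all CDS features."""
    spans = []
    for line in embl_text.splitlines():
        match = CDS_LINE.match(line)
        if not match:
            continue
        # Take the outer bounds of the location, so join(...) and
        # complement(...) both collapse to a single (min, max) span.
        coords = [int(n) for n in re.findall(r"\d+", match.group(1))]
        if coords:
            spans.append((min(coords), max(coords)))
    return sorted(spans)


def flanking_noncoding(spans, target, seq_len):
    """Return the (left, right) noncoding gaps next to the target CDS.

    Each gap is a (start, end) tuple, 1-based inclusive, or None when a
    neighbouring CDS touches or overlaps the target on that side.
    """
    start, end = target
    # Furthest-reaching CDS that begins before the target (0 if none).
    left_edge = max((e for s, e in spans if s < start), default=0)
    # Earliest-starting CDS that ends after the target (seq_len + 1 if none).
    right_edge = min((s for s, e in spans if e > end), default=seq_len + 1)
    left = (left_edge + 1, start - 1)
    right = (end + 1, right_edge - 1)
    return (
        left if left[0] <= left[1] else None,
        right if right[0] <= right[1] else None,
    )
```

Given the text of the EMBL entry, `cds_intervals(text)` gives the coding spans, and `flanking_noncoding(spans, (cds_start, cds_end), sequence_length)` gives the two gaps around the CDS of interest, where `cds_start` and `cds_end` are the coordinates of the CDS annotated with the target protein. Locations that wrap onto continuation lines would need extra handling.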
If you are just pulling one sequence, why not just copy and paste from the link? Just look for 22433..23011 in the genomic sequence. Or you could pull the FASTA and use a subsequence program: http://code.google.com/p/biopieces/wiki/extract_seq
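If you would rather do the subsequence step yourself than use a separate program, a minimal sketch is below. It assumes a single-record FASTA already read into a string, and it treats the coordinates as 1-based and inclusive, as in the feature table. `fasta_to_sequence` and `extract_region` are hypothetical helper names, not part of biopieces or any other tool.

```python
def fasta_to_sequence(fasta_text):
    """Join the sequence lines of a single-record FASTA string, skipping the header."""
    lines = [line.strip() for line in fasta_text.splitlines()]
    return "".join(line for line in lines if line and not line.startswith(">"))


def extract_region(sequence, start, end):
    """Return the bases from start to end, 1-based and inclusive."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("coordinates fall outside the sequence")
    # Python slices are 0-based and end-exclusive, so only start shifts by one.
    return sequence[start - 1:end]
```

For the span mentioned above, `extract_region(seq, 22433, 23011)` would return 23011 - 22433 + 1 = 579 bases. If the feature is annotated on the complement strand, the result still needs to be reverse-complemented.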