I have a multi-FASTA file containing sequences with headers such as:
>AAST01014508.1|(1..6240)|LTR/Pao|ROOA_I-int:ROOA_LTR
>AAST01026747.1|c(2745..6820)|LTR/Pao|ROO_I-int
The second field of each header gives the position. I want to index all of the sequences on this field and write to a file only those from the complementary (c) strand, e.g. >AAST01026747.1|c(2745..6820)|LTR/Pao|ROO_I-int
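As a minimal sketch of reading that position field, the code below assumes every header has exactly four `|`-separated fields and that a leading `c` in the position field (as in `c(2745..6820)`) marks the complementary strand. `parse_position` and `POSITION_RE` are hypothetical names used for illustration, not part of Biopython.

```python
import re

# Matches position fields such as "(1..6240)" or "c(2745..6820)";
# the optional leading "c" is assumed to mean the complementary strand.
POSITION_RE = re.compile(r"^(c?)\((\d+)\.\.(\d+)\)$")


def parse_position(header):
    """Return (is_complement, start, end) for a header such as
    '>AAST01026747.1|c(2745..6820)|LTR/Pao|ROO_I-int'."""
    parts = header.lstrip(">").split("|")
    if len(parts) != 4:
        raise ValueError(f"unexpected header layout: {header!r}")
    match = POSITION_RE.match(parts[1])
    if match is None:
        raise ValueError(f"unrecognised position field: {parts[1]!r}")
    strand_flag, start, end = match.groups()
    return strand_flag == "c", int(start), int(end)


print(parse_position(">AAST01014508.1|(1..6240)|LTR/Pao|ROOA_I-int:ROOA_LTR"))
print(parse_position(">AAST01026747.1|c(2745..6820)|LTR/Pao|ROO_I-int"))
```

The two calls should print (False, 1, 6240) and (True, 2745, 6820). Raising on an unexpected layout, rather than silently skipping, makes malformed headers easy to spot.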
Writing a function to pull out the position field was easy:
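def get_compstrandTE(record):
    # Header layout: accession|position|repeat class/family|element name
    parts = record.id.split("|")
    assert len(parts) == 4
    # Position field, e.g. "c(2745..6820)" for the complementary strand
    return parts[1]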
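If get_compstrandTE is passed as the `key_function` to Biopython's `SeqIO.index`, the dictionary keys become position strings such as c(2745..6820). As I understand it, `SeqIO.index` rejects duplicate keys, and two elements on different contigs could share the same coordinates. A safer pattern is to keep the unique ID as the key (by default `SeqIO.index` keys on `record.id`) and test the position field separately. The sketch below uses plain strings in place of `record.id` values so that it runs without Biopython; `position_field` and `complement_ids` are hypothetical helpers.

```python
def position_field(header_id):
    """Same logic as get_compstrandTE, applied to a plain header string."""
    parts = header_id.split("|")
    assert len(parts) == 4
    return parts[1]


def complement_ids(header_ids):
    """Return the IDs whose position field marks the complementary strand."""
    # Key on the full, unique ID and keep the position field as the value,
    # so two records with identical coordinates cannot collide.
    positions = {header_id: position_field(header_id) for header_id in header_ids}
    # Test only the position field: searching for "c" anywhere in the ID
    # would also match any element or family name containing a lowercase c.
    return [hid for hid, pos in positions.items() if pos.startswith("c")]


ids = [
    "AAST01014508.1|(1..6240)|LTR/Pao|ROOA_I-int:ROOA_LTR",
    "AAST01026747.1|c(2745..6820)|LTR/Pao|ROO_I-int",
]
print(complement_ids(ids))
```

With the two example headers, only the second ID is returned.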
However, I am now stuck on how to search through the keys in the dictionary, find only those containing the 'c', and write those records to a file. I tried using the example from the Biopython manual but kept running into difficulties.
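For the "filter and write to a file" step, an index may not be needed at all: a single pass over the file can test each header and write the matches. The sketch below is a swapped-in, standard-library-only approach (not the Biopython manual's method), assuming the four-field header layout shown above; `read_fasta` and `write_complement_strand` are hypothetical helpers, and the 60-character line width is an arbitrary choice.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a multi-FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip()
            if line.startswith(">"):
                # A new header closes the previous record, if any
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def write_complement_strand(in_path, out_path, width=60):
    """Write records whose position field starts with 'c'; return the count."""
    count = 0
    with open(out_path, "w") as out:
        for header, seq in read_fasta(in_path):
            parts = header.split("|")
            if len(parts) == 4 and parts[1].startswith("c"):
                out.write(f">{header}\n")
                # Wrap the sequence at a fixed line width
                for i in range(0, len(seq), width):
                    out.write(seq[i:i + width] + "\n")
                count += 1
    return count
```

Usage would look like `write_complement_strand("te_sequences.fasta", "complement_strand.fasta")`, with hypothetical file names. With Biopython the same idea should fit in one call, passing a generator to `SeqIO.write`: `SeqIO.write((r for r in SeqIO.parse("te_sequences.fasta", "fasta") if r.id.split("|")[1].startswith("c")), "complement_strand.fasta", "fasta")`.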
If anyone has any suggestions, I would really appreciate it.
Wonderful, thanks again!