Hello, I have a question about extracting CDS sequences by gene ID. The headers in my CDS file do not simply start with >gene ID; they carry many other annotations. The only thing they have in common is a field that starts with gene:
cds fasta:
>Zm002 cds gene:Zm1d035916 gene_biotype:protein_coding
ATCGGCAT
>Zm001 cds RefGen_v4:9:153880862:153883850:-1 gene:Zm1d048 gene_biotype:protein_coding
ATGCGGCA
gene_list
Zm1d035916
Zm1d048
How can I get a result like this:
>Zm1d035916
ATCGGCAT
>Zm1d048
ATGCGGCA
I think Biopython might help: http://biopython.org/DIST/docs/tutorial/Tutorial.html (refer to section 2.4.1).
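I think it is very straightforward: once you have the seq_record, you can slice the gene ID out of its header text using str.find (for example) in Python. Note that seq_record.id holds only the first word of the header (Zm002 here), so the gene: field has to be taken from seq_record.description.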
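As a rough illustration of that idea, here is a minimal sketch in plain Python (standard library only) that reads the headers, pulls out the gene: field with a regular expression, and keeps only the records whose gene ID is in the list. The function name extract_cds and the variable names are made up for this example, and it assumes every wanted record has a whitespace-separated gene:<ID> field in its header. With Biopython, the same regular expression could be applied to seq_record.description for each record returned by SeqIO.parse.

```python
import re

# Matches a whitespace-delimited "gene:<ID>" field in a FASTA header.
# It does not match "gene_biotype:..." because the colon must follow "gene".
GENE_TAG = re.compile(r"(?:^|\s)gene:(\S+)")


def extract_cds(fasta_lines, gene_ids):
    """Yield (gene_id, sequence) for records whose gene: field is in gene_ids.

    extract_cds is a hypothetical helper written for this example.
    """
    wanted = set(gene_ids)
    current_id, chunks = None, []
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record.
            if current_id in wanted:
                yield current_id, "".join(chunks)
            match = GENE_TAG.search(line)
            current_id = match.group(1) if match else None
            chunks = []
        else:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    # Emit the last record in the file.
    if current_id in wanted:
        yield current_id, "".join(chunks)


# Example data copied from the question.
cds_fasta = """\
>Zm002 cds gene:Zm1d035916 gene_biotype:protein_coding
ATCGGCAT
>Zm001 cds RefGen_v4:9:153880862:153883850:-1 gene:Zm1d048 gene_biotype:protein_coding
ATGCGGCA
""".splitlines()
gene_list = ["Zm1d035916", "Zm1d048"]

result = list(extract_cds(cds_fasta, gene_list))
for gene_id, seq in result:
    print(f">{gene_id}\n{seq}")
```

For real files, pass an open file handle instead of the list of lines, for example extract_cds(open("cds.fa"), gene_list), where cds.fa is a placeholder file name. Records come out in the order they appear in the FASTA file, not in the order of the gene list.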