5.7 years ago
little_more
I have a list of assembly IDs (GenBank) and a list of the corresponding chromosomes and plasmids. A toy example:
a = ["GCA_000005845.2", "GCA_000006925.2", "GCA_000007405.1", "GCA_000007445.1", ...]
b = ["CP024720.1", "CP024722.1", "CP024721.1", "LT601384.1", "LT838196.1", ...]
I'd like to find the number of IS in each genome. The only approach that has come to mind is to parse all CDS in each genome with BioPython and count the CDS with "IS ... transposase" under their "product" key. Is there a better way to do this? Can I somehow use GO? Note that the lists are quite big, so I need an automated way.
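The counting approach described above could be sketched roughly as follows. This is only an illustration, not a vetted method: the regular expression, the function names, and the assumption that annotations use product names like "IS3 family transposase" are all hypothetical and depend on how each genome was annotated. Note also that a CDS count is a proxy rather than a true IS count, since a single insertion sequence can carry more than one transposase ORF.

```python
import re

# Assumed pattern: product names such as "IS3 family transposase" or
# "IS200/IS605 family transposase". Adjust it to your annotation's wording.
IS_TRANSPOSASE = re.compile(r"\bIS[A-Za-z0-9]*\b.*transposase")


def is_transposase_product(product):
    """Return True if a product string looks like an IS transposase."""
    return bool(IS_TRANSPOSASE.search(product))


def count_is_cds(features):
    """Count CDS features whose "product" qualifier matches the pattern.

    `features` is any iterable of objects with `.type` and `.qualifiers`,
    e.g. the `features` list of a Biopython SeqRecord.
    """
    count = 0
    for feature in features:
        if feature.type != "CDS":
            continue
        # Qualifier values are lists of strings; "product" may be absent.
        products = feature.qualifiers.get("product", [])
        if any(is_transposase_product(p) for p in products):
            count += 1
    return count


def count_is_cds_in_genbank(path):
    """Map each record ID in a GenBank file to its IS-transposase CDS count."""
    # Imported here so the helpers above can be used without Biopython.
    from Bio import SeqIO

    return {
        record.id: count_is_cds(record.features)
        for record in SeqIO.parse(path, "genbank")
    }
```

For a long list of accessions, one would call `count_is_cds_in_genbank` on each downloaded GenBank file in a loop and sum the per-replicon counts for each assembly.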
I had to google this.
"TE" : Transposable Element
What do you mean by "IS"? Before performing any bioinformatic analysis, I would recommend that you answer these questions.
'IS' is a pretty common abbreviation for "insertion sequence", a type of mobile element in prokaryotic genomes. I do not see the purpose of your comment, because I am asking exactly "is there a way..." and I have already tried one of the ways.