8.4 years ago
nchuang
I am trying to extract a list of 5 kb sequences from the hg19 genome. However, Bio.SeqIO.parse() takes a very long time to read the whole genome into memory, and Bio.SeqIO.index() is slow as well.
Is there a faster way to do this, or is this a limitation of Python?
I'm waiting for my admin to install pyfaidx for me and I will see how that one does too.
You could use samtools faidx in the meantime: Getting Sequence Based On Chromosome No And Coordinates From Whole Genome Fasta File
Wow, that looks so simple. I will try to subprocess it. Thanks!
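For anyone wanting to script the samtools faidx route from Python, here is a minimal sketch. It assumes samtools is on your PATH and that the FASTA can be indexed (samtools writes the .fai file on first use). The helper names region_string, parse_fasta and fetch_regions are made up for illustration, and coordinates are taken to be 1-based and inclusive, as in samtools region strings. It sticks to calls that should also work on an older Python 3 such as 3.4.

```python
import subprocess


def region_string(chrom, start, end):
    """Format a 1-based, inclusive region such as 'chr1:1001-6000'."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return "{}:{}-{}".format(chrom, start, end)


def parse_fasta(text):
    """Turn FASTA text into a dict mapping each header line to its sequence."""
    chunks = {}
    name = None
    for line in text.splitlines():
        if line.startswith(">"):
            name = line[1:].strip()
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line.strip())
    return {key: "".join(parts) for key, parts in chunks.items()}


def fetch_regions(fasta_path, regions, samtools="samtools"):
    """Call `samtools faidx` once for all regions and parse its FASTA output.

    Assumption: `samtools` resolves to a working samtools executable.
    """
    cmd = [samtools, "faidx", fasta_path] + list(regions)
    output = subprocess.check_output(cmd, universal_newlines=True)
    return parse_fasta(output)
```

Usage would look something like fetch_regions("hg19.fa", [region_string("chr1", 1001, 6000)]), where hg19.fa is a placeholder path. Passing many regions to one call avoids starting a new process per sequence, although a very long list may need to be split into batches to stay under the command-line length limit.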
Try pip install --user pyfaidx. Then you should have the faidx script in $HOME/.local/bin.
I don't have root, and pip is not even installed on the default Python version, 2.4. They did install 3.4 for me, but I had to add it to my path. I don't think I have pip for that install either.
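If nothing can be installed at all, the idea behind a .fai index can be sketched in plain Python: scan the FASTA once to record where each sequence starts and how its lines are laid out, then seek straight to the bytes you need. This is only an illustration, not what samtools or pyfaidx actually run. The names build_index and fetch are hypothetical, and the sketch assumes every sequence line within a record has the same width except the last, which is the same layout that faidx-style indexing requires.

```python
def build_index(fasta_path):
    """Scan the FASTA once and record, per sequence name,
    [length, offset of first base, bases per line, bytes per line]."""
    index = {}
    with open(fasta_path, "rb") as handle:
        name = None
        offset = 0
        for line in handle:
            if line.startswith(b">"):
                # Use the first word of the header as the sequence name.
                name = line[1:].split()[0].decode()
                index[name] = [0, offset + len(line), 0, 0]
            elif name is not None:
                entry = index[name]
                bases = len(line.rstrip(b"\r\n"))
                if entry[2] == 0:
                    entry[2] = bases      # bases per full line
                    entry[3] = len(line)  # bytes per line, newline included
                entry[0] += bases
            offset += len(line)
    return index


def fetch(fasta_path, index, chrom, start, end):
    """Return bases start..end (1-based, inclusive) by seeking into the file."""
    length, seq_offset, line_bases, line_bytes = index[chrom]
    end = min(end, length)
    if start < 1 or start > end:
        raise ValueError("region is outside the sequence")

    def byte_pos(i):
        # Map a 0-based base position to its byte position in the file.
        return seq_offset + (i // line_bases) * line_bytes + i % line_bases

    first = byte_pos(start - 1)
    last = byte_pos(end - 1)
    with open(fasta_path, "rb") as handle:
        handle.seek(first)
        raw = handle.read(last - first + 1)
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode()
```

The scan in build_index still reads the whole file once, so for repeated runs it would make sense to save the index to disk and reload it, which is what the .fai file does for samtools and pyfaidx.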