I have one gene from a reference genome in .txt format, and one query genome in .fastq format. I need to search the query genome, a rice genome (n=7, 430 Mbp), for a sequence homologous to the reference gene.
So far, I have tried loading both sequences into strings in Python and aligning them with pairwise2 from Biopython. Once the query genome is loaded into a string, the pairwise2 alignment function raises a MemoryError. I'm sure there are smarter ways to handle the large .fastq query genome, which is why I've come here.
How do I find a homolog in a large .fastq file? I'm happy to use R or Python or whichever platform works best.
Why don't you use BLAST? You can install and run it locally.
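A minimal Python sketch of that route, assuming the BLAST+ tools (makeblastdb, blastn) are installed and on PATH, and that the gene has been saved as a FASTA file. The file names (reads.fasta, reads_db, hits.tsv) and the 1e-10 E-value cutoff are illustrative choices, not requirements. The FASTQ is streamed record by record into FASTA, so the whole query is never held in one Python string, which is what caused the memory error with pairwise2.

```python
import shutil
import subprocess
from typing import Iterable, Iterator, List, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) one record at a time.

    Assumes plain 4-line FASTQ records (header, sequence, '+', quality).
    A truncated final record raises an error rather than being silently kept.
    """
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip blank lines, e.g. a trailing empty line
        seq = next(it).strip()
        next(it)  # '+' separator line
        next(it)  # quality line, not needed for BLAST
        yield header[1:].split()[0], seq


def fastq_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Stream FASTQ lines out as FASTA text chunks."""
    for read_id, seq in read_fastq(lines):
        yield f">{read_id}\n{seq}\n"


def blast_commands(gene_fasta: str, reads_fasta: str, db: str,
                   out_tsv: str) -> Tuple[List[str], List[str]]:
    """Build the makeblastdb and blastn command lines."""
    make_db = ["makeblastdb", "-in", reads_fasta, "-dbtype", "nucl", "-out", db]
    search = ["blastn", "-query", gene_fasta, "-db", db,
              "-outfmt", "6",        # tabular output, one hit per line
              "-evalue", "1e-10",    # illustrative cutoff
              "-out", out_tsv]
    return make_db, search


def run_pipeline(fastq_path: str, gene_fasta: str) -> None:
    """Convert the reads, build a nucleotide database and search it."""
    if shutil.which("makeblastdb") is None or shutil.which("blastn") is None:
        raise RuntimeError("BLAST+ executables not found on PATH")
    with open(fastq_path) as fq, open("reads.fasta", "w") as fa:
        fa.writelines(fastq_to_fasta(fq))
    make_db, search = blast_commands(gene_fasta, "reads.fasta",
                                     "reads_db", "hits.tsv")
    subprocess.run(make_db, check=True)
    subprocess.run(search, check=True)
```

Calling run_pipeline("query.fastq", "gene.fasta") writes hits.tsv in BLAST's tabular format, one line per hit with the read ID, percent identity, alignment length, coordinates, E-value and bit score. Note that blastn defaults to -task megablast, which targets near-identical matches; for more divergent homologs, -task dc-megablast or -task blastn may be more sensitive.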
You could use MUMmer for whole-genome alignment.
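A rough Python sketch of driving MUMmer from a script, assuming MUMmer's nucmer and show-coords programs are on PATH and that both inputs are already FASTA (nucmer does not read FASTQ, so the reads would need converting first). The prefix name gene_vs_query is a hypothetical choice. nucmer writes its alignments to PREFIX.delta, and show-coords turns that file into a readable coordinate table.

```python
import shutil
import subprocess
from typing import List, Tuple


def mummer_commands(ref_fasta: str, query_fasta: str,
                    prefix: str) -> Tuple[List[str], List[str]]:
    """Build the nucmer and show-coords command lines."""
    # nucmer <reference> <query>; alignments go to <prefix>.delta
    nucmer = ["nucmer", f"--prefix={prefix}", ref_fasta, query_fasta]
    # -r: sort by reference, -c: show coverage, -l: show sequence lengths
    coords = ["show-coords", "-r", "-c", "-l", f"{prefix}.delta"]
    return nucmer, coords


def run_mummer(ref_fasta: str, query_fasta: str,
               prefix: str = "gene_vs_query") -> str:
    """Run nucmer, then return the show-coords table as text."""
    if shutil.which("nucmer") is None or shutil.which("show-coords") is None:
        raise RuntimeError("MUMmer executables not found on PATH")
    nucmer, coords = mummer_commands(ref_fasta, query_fasta, prefix)
    subprocess.run(nucmer, check=True)
    result = subprocess.run(coords, check=True, capture_output=True, text=True)
    return result.stdout
```

Here the gene is passed as the reference and the rice sequence as the query, so each row of the table gives the matching coordinates in both.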