I have a paired-end DNA-seq dataset (FASTQ). Could anyone please tell me how I should search for homologs of my gene sequence in this dataset?
I think I first need to convert the FASTQ files to FASTA and then use BLAT. Would that be OK?
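For illustration, the conversion step mentioned above could be sketched roughly as follows. This is only a minimal sketch, assuming plain (uncompressed) FASTQ with exactly four lines per record; the function name `fastq_to_fasta` is hypothetical and not part of any existing tool.

```python
import io


def fastq_to_fasta(fastq_handle, fasta_handle):
    """Copy reads from a 4-line-per-record FASTQ stream into FASTA.

    Assumption: every record is exactly four lines (header, sequence,
    '+' separator, qualities). Returns the number of records written.
    """
    count = 0
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of input
        if not header.strip():
            continue  # tolerate blank lines between or after records
        seq = fastq_handle.readline().rstrip("\n")
        plus = fastq_handle.readline()
        fastq_handle.readline()  # quality line, not needed for FASTA
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("input does not look like 4-line FASTQ")
        # Keep the read name, swap the leading '@' for '>'
        fasta_handle.write(">" + header[1:].rstrip("\n") + "\n" + seq + "\n")
        count += 1
    return count


# Tiny in-memory demo with two made-up reads
demo_in = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nGGCC\n+\nIIII\n")
demo_out = io.StringIO()
n_reads = fastq_to_fasta(demo_in, demo_out)
```

With real files, the two handles would be ordinary `open(...)` file objects, one call per mate file of the paired-end data.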
New comment from the user
I want to confirm that a homolog of my gene from species A (query) exists in the genome of species B (subject). Species A and species B are closely related (the sequence similarity is expected to be 70% at the DNA level). No whole-genome sequence is available for species B yet, so the DNA-seq reads are the only material I can use. Also, would it be possible to recover (or assemble) the full-length homolog in species B?
That makes it clearer. So you are looking for an ortholog. Why not align the FASTQ reads from species B to species A and check whether your gene of interest in species A has good coverage? That will answer your first question. As for your second question: since these species are only 70% identical at the DNA level, it would be hard to get the full-length or exact sequence by alignment alone. You cannot tell from the alignment whether the gene in species B has extra exons or an extended 3' UTR. You may have to perform de novo assembly to obtain the exact sequence. I am not an expert in that, so I hope somebody else can answer it better. I will add your comments to the question.
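To make the "good coverage" check a bit more concrete, here is a minimal sketch. It assumes you have already run an aligner and extracted, for each read that hit the gene, the start and end of the match on the species A gene (0-based, end-exclusive). The function name `coverage_summary` and the example intervals are hypothetical; parsing the aligner output is not shown.

```python
def coverage_summary(gene_length, hits, min_depth=1):
    """Summarise how well read alignments cover a query gene.

    gene_length: length of the species A gene in bases.
    hits: iterable of (start, end) intervals on the gene,
          0-based and end-exclusive (an assumption of this sketch).
    Returns (fraction of bases with depth >= min_depth, mean depth).
    """
    depth = [0] * gene_length
    for start, end in hits:
        # Clip intervals that run past either end of the gene
        for pos in range(max(start, 0), min(end, gene_length)):
            depth[pos] += 1
    covered = sum(1 for d in depth if d >= min_depth)
    return covered / gene_length, sum(depth) / gene_length


# Made-up example: a 10 bp "gene" hit by three reads
breadth, mean_depth = coverage_summary(10, [(0, 4), (2, 6), (8, 12)])
```

A high breadth across the whole gene would suggest the ortholog is present, while long uncovered stretches could mean either divergence beyond what the aligner tolerates or a genuinely missing region, so this is a first look rather than proof.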