How To Search For Putative Homologs Of A Protein In Genome Sequencing Raw Data?
11.1 years ago
sanchezcavani ▴ 220

I have a paired-end DNA-seq dataset (FASTQ). Could anyone tell me how to search this dataset for homologs of my gene sequence?

I think I first need to convert the FASTQ file to FASTA and then use BLAT. Would that be OK?
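
For the conversion step, here is a minimal Python sketch, assuming plain (uncompressed) FASTQ with exactly four lines per record. The function name `fastq_to_fasta` and the example read are made up for illustration; established tools do the same job and are likely faster on large files.

```python
import io


def fastq_to_fasta(fastq_handle, fasta_handle):
    """Copy reads from a 4-line-per-record FASTQ stream to a FASTA stream.

    Returns the number of records written. Quality lines are discarded.
    """
    count = 0
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate blank lines between or after records
        if not header.startswith("@"):
            raise ValueError("expected '@' at start of record: %r" % header)
        seq = fastq_handle.readline().rstrip("\n")
        fastq_handle.readline()  # '+' separator line, ignored
        fastq_handle.readline()  # quality line, ignored
        fasta_handle.write(">" + header[1:].rstrip("\n") + "\n" + seq + "\n")
        count += 1
    return count


# Tiny in-memory example with one hypothetical read.
example_fastq = io.StringIO("@read_1\nACGTACGT\n+\nIIIIIIII\n")
example_fasta = io.StringIO()
n_written = fastq_to_fasta(example_fastq, example_fasta)
```

With real files you would pass handles from `open(...)` instead of `io.StringIO`, and for paired-end data run it once per mate file.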

New comment from the user

I want to confirm that a homolog of my gene from species A (query) exists in the genome of species B (subject). Species A and species B are closely related (the sequence similarity is expected to be 70% at the DNA level). There is no whole-genome sequence available for species B, so the DNA-seq reads are the only material I can use. Also, would it be possible to recover (or assemble) the full length of the homolog in species B?

11.1 years ago

If by homolog you mean sequences with high sequence identity, then what you suggest seems fine: convert the FASTQ to FASTA and use the result as a BLAT database. But remember that the reads may be only 75-100 bp, a lot shorter than your gene of interest. So the first problem is that if you BLAT your gene against the paired-end read database, you will get a lot of reads back, aligning to different parts of the gene. The second problem is that the BLAT database may be huge (I don't know how big your FASTQ dataset is), and searching it may require a lot of computational resources. If you can clarify what your ultimate goal is, or why you want to do this particular task, I may be able to help much better.
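
To illustrate the first problem, here is a rough Python sketch that pulls out which part of the gene each read hit, assuming BLAT was run with the gene as query and the reads as database, and that the output is in the standard tab-separated 21-column PSL layout (where `qStart` and `qEnd` are the 12th and 13th columns and `tName` is the 14th). The function name and the example hit are invented for illustration.

```python
def parse_psl_query_intervals(lines):
    """Return (q_start, q_end, read_name) for each alignment row in PSL text.

    Assumes 21 tab-separated PSL columns; header and malformed lines
    are skipped. q_start/q_end are positions on the query (the gene).
    """
    hits = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Data rows start with an integer match count; header rows do not.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        q_start = int(fields[11])  # qStart
        q_end = int(fields[12])    # qEnd
        read_name = fields[13]     # tName (the read, since reads are the database)
        hits.append((q_start, q_end, read_name))
    return hits


# One hypothetical hit: a 100 bp read matching gene positions 200-300.
example_psl = [
    "70\t30\t0\t0\t0\t0\t0\t0\t+\tgeneA\t1500\t200\t300\tread_17\t100\t0\t100\t1\t100,\t200,\t0,\n",
]
example_hits = parse_psl_query_intervals(example_psl)
```

Each read only contributes a short interval like this, so a full-length gene would show up as many small, scattered hits rather than one alignment.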


Thanks so much. I want to confirm that a homolog of my gene from species A (query) exists in the genome of species B (subject). Species A and species B are closely related (the sequence similarity is expected to be 70% at the DNA level). There is no whole-genome sequence available for species B, so the DNA-seq reads are the only material I can use. Also, would it be possible to recover (or assemble) the full length of the homolog in species B? Thanks a lot!


That makes it clearer. So you are looking for an ortholog. Why not align the FASTQ reads from species B to species A and see whether your gene of interest in species A has good coverage? That will answer your first question. For your second question, since these species are only 70% identical at the DNA level, it would be hard to get the full-length or exact sequence by alignment alone. You can't tell from the alignment whether the gene in species B has extra exons or an extended 3' UTR. You may have to perform de novo assembly to obtain the exact sequence. I am not an expert in that; I hope somebody can answer it better. I will add your comments to the question.
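
As a rough way to judge "good coverage", one could compute the fraction of gene positions touched by at least one aligned read. This is only a sketch: it assumes you already have the alignment intervals on the gene as 0-based, half-open `(start, end)` pairs, and the function name, the intervals, and the gene length below are hypothetical.

```python
def breadth_of_coverage(intervals, gene_length):
    """Fraction of gene positions covered by at least one interval.

    intervals: iterable of (start, end) pairs, 0-based, end exclusive.
    Parts of an interval that fall outside the gene are ignored.
    """
    if gene_length <= 0:
        raise ValueError("gene_length must be positive")
    covered = [False] * gene_length
    for start, end in intervals:
        for pos in range(max(0, start), min(gene_length, end)):
            covered[pos] = True
    return sum(covered) / gene_length


# Hypothetical read alignments on a 1000 bp gene.
example_intervals = [(0, 100), (50, 150), (400, 500)]
example_breadth = breadth_of_coverage(example_intervals, 1000)
```

A breadth close to 1 would suggest the gene is present across its whole length; large uncovered stretches could mean divergence, missing exons, or simply low sequencing depth, so the number alone does not settle it.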
