4.8 years ago
arjun.murthy779
For SNP analysis, you need to align two files, generally a .fa file and a .fq file. How would you carry out the alignment using two .fa files (reference genomes from NCBI)? Bowtie2 and BWA are not carrying out the alignment correctly.
Not sure where you're going with this. Do you want a whole-genome DNA alignment or something similar?
For SNP analysis you usually align resequencing data to a reference genome (not two genomes to each other).
Alright, let me try to put it in better words. I have downloaded a bunch of reference genomes of a specific bacterium (say 100), with submission dates spread over 10 years. I have the oldest submitted sequence (say 2003). Now, I am trying to align the 2003 sequence with another reference sequence (say submitted in 2007). The end goal is to identify SNPs which may have occurred in those 4 years. Both of these files are reference genome files in the .fa format. I know that SNP analysis is normally carried out using shorter DNA reads, but is it possible to do the same with two reference genomes?
ok, makes sense indeed.
Apart from what Mensur Dlakic points out, you could split all your genomes into chunks of ~500 bases or so, treat them as short reads, and then align them to the reference using any of the commonly used read mappers.
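A minimal Python sketch of the chunking idea, in case it helps. The function names (`read_fasta`, `chunk_sequence`) are made up for illustration, and the 250-base step (so neighbouring chunks overlap by half) is my own assumption, not something required by any mapper. It treats the sequence as linear, so it does not wrap around the origin of a circular genome.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(parts)
            # Keep only the first word of the header as the record name.
            name = (line[1:].split() or ["unnamed"])[0]
            parts = []
        else:
            parts.append(line)
    if name is not None:
        yield name, "".join(parts)


def chunk_sequence(seq: str, size: int = 500, step: int = 250) -> Iterator[Tuple[int, str]]:
    """Yield (start, fragment) windows of `size` bases, moving `step` bases each time.

    `size` and `step` are illustrative defaults (assumptions), not tuned values.
    """
    last_start = max(len(seq) - size, 0)
    for start in range(0, last_start + 1, step):
        yield start, seq[start:start + size]
    # Add one final window so the end of the sequence is always covered.
    if last_start % step != 0:
        yield last_start, seq[last_start:last_start + size]
```

Usage would be something like `for name, seq in read_fasta(open("genome_2007.fa"))` (a hypothetical file name), naming each fragment after its source sequence and start position so that hits can be traced back.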
An alternative would be to convert the FASTA to FASTQ using a script like https://github.com/ekg/fasta-to-fastq/blob/master/fasta_to_fastq.pl, which fills the quality scores with dummy values. See if that works.
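If you would rather not run the Perl script, the same idea can be sketched in a few lines of Python. This is not a port of that script, just an illustration: the function names are hypothetical, and the dummy quality character `I` (Phred 40 in the common Phred+33 encoding) is an arbitrary choice on my part.

```python
from typing import Iterable, Iterator


def to_fastq(name: str, seq: str, qual_char: str = "I") -> str:
    """Format one sequence as a four-line FASTQ record with a constant dummy quality."""
    return f"@{name}\n{seq}\n+\n{qual_char * len(seq)}\n"


def fasta_to_fastq(lines: Iterable[str], qual_char: str = "I") -> Iterator[str]:
    """Yield one FASTQ record per FASTA record found in `lines`."""
    name, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield to_fastq(name, "".join(parts), qual_char)
            # Keep only the first word of the header as the read name.
            name = (line[1:].split() or ["unnamed"])[0]
            parts = []
        else:
            parts.append(line)
    if name is not None:
        yield to_fastq(name, "".join(parts), qual_char)
```

Because every base gets the same made-up score, any quality-aware filtering downstream is meaningless for these records; the scores only exist to satisfy tools that insist on FASTQ input.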