Entering edit mode
7.8 years ago
ammarsabir15
▴
70
I want to align fasta sequences for 28 exons of SCN5A gene with fastq sequence of whole exome sequence from 1000 genomes project.
By aligning these two I want to find mutations within the exons of SCN5A gene.
Should I use conventional NGS alignment methods for this purpose i.e make indices of fasta and then align fasta with fastq or there is any other strategy for this purpose?
Irrespective of the method you settle on keep in mind that when you align data to an abbreviated version of a genome (in this case just SCN5A gene) there is a chance that aligners will align reads that may not have originally belonged in that region. If that is a cause for concern then aligning to the whole genome/exome followed by retrieving regions of alignment of interest may be a safer option.
and in your case, if any short-read comes from SCN10A which is homologous to SCN5A, it might produces a false positive when only mapped to SCN5A...
Ok I will try to use this but lets say SCN5A is on chromosome 3 then how to get exons only for chromosome 3 from this whole exome sequence ?
You could align to the whole genome/exome and then extract regions you need later: Extracting reads from multiple regions
After alignment I got BAM file. What should be the next step to get the sequence of chromosome # where my desired gene is present? Should I extract regions from BAM file based on chromosome using samtools view and then convert that into fastq? Or Should I convert the BAM file to BED file and then extract required regions ?
Things may be a bit tricky depending on what you want.
Are you looking to get a consensus? Take a look at this. If you only want to restrict to a chromosome or only the SCN5A gene then you could limit the region that you feed to
samtools mpileup
by usingsamtools view sorted.bam chr3:1-450000
(use the interval you want)