I know this is a very basic question, but I've been having difficulties discovering how to analyze my mpileup data.
Because I don't know how to analyze it, I am questioning whether I understand what it is. So this is a simple question, but a necessary one:
What is contained in mpileup data? According to the samtools man page: "In the pileup format (without -u or -g), each line represents a genomic position, consisting of chromosome name, coordinate, reference base, read bases, read qualities and alignment mapping qualities."
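To make that description concrete, here is a minimal Python sketch of reading one pileup line. It assumes the default tab-separated mpileup text output, which has six columns: chromosome, position, reference base, read depth, read bases, and base qualities. The depth column is not listed in the man page sentence quoted above. The function names (count_bases, parse_pileup_line) and the example line are made up for illustration and are not part of samtools.

```python
import re
from collections import Counter


def count_bases(read_bases):
    """Tally reference matches and mismatching bases in the read-bases column."""
    counts = Counter()
    i = 0
    while i < len(read_bases):
        c = read_bases[i]
        if c == "^":
            # '^' marks the start of a read; the next character is its mapping quality.
            i += 2
            continue
        if c in "+-":
            # Indel: '+' or '-', then a length, then that many inserted/deleted bases.
            m = re.match(r"\d+", read_bases[i + 1:])
            i += 1 + len(m.group()) + int(m.group())
            continue
        if c in ".,":
            # '.' = match on forward strand, ',' = match on reverse strand.
            counts["ref"] += 1
        elif c.upper() in "ACGTN":
            # A letter is a mismatch (upper case forward, lower case reverse).
            counts[c.upper()] += 1
        # Anything else ('$' read end, '*' deleted base, '<' '>' skips) is ignored here.
        i += 1
    return counts


def parse_pileup_line(line):
    """Split one pileup line into named fields (assumes the six default columns)."""
    chrom, pos, ref, depth, bases, quals = line.rstrip("\n").split("\t")[:6]
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref.upper(),
        "depth": int(depth),
        "counts": count_bases(bases),
    }


# Hypothetical example line: five reads, two of which show an A where the reference is T.
example = "chr1\t272\tT\t5\t.,A^~.a$\t<<<<<"
record = parse_pileup_line(example)
print(record["chrom"], record["pos"], record["ref"], dict(record["counts"]))
```

A position where a non-reference letter makes up a large share of the reads is a candidate SNP. This only shows where that evidence sits in the file; it is not a substitute for a real variant caller.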
But I fail to see where my SNP information is. I am simply looking to determine the location of SNPs that were part of the input that was run through bowtie.
If I'm not being clear, please let me know and I will add more details on what I am specifically trying to do.
Thanks in advance! -SortingHat
EDIT: So it seems that if I run pileup (I know it is deprecated), my SNPs come through perfectly, but when I run mpileup I get a ton of garbage data, which has made this whole process very difficult. How can I cut down the mpileup output to reflect what pileup reports?
I used this: samtools mpileup inflA.sorted.bam iflA.sorted.mpileup
I assumed inflA.sorted.bam was my consensus sequence.
Thanks for the link, I'll definitely run through that information.
Do you have any suggestions for post processing? Thanks again! -Matt