I'm new to bioinformatics and still learning how to use the tools, but for about a week I've been stumped on something that I feel should be very easy.
Basically, I want to compare the reads in a BAM file to a FASTA reference sequence and get the changes. I can easily view them with samtools tview, but I need them in an Excel file for manipulation/concatenation. I can't figure out why mpileup won't give me a straight list of which reads are mutated, unless that's the wrong tool for the job. I'm pretty good at programming, so if the solution is something I need to code myself I could do it, but since the formats are different from what I'm used to, it's been a major headache.
Thanks
Why do you want individual reads? While you can use the C or Python API to get those easily enough, it's very likely that this isn't actually what you want to do (even if you think so). Why don't you tell us the end goal that you're after?
BTW, you should almost NEVER use Excel in bioinformatics. That's usually a sure sign that you're doing something wrong.
Thanks. I guess I gravitated towards Excel because in the end I'd really like easily readable columns for my mentor, who doesn't code.
In the end I want a table like this:
Right, so basically you want a VCF file. See Pierre's answer.
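What you want is a VCF file. The number of reads supporting the reference and the alternate allele is given in the DP4 field (forward and reverse reference reads, then forward and reverse alternate reads). The biologists in our lab manage their VCF files with http://www.knime.org.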
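If a plain table is still needed for a non-programmer, a VCF can be flattened into CSV with a few lines of Python. The following is only a minimal sketch, assuming the VCF was produced by a caller that writes a DP4 entry in the INFO column, in the order ref-forward, ref-reverse, alt-forward, alt-reverse. The function names `vcf_to_rows` and `rows_to_csv` and the sample record are made up for illustration and are not part of any library.

```python
import csv
import io

COLUMNS = ["chrom", "pos", "ref", "alt", "qual", "ref_reads", "alt_reads"]


def vcf_to_rows(vcf_lines):
    """Yield one dict per variant record, with DP4 summed into ref/alt read counts."""
    for line in vcf_lines:
        line = line.rstrip("\n")
        # Skip blank lines, meta lines (##...) and the column header (#CHROM...).
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        chrom, pos, _id, ref, alt, qual, _filter, info = fields[:8]

        # INFO is a ';'-separated list of KEY=VALUE pairs (or bare flags).
        info_map = {}
        for item in info.split(";"):
            key, _, value = item.partition("=")
            info_map[key] = value

        row = {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt,
               "qual": qual, "ref_reads": "", "alt_reads": ""}
        if "DP4" in info_map:
            ref_fwd, ref_rev, alt_fwd, alt_rev = (
                int(x) for x in info_map["DP4"].split(","))
            row["ref_reads"] = ref_fwd + ref_rev
            row["alt_reads"] = alt_fwd + alt_rev
        yield row


def rows_to_csv(rows, handle):
    """Write the rows as CSV, which Excel can open directly."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Made-up example record, only to show the shape of the input.
sample_vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t101\t.\tA\tG\t50\t.\tDP=12;DP4=3,2,4,3\n",
]
rows = list(vcf_to_rows(sample_vcf))
out = io.StringIO()
rows_to_csv(rows, out)
csv_text = out.getvalue()
```

For a real file, pass an open file handle instead of `sample_vcf` and write to a file opened with `newline=""`. This gives per-site counts, not a per-read list, which is usually what ends up in the final table anyway.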