Hello,
I have a bam file with 8 samples. I am trying to use this file to call SNP's with some threshold values for minimum coverage, base quality, allele frequency etc. I found Varscan tool will let me call SNP's using mpileup file from samtools. http://varscan.sourceforge.net/using-varscan.html#v2.3_mpileup2snp I generated my mpileup file using the command - samtools mpileup -f [reference sequence] input_bam_file >myData.mpileup
I am using hg19 as reference. In my mpileup file, all my reference bases(third column) showed up as 'N' and the file only has data for chr1(column 1). Also the 4th column(reads covering the site) is 1/0 for the entire file.
chr1 16451 N 0
chr1 16452 N 1 A /
chr1 16453 N 1 A 1
chr1 16454 N 1 A .
chr1 16455 N 1 C 4
chr1 16456 N 0
chr1 16457 N 1 A /
chr1 16458 N 1 C 8
chr1 16459 N 1 T 3
Any thoughts as to why mpileup file looks like that?
Thank you! Tejaswi
your 8 samples have been mapped on the very same reference ?
Yes, using standard hg19
Check whether
samtools view -H <your bam file>
returns exactly the same names for chromosomes asgrep ">" <your reference fasta>
, had that happen to me a few times...Hey Philipp,
Thanks for responding. Yeah I checked for the chromosome names and this is how it looks -
BAM file - @HD VN:1.0 GO:none SO:coordinate
@SQ SN:chr1 LN:249250621
@SQ SN:chr2 LN:243199373
@SQ SN:chr3 LN:198022430
@SQ SN:chr4 LN:191154276
@SQ SN:chr5 LN:180915260
@SQ SN:chr6 LN:171115067
@SQ SN:chr7 LN:159138663
@SQ SN:chr8 LN:146364022
@SQ SN:chr9 LN:141213431
@SQ SN:chr10 LN:135534747
@SQ SN:chr11 LN:135006516
@SQ SN:chr12 LN:133851895
@SQ SN:chr13 LN:115169878
@SQ SN:chr14 LN:107349540
@SQ SN:chr15 LN:102531392
@SQ SN:chr16 LN:90354753
@SQ SN:chr17 LN:81195210
@SQ SN:chr18 LN:78077248
@SQ SN:chr19 LN:59128983
@SQ SN:chr20 LN:63025520
@SQ SN:chr21 LN:48129895
@SQ SN:chr22 LN:51304566
@SQ SN:chrX LN:155270560
@SQ SN:chrY LN:59373566
@SQ SN:chrM LN:16571
Ref.fa file -
Should I change the chromosome names in my ref.fa file as >chr1, >chr2 etc.?
I just noticed that my bam file has chrM(mitochondrial) too. Maybe I should concatenate the fasta file of chrM to my reference file as well?
Thank you for your help!
I think there's your problem -
gi|224384768|gb|CM000663.1| Homo sapiens chromosome 1, GRC primary reference assembly
should bechr1
etc.. mpileup looks forchr1
, can't find it, and just inserts Ns. And yes, concatenate your mitochondrial reference as well!So, it wasn't the very same reference ....
Thank you Philipp! That worked.
I din't realize the chromosome names were different. Thanks for your help Pierre! :)