9.3 years ago
quachtina96
Hi,
I am using samtools mpileup to generate a pileup file, and my output doesn't match the example pileup files I've seen (e.g. the example on the pileup format article on Wikipedia). Do you agree that there seems to be something wrong with my file? If so, any suggestions on how to fix it?
The command I used:
samtools mpileup -B -f chrRCRS.fa input-sorted.bam > input.pileup
The first part of the pileup file:
chrRCRS 1 N 22 ^]G^]G^]G^]G^]G^]G^]G^]G^]G^]G^]G^]G^]G^]G^]G^]g^]g^]g^]g^]g^]g^]g IIBFFFIIIIIFFFIBIIIFII
chrRCRS 2 N 22 AAAAAAAAAAAAAAAaaaaaaa FIBFFFIIIIIFFFIBIIIFII
chrRCRS 3 N 22 TTTTTTTTTTTTTTTttttttt IIFFFFIIIIIFFFIFIIIFII
chrRCRS 4 N 22 CCCCCCCCCCCCCCCccccccc IIFBFFIIIIIFFFIFFIIFII
chrRCRS 5 N 23 AAAAAAAAAAAAAAAaaaaaaa^]A IIFFFIIIIIIFFFIBIIIFFIB
chrRCRS 6 N 24 CCCCCCCCCCCCCCCcccccccC^]c IIFBFIIIIIIFBFIFIIIFIIBB
chrRCRS 7 N 25 AAAAAAAAAAAAAAAaaaaaaaAa^]a FIF7FIIIIIIFFFIFIIIFIIBFF
chrRCRS 8 N 25 GGGGGGGGGGGGGGGgggggggGgg FIF0FIIIFIIIFFIFIIIFIIFBB
chrRCRS 9 N 25 GGGGGGGGGGGGGGGgggggggGgg FIFBFIIIIIIIFFIFIIIFIIFBB
chrRCRS 10 N 26 TTTTTTTTTTTTTTTtttttttTtt^]T FFF0FFFFFFFIBFFFIIIFIIFBBB
chrRCRS 11 N 26 CCCCCCCCCCCCCCCcccccccCccC FIFBIIFFIIFIFIIFIIIFIIFBBB
chrRCRS 12 N 27 TTTTTTTTTTTTTTTtttttttTttT^]T IIF<IIIIIIIIFIIFFIIFIIFBFBB
chrRCRS 13 N 27 AAAAAAAAAAAAAAAaaaaaaaAaaAA FIIBIIIIFIIIIIIBIIFFIIFBBBB
chrRCRS 14 N 27 TTTTTTTTTTTTTTTtttttttTttTT IIIBIIIIIIIIFII7FIFBIIFB<FB
chrRCRS 15 N 27 CCCCCCCCCCCCCCCcccccccCccCC IIIBIIIIIIIIFIIFFFFBFFFB<FF
chrRCRS 16 N 27 AAAAAAAAAAAAAAAaaaaaaaAaaAA IIIBIIIIIIIIFII<BBBBBFF<7FF
chrRCRS 17 N 27 CCCCCCCCCCCCCCCcccccccCccCC IIIBIIIIIIIIFIIFIIIFBIF7FFF
chrRCRS 18 N 28 CCCCCCCCCCCCCCCcccccccCccCC^]C IIIBIIIIIIIIFIIFIIIF0IIFFBFB
I am not sure whether this is a real problem or whether the reference genome simply has unknown bases in this region. The reference nucleotide (third column) is "N" at every position you have listed above. As a result, the bases from the aligned reads at those positions are reported as mismatches, i.e. printed as letters. For example, at position 2 the aligned reads have an "A" nucleotide: "A" represents a mismatch on the forward strand and "a" represents a mismatch on the reverse strand. If you keep scrolling down in the mpileup file, you may find positions where the reference is not "N", with lots of "." (match on the forward strand) and "," (match on the reverse strand), provided that your sample has no variants at those positions. If all you see in the third column is "N", then there may be a compatibility issue between the FASTA file and the BAM file. Try:
1) The fai file is one kb and it only contains
2) If I'm interpreting this correctly, the chromosome name is the same in the header (See below)
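The "is the third column always N?" check suggested in the answer can also be done without scrolling. Below is a minimal Python sketch, assuming the pileup is whitespace-separated with the reference base in the third column, as in the excerpt in the question. The function names `count_ref_bases` and `fraction_n` are made up for illustration and are not part of samtools.

```python
from collections import Counter


def count_ref_bases(pileup_lines):
    """Tally the reference base (third column) across pileup lines."""
    counts = Counter()
    for line in pileup_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        counts[fields[2].upper()] += 1
    return counts


def fraction_n(counts):
    """Return the fraction of positions whose reference base is N."""
    total = sum(counts.values())
    return counts["N"] / total if total else 0.0


# Two lines copied from the pileup excerpt in the question.
sample = [
    "chrRCRS 2 N 22 AAAAAAAAAAAAAAAaaaaaaa FIBFFFIIIIIFFFIBIIIFII",
    "chrRCRS 3 N 22 TTTTTTTTTTTTTTTttttttt IIFFFFIIIIIFFFIFIIIFII",
]
counts = count_ref_bases(sample)
print(counts, fraction_n(counts))
```

To run it on the whole file, pass an open file handle instead of the list, e.g. `count_ref_bases(open("input.pileup"))`. A fraction of 1.0 over the entire file would support the FASTA/BAM mismatch explanation; a lower value would suggest that only part of the reference is "N".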