Dear all,
I am facing an issue with the samtools tview command. When I try to scroll through the alignment in the terminal, all the bases appear as "N"s. At first I thought it had something to do with the BAM file or the reference FASTA file. But then I saw that when I increase or decrease the size of the terminal window and then run samtools tview, the position where these "N"s begin also changes. Has anybody else had this problem? Are there any suggestions?
In both cases samtools tview will still show you the alignment/pileup; however, all reference bases will be Ns. In other words, make sure you pass the reference fasta as the last argument and make sure it exists. No need to worry about indexing your fasta file. If the index is missing, it will be computed automatically.
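The "make sure it exists" part can be checked before launching tview. This is only a minimal sketch: check_ref is a made-up helper name, and in.bam and ref.fasta are placeholder file names.

```shell
# Hypothetical helper: succeed only if the reference FASTA is a readable,
# non-empty file; otherwise print a message to stderr and return 1.
check_ref() {
    ref=$1
    if [ -r "$ref" ] && [ -s "$ref" ]; then
        echo "ok: $ref"
    else
        echo "reference FASTA missing, empty, or unreadable: $ref" >&2
        return 1
    fi
}

# Usage (placeholder file names, reference as the last argument):
#   check_ref ref.fasta && samtools tview in.bam ref.fasta
```

A typo in the reference path then gives an explicit error message instead of a pileup full of Ns.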
If things still fail, something really strange is going on.
I would just add that sometimes you do have to worry about the index file. If it is there but is faulty for some reason, most software won't tell you; it will just go ahead and try to use it anyway, leading to errors. So if you know that something is off about how software is interacting with your reference fasta, remaking the index with samtools faidx is a quick, easy thing to try as a first troubleshooting step.
I would go ahead and reindex your fasta file. I have solved the "N" problem that way. Check a position in the alignment in a region that you know doesn't contain "N"s in the reference. Use the 'g' key to quickly move positions.
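If you want to see whether the existing index is obviously out of date before remaking it, something like the following may help. It is a rough sketch with made-up function names (fasta_lengths, fai_is_current). It assumes the index sits next to the FASTA as ref.fasta.fai, and it only compares sequence names and lengths (the first two columns of the .fai), so an index with wrong byte offsets would still pass.

```shell
# Print "name<TAB>length" for every sequence in a FASTA file.
fasta_lengths() {
    awk '
        /^>/ {
            if (name != "") print name "\t" len
            name = substr($1, 2); len = 0; next
        }
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END { if (name != "") print name "\t" len }
    ' "$1"
}

# Return 0 if <fasta>.fai lists the same names and lengths as the FASTA,
# 1 if it differs, 2 if there is no index file at all.
fai_is_current() {
    fasta=$1
    fai="$fasta.fai"
    if [ ! -f "$fai" ]; then
        echo "no index: $fai" >&2
        return 2
    fi
    if [ "$(fasta_lengths "$fasta")" = "$(cut -f1,2 "$fai")" ]; then
        echo "index matches"
    else
        echo "index is stale: $fai" >&2
        return 1
    fi
}

# Usage (ref.fasta is a placeholder name):
#   fai_is_current ref.fasta || { rm -f ref.fasta.fai; samtools faidx ref.fasta; }
```

Since reindexing is fast, simply deleting the .fai and rerunning samtools faidx is just as reasonable; the check is mainly useful for confirming that a stale index was the cause.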
I just ran into this issue. My problem was that the reference FASTA file used a different name for the chromosome than the one in the BAM file. To fix it, I changed the FASTA header to match the name in the BAM file.
In my case, samtools view -H in.bam | grep '@SQ' showed the name I should use (after the SN: part).
Don't forget to rerun samtools faidx ref.fasta after changing the FASTA file!
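With many chromosomes, comparing the names by eye gets tedious. Here is a small sketch that lists the @SQ names from the BAM header that have no matching FASTA header. The function names (sq_names, missing_in_fasta) are made up for illustration, and it assumes the FASTA sequence name is the first word after ">".

```shell
# Read a SAM header on stdin and print the SN: value of every @SQ line.
sq_names() {
    awk -F'\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SN:/) print substr($i, 4)
    }'
}

# Read a SAM header on stdin; print each @SQ name that does not appear
# as a sequence name in the FASTA file given as $1.
missing_in_fasta() {
    sq_names | awk -v fasta="$1" '
        BEGIN {
            while ((getline line < fasta) > 0)
                if (line ~ /^>/) {
                    split(substr(line, 2), parts, /[ \t]/)
                    seen[parts[1]] = 1
                }
        }
        !($0 in seen)
    '
}

# Usage (in.bam and ref.fasta are placeholder names):
#   samtools view -H in.bam | missing_in_fasta ref.fasta
```

No output means every name in the BAM header was found in the FASTA. Any name that is printed (for example chr1 when the FASTA header says >1) is a candidate for the mismatch described above.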
I think the OP had a different issue; I encountered the same thing, and it is not a major problem.
Description
1) Some (but not all) of the bases of the reference (known to be different from N) are shown as N.
2) The position at which the bases start appearing as N varies when the display size varies.
3) (At least for me) The problem arises only when no reads are aligning on that portion of the reference. I have never encountered this issue where reads are aligning. This was apparent to me in some instances in which the beginning of the reference had no reads but showed the correct nucleotides, then some Ns, and then, after 300 bp, reads started aligning to the reference and the reference again showed the 4 nucleotides correctly.
4) Finally, I noticed that the beginning of the reference is correctly shown up to what fits in the display. Scroll one base to the left and that will be an N.
Explanation
I do not know if this is a bug or is intended (maybe we don't care to see the whole reference when no reads are aligning on it), but it is not a major problem, since it only affects regions to which no reads are aligning.
I also have this problem.
I tested a small simulated genome and used wgsim to generate a paired-end reads file, and it works with samtools tview, but it didn't work when I changed to hg19. Can samtools tview only handle small genomes?
Can you see the reference bases before you change the size of the window? Do you see any non-N reference bases at all?