Hey,
is there a way to put all bases that come from the same read in one vertical column using mpileup? So changing this
chr2 96 C 5 .,.,,
chr2 97 A 4 .,.,
chr2 98 C 6 .,.,,,
chr2 99 C 8 .,.,,,..
chr2 00 A 9 .,,,..,,.
chr2 01 C 5 ..,,.
chr2 02 C 5 ,.,,,
chr2 03 T 4 .,,,
to this (made up example)
chr2 96 C 5 .,.,,
chr2 97 A 4 .,. ,
chr2 98 C 6 .,.,, ,
chr2 99 C 8 .,.,,,..
chr2 00 A 9 .,,,..,,.
chr2 01 C 5 .., ,.
chr2 02 C 5 ,., ,,
chr2 03 T 4 .,, ,
To make it look a little bit like IGV? I know this would be impossible for large regions, but I am only looking at very small regions of up to 20, or at most 30, bp.
EDIT: To make it more clear, let's assume this is our reference sequence:
CGATGCTAGC
And these are our NGS reads:
CGATG
GATGCT
AXGCX
GCTAG
TAGC
These should be mapped like this:
CGATGCTAGC
CGATG
 GATGCT
  AXGCX
    GCTAG
      TAGC
and default mpileup would look like this:
1 C .
2 G ..
3 A ...
4 T ..X
5 G ....
6 C ...
7 T .X..
8 A ..
9 G ..
10 C .
Here you cannot see that both mismatches (X) are from the same read, which is why I want to make it look like this:
 1 C .
 2 G ..
 3 A ...
 4 T ..X
 5 G ....
 6 C  ...
 7 T  .X..
 8 A    ..
 9 G    ..
10 C     .
Here you can see that both mismatches are from the same read, because they are in the same position in the mpileup string.
Can you explain the changes you've made in your made up example? I cannot understand your requirement.
I’ll try :). I introduced some spaces so that the bases that belong to the same read are always in the same position in the mpileup string; if a read does not cover a position, it gets a space there. I want to get an overview of the general sequencing quality around one position, so that if there’s a rare mutation I can check whether the whole read is bad or whether this is really likely a mutation with very low AF. I’m on my mobile right now and I will try to make a figure to illustrate this better.
Ok, I added some text and I hope that makes it more clear :).
Thank you, your requirement is clear now. I don't know much about this domain, but I have a feeling that you may have to write a custom script, which is not a big deal. You'll merely need to create a matrix from the read-based pileup, replace matches with dots and mismatches with X/,/whatever, then transpose the matrix. I'll try and create an R example.
EDIT: The script is more challenging than I thought it would be, especially given that I start with the alignment you show in your example and not your actual starting point. Please wait for others to weigh in, they may have better options for you.
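For what it's worth, the matrix-and-transpose idea can be sketched in a few lines of Python (not R) for the toy alignment from the question. This is only a rough illustration under several assumptions: the function name `read_aligned_pileup` and the `(start, sequence)` input format are made up here, reads are ungapped (no indels or clipping), strand is ignored, and every read gets its own column for the whole region. Starting from a real BAM file you would first need to extract the read positions and sequences with a BAM parser, which is not shown.

```python
def read_aligned_pileup(reference, reads):
    """Return one pileup string per reference position, one column per read.

    reference: reference sequence as a string.
    reads: list of (start, sequence) tuples with a 0-based start.
           Assumes ungapped alignments (no indels, no clipping).
    """
    # One row per read, one cell per reference position.
    # A space means "this read does not cover this position".
    matrix = [[" "] * len(reference) for _ in reads]
    for row, (start, seq) in zip(matrix, reads):
        for offset, base in enumerate(seq):
            pos = start + offset
            # Matches become dots, mismatches keep the read base.
            row[pos] = "." if base == reference[pos] else base
    # Transpose: each reference position becomes one line, and each
    # read keeps the same column on every line. Trailing spaces are
    # stripped; leading spaces are kept because they hold the columns.
    return ["".join(column).rstrip() for column in zip(*matrix)]


# Toy data from the question (X marks a mismatching base).
reference = "CGATGCTAGC"
reads = [(0, "CGATG"), (1, "GATGCT"), (2, "AXGCX"), (4, "GCTAG"), (6, "TAGC")]

pileup = read_aligned_pileup(reference, reads)
for pos, (ref_base, column) in enumerate(zip(reference, pileup), start=1):
    # Right-align the position so a two-digit position does not shift columns.
    print(f"{pos:>2} {ref_base} {column}")
```

With this layout both X characters end up in the third column (the third read), which is the property asked for. Because every read occupies its own column, the lines get as wide as the number of reads, so this is only practical for small regions and modest coverage.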