Good evening,
I'd like to compare the alignment quality of hisat2, bowtie2 and bwa for my files.
The first 2 packages output the percentage of reads aligned concordantly exactly 1 time; bwa does not, because it does not output an alignment summary.
The samtools flagstat report is not enough, because it gives only the overall level of alignment, and I want to compare the other metric (the percentage of reads aligned concordantly exactly 1 time).
I could filter the sam file for these reads, but I'm not sure I know how to do it correctly. I learned how to filter the sam file after hisat2 with the command samtools view -hf 0x2 -q 3, but I have not learned how to filter the sam file after bowtie2 (none of the solutions I found worked). BUT in these cases I can at least check the result myself. In the case of bwa, I can't check myself due to the lack of an alignment summary.
Could you please advise how to get the fraction of reads aligned concordantly exactly 1 time, and how to extract them from the sam file after bwa (version 0.7.17)?
EDIT:
I tried samtools view -f 0x2 | grep -v -e 'XA:Z:' -e 'SA:Z:' (based on the combination of the previous link and this), but obtained unpaired reads in the final file. The XA:Z tag is added by bwa if there are alternative alignments. But in some cases XA:Z exists in only one line of the pair, for example:
ERR3316120.48768420 99 NODE_5_length_137336913_cov_22.55 93837250 60 24S127M = 93837315 216 GTTGAACCATGGCACCCCTTGTTAAGGCTACCTTTTGCATGCCCAGAGATGCAAGCACCAAGTTCTGCTATCAATTTACATTGTGACAGTTTGCAGATGACTTCTGCACGCACCACGTGTCCTGGACACATGGAATTCTCTTTTGCTGGCC AA<A<FJJJF7-AFAJAJJJJJJJFFFJJFFJ7JJJFFFJJJF<JFFFJJAFF<FJJJJ<JFFFJJJFJFJJ--JAAJJJJFJ-7FF-AJJJJFJA-F<FFFJJJFJFJFAFFJJJJJFJFJFFAAFJJFJJFA<-FFJFF-FAAJJAFFA NM:i:0 MD:Z:127 MC:Z:151M AS:i:127 XS:i:92
ERR3316120.48768420 147 NODE_5_length_137336913_cov_22.55 93837315 60 151M = 93837250 -216 TTTGCAGATGACTTCTGCACGCACCACGTGTCCTGGACACATGGAATTCTCTTTTGCTGGCCTGTATCAATGTTTTGGATCTTTTTCTACGTTCAGGCTATGTAAACATTCATTTCTAACGACAAATGAATTCCTTGAGTTACATATTTAA JFFAFAFJJJJAJFJA7<A<FAAJAJJJJF7<<FFJF<FFJFFJ<FA-FA-FF7FAFJFF-JJJJJAAJJJJJ<JFJFJJFA<FF<7FF-<AF7AFA-JJJJJJA7FJJJJJJFF<JJJFAF-J-AJJJJFAAF<JFJFJFFFFJAFAAA- NM:i:0 MD:Z:151 MC:Z:24S127M AS:i:151 XS:i:123 XA:Z:NODE_5_length_137336913_cov_22.55,-93864831,151M,6;
So, in the final file I have only the first line. Should I remove such alignments manually, using something like this?
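For illustration, here is a minimal Python sketch of a pair-aware version of that filter: it decides per read name rather than per line, so a pair is dropped whole when either mate fails. Everything in it is an assumption, not a bwa or samtools feature: the names filter_pairs, is_unique_concordant and min_mapq are hypothetical, the input is assumed to be name-grouped SAM text (mates on adjacent lines, as in the example above; a coordinate-sorted file would first need samtools collate or samtools sort -n), and the criteria (flag 0x2 set, MAPQ of at least 1, no XA:Z or SA:Z tag on either mate) are only a heuristic approximation of "aligned concordantly exactly 1 time", not the definition used by bowtie2 or hisat2.

```python
from itertools import groupby

PROPER_PAIR = 0x2             # SAM flag bit: pair aligned as a proper pair
NOT_PRIMARY = 0x100 | 0x800   # secondary or supplementary record


def is_unique_concordant(fields, min_mapq):
    """Heuristic check of one SAM record that is already split on tabs."""
    flag, mapq = int(fields[1]), int(fields[4])
    # XA:Z lists alternative hits, SA:Z marks a chimeric (split) alignment.
    has_alt = any(tag.startswith(("XA:Z:", "SA:Z:")) for tag in fields[11:])
    return bool(flag & PROPER_PAIR) and mapq >= min_mapq and not has_alt


def filter_pairs(sam_in, sam_out, min_mapq=1):
    """Copy header lines and complete passing pairs from sam_in to sam_out.

    Assumes name-grouped input. Returns (total_pairs, kept_pairs).
    """
    total = kept = 0
    rows = (line.rstrip("\n").split("\t") for line in sam_in if line.strip())
    for qname, group in groupby(rows, key=lambda fields: fields[0]):
        group = list(group)
        if qname.startswith("@"):          # header lines pass through
            for fields in group:
                sam_out.write("\t".join(fields) + "\n")
            continue
        total += 1
        primary = [f for f in group if not int(f[1]) & NOT_PRIMARY]
        # Keep the pair only if BOTH primary mates pass, so no orphans remain.
        if len(primary) == 2 and all(
            is_unique_concordant(f, min_mapq) for f in primary
        ):
            kept += 1
            for fields in primary:
                sam_out.write("\t".join(fields) + "\n")
    return total, kept
```

Called as filter_pairs(sys.stdin, sys.stdout) on the output of samtools view -h, the two returned counts give the fraction kept / total. For the pair shown above, the second mate carries XA:Z, so both lines would be dropped together.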
Thank you in advance!
Best regards, Poecile