differences between bwa-mem paired and single reads
0
0
Entering edit mode
5.4 years ago
vmicrobio ▴ 290

Hi!

I would like to highlight hg19/virus ratio for Illumina paired reads. My first step was to remove hg19 reads (using -f4 samtools) and then align remaining reads against virus. I have noticed that using the same parameters I got different results for same run if I process paired reads (50% hg19, 50% virus) or single reads (3% hg19, 97% virus).

bwa mem -k 20 hg19.fasta singlereads.fastq > out.sam
bwa mem -k 20 hg19.fasta paired1.fastq paired2.fastq > out2.sam

For paired reads, I have blasted hg19 aligned reads and it occurs that human reads are actually virus.

I have noticed in cigar string that a lot of human mapped are closed to seed 20. Also, if I try to rise seed to 50 (-k 50) I got an expected ratio (3% hg19, 97% virus).

There is only limited homology between hg19 and the virus and this virus is not integrated to human genome.

Does someone can explain to me why for the same seed I haven't the same results if I give my reads single or paired??

Thanks

alignment bwa • 3.1k views
ADD COMMENT
1
Entering edit mode

paired reads (50% hg19, 50% virus) or single reads (3% hg19, 97% virus).

What are these percentage exactly ? Are they the proportion of reads mapped from the total reads ? If so, I'm surprised that these add up to 100% ; that every read that does not map to the human genome can map to the virus. Also, it could be useful to see the output of samtools flagstat on your reads mapped on human and your reads mapped on the virus.

ADD REPLY
0
Entering edit mode

thank you for your comment.

You're right, percentages are ratio between mapped and total reads, and 3 and 97% are no real values but more a global average for all samples. For one random sample : 3.03% for hg19, 96.3% for virus, so 0.67% are non mapped. Here are the outputs of samtools flagstat :

1) for paired against hg 19 (k20)

148839 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
0 + 0 mapped (0.00% : N/A)
148839 + 0 paired in sequencing
74311 + 0 read1
74528 + 0 read2
0 + 0 properly paired (0.00% : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

for paired (now single fastq) minus hg 19 against virus (k20)

146554 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
155 + 0 supplementary
0 + 0 duplicates
146554 + 0 mapped (100.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

2) for single against hg 19 (k20)

148839 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
0 + 0 mapped (0.00% : N/A)
148839 + 0 paired in sequencing
74311 + 0 read1
74528 + 0 read2
0 + 0 properly paired (0.00% : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

for single minus hg 19 against virus (k20)

146554 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
155 + 0 supplementary
0 + 0 duplicates
146554 + 0 mapped (100.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

3) for paired against hg 19 (k50)

268775 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
0 + 0 mapped (0.00% : N/A)
268775 + 0 paired in sequencing
134298 + 0 read1
134477 + 0 read2
0 + 0 properly paired (0.00% : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

for paired (now single fastq) minus hg 19 (k50) against virus (k20)

262536 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
598 + 0 supplementary
0 + 0 duplicates
262536 + 0 mapped (100.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
ADD REPLY
0
Entering edit mode

Ok this seems weird. I have two comments that might help you figure out what is going on:

  1. What you show as reads mapped on human are actually unmapped reads (0% mapped), so it doesn't tell you how much actually mapped on human. You probably need to use samtools flagstat before using the -f4 filter.
  2. There is exactly the same number of read mapped on the virus in single reads than in paired reads. So unless the total number of read change between the two (it shouldn't), it is impossible to have a 47% difference in the percentage of read mapped to the virus.
ADD REPLY

Login before adding your answer.

Traffic: 2318 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6