Hi All, Recently I have been given some targeted Ion Torrent sequencing data to play with. It's not a large amount of data, only ~18,000 unpaired reads. I have aligned the reads with BWA, pretty much the same way I have always done with Illumina FASTQ files. (About 80% aligned, which seemed a bit low, but whatever, I pushed on.)
I then went on to mark the PCR duplicates with Picard. According to both the metrics file and flagstat on the resulting BAM file, a large portion (>70%) of the reads are duplicates. This doesn't seem quite right to me, and I was wondering whether anyone has come across this before or has any suggestions as to what to do next. (Surely I can't use the data after removing over 70% of it, can I?)
Here is the output of flagstat before and after marking the duplicates:
>samtools flagstat sample002.s.bam
17795 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
14258 + 0 mapped (80.12%:-nan%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
After marking with Picard:
java -Xmx8g -jar MarkDuplicates.jar I=sample002.s.bam O=sample002.ds.bam M=./metrics/sample002.markdups_metrics.txt AS=true VALIDATION_STRINGENCY=LENIENT
>samtools flagstat sample002.ds.bam
17795 + 0 in total (QC-passed reads + QC-failed reads)
13064 + 0 duplicates
14258 + 0 mapped (80.12%:-nan%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
Cheers, Davy
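To put a number on the ">70%" above, here is a minimal sketch that pulls the counts out of flagstat text and reports the duplicate rate two ways: against all reads and against mapped reads only. The function name `dup_summary` is made up for illustration, and the patterns assume the older flagstat layout pasted above (count in the first field); newer samtools versions print extra "primary" lines, so treat this as a starting point rather than a robust parser.

```shell
#!/bin/sh
# Summarise duplicate rates from `samtools flagstat` text read on stdin.
# Assumes the flagstat layout shown in this post: the count is the first
# field, and the relevant lines contain " in total ", " duplicates"
# and " mapped (" respectively.
dup_summary() {
    awk '
        / in total /  { total = $1 }
        / duplicates/ { dups = $1 }
        / mapped \(/  { mapped = $1 }
        END {
            if (total > 0)
                printf "duplicates / total reads:  %.2f%%\n", 100 * dups / total
            if (mapped > 0)
                printf "duplicates / mapped reads: %.2f%%\n", 100 * dups / mapped
        }
    '
}
```

Used as `samtools flagstat sample002.ds.bam | dup_summary`, the counts in this post (13064 duplicates, 17795 total, 14258 mapped) come out as 73.41% of all reads and 91.63% of mapped reads. The second figure is arguably the more relevant one, on the assumption that only mapped reads are candidates for duplicate marking.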
I haven't used Ion Torrent myself, but I would be curious to see what the FastQC report looks like (quality of data). The 80% mapping rate may be due to poor-quality data, and some trimming may be needed to map more, although BWA-MEM soft-clips poorly matching read ends automatically. Which BWA algorithm did you use?
I used the standard bwa aln in version 0.6.2. The FastQC reports showed the tails of the reads to be of quite low quality, so I will try BWA-MEM to see if the alignment quality improves.
Cheers.
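For anyone wanting to try the BWA-MEM re-alignment mentioned above, a rough sketch follows. It assumes BWA 0.7 or later (the `mem` subcommand is not in 0.6.2) and samtools 1.3 or later for the `sort -o FILE` syntax. The helper names `run` and `realign_with_mem` and all file names are placeholders rather than anything from the original post, and the paths must not contain spaces. By default it only prints the commands; set `RUN=1` to execute them.

```shell
#!/bin/sh
# Print each command by default; set RUN=1 to execute it instead.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        sh -c "$1"
    else
        printf '%s\n' "$1"
    fi
}

# realign_with_mem REF READS OUT  (all three paths are placeholders)
realign_with_mem() {
    ref=$1; reads=$2; out=$3
    # Build the BWA index for the reference, then align the single-end
    # reads with BWA-MEM and write a coordinate-sorted BAM.
    run "bwa index $ref" &&
    run "bwa mem $ref $reads | samtools sort -o $out -"
}
```

For example, `realign_with_mem ref.fa sample002.fastq sample002.mem.s.bam` prints the two commands for inspection; running `samtools flagstat` on the new BAM then allows a direct comparison with the 80.12% mapping rate from bwa aln.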