Hi All
When using this command:
java -jar ${GATK}/GenomeAnalysisTK.jar -T SplitNCigarReads -R ${hg38}.fasta -I ${WHERE}/${CURRENT}.dedupped.bam -o ${WHERE}/${CURRENT}.split.bam \
-rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS
The output is as follows:
INFO 18:47:27,384 ProgressMeter - done 1.42375677E8 4.4 h 111.0 s 100.0% 4.4 h 0.0 s
INFO 18:47:27,384 ProgressMeter - Total runtime 15909.94 secs, 265.17 min, 4.42 hours
INFO 18:47:27,394 MicroScheduler - 0 reads were filtered out during the traversal out of approximately 142375677 total reads (0.00%)
INFO 18:47:27,395 MicroScheduler - -> 0 reads (0.00% of total) failing BadCigarFilter
INFO 18:47:27,395 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
INFO 18:47:27,396 MicroScheduler - -> 0 reads (0.00% of total) failing ReassignOneMappingQualityFilter
But when I run base recalibration:
java -jar ${GATK}/GenomeAnalysisTK.jar \
-T BaseRecalibrator \
-R ${hg38}.fasta \
-I ${WHERE}/${CURRENT}.split.bam \
-knownSites ${DBSNP} \
-knownSites ${GOLDINDELS} \
-o ${WHERE}/${CURRENT}.recal_data.table
I got:
INFO 12:02:24,217 BaseRecalibrator - ...done!
INFO 12:02:24,218 BaseRecalibrator - BaseRecalibrator was able to recalibrate 21765617 reads
INFO 12:02:24,219 ProgressMeter - done 2.1765706E7 94.9 m 4.4 m 100.0% 94.9 m 0.0 s
INFO 12:02:24,220 ProgressMeter - Total runtime 5693.92 secs, 94.90 min, 1.58 hours
INFO 12:02:24,220 MicroScheduler - 160165683 reads were filtered out during the traversal out of approximately 181931389 total reads (88.04%)
INFO 12:02:24,221 MicroScheduler - -> 6632 reads (0.00% of total) failing BadCigarFilter
INFO 12:02:24,221 MicroScheduler - -> 72386752 reads (39.79% of total) failing DuplicateReadFilter
INFO 12:02:24,223 MicroScheduler - -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
INFO 12:02:24,223 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
INFO 12:02:24,223 MicroScheduler - -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter
INFO 12:02:24,223 MicroScheduler - -> 63827082 reads (35.08% of total) failing MappingQualityZeroFilter
INFO 12:02:24,224 MicroScheduler - -> 23945217 reads (13.16% of total) failing NotPrimaryAlignmentFilter
INFO 12:02:24,225 MicroScheduler - -> 0 reads (0.00% of total) failing UnmappedReadFilter
Why are more than 88% of the reads filtered out at this step when none were filtered out before, and how should I fix this?
Thanks
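To see which filters account for the 88%, the per-filter counts in the BaseRecalibrator log can be tallied. This is only a minimal sketch: the function name filter_summary and the file name recal.log are hypothetical, and it assumes the log was saved to a file with the MicroScheduler summary lines in the format shown above.

```shell
# Hypothetical helper: reads a GATK 3 log on stdin and prints
# "<count><TAB><filter>" for every filter that removed at least one read,
# largest count first. Works even if several INFO entries share one line.
filter_summary() {
  grep -oE '[0-9]+ reads \([0-9.]+% of total\) failing [A-Za-z]+' \
    | awk '$1 > 0 { print $1 "\t" $NF }' \
    | sort -k1,1nr
}

# Usage (recal.log is an assumed file name for the saved log):
# filter_summary < recal.log
```

On the log above, the non-zero entries are DuplicateReadFilter, MappingQualityZeroFilter, NotPrimaryAlignmentFilter and BadCigarFilter, and their counts add up to the 160165683 filtered reads. None of the first three filters appears in the SplitNCigarReads summary.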
Looks like SplitNCigarReads produces malformed reads. What's your version of GATK?
GATK3. I think we have 0 reads (0.00% of total) failing MalformedReadFilter both before and after SplitNCigarReads.
What is the major.minor version of GATK, please? GATK 3.*?
I don't know how to check this, Pierre Lindenbaum.
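One possible way to find the exact version, sketched under assumptions: GATK 3.x is believed to accept a --version flag, and GATK 3 logs normally start with a header line naming the toolkit version. The function names and the header format matched below are illustrative assumptions, not taken from this thread.

```shell
# Assumption: GATK 3.x prints its full version string when given --version.
# ${GATK} is the same install directory used in the commands above.
gatk_version() {
  java -jar "${GATK}/GenomeAnalysisTK.jar" --version
}

# Alternative: extract the version from the header of a saved GATK log,
# assuming a header line of the form
#   "... The Genome Analysis Toolkit (GATK) v<version>, Compiled ..."
gatk_version_from_log() {
  grep -oE 'Genome Analysis Toolkit \(GATK\) v[^,]+' \
    | sed 's/.*(GATK) v//' \
    | head -n 1
}

# Usage (split.log is an assumed file name for a saved log):
# gatk_version
# gatk_version_from_log < split.log
```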