Pilon polish draft assembly, but it didn't work
1
1
Entering edit mode
7.0 years ago
chenyufanxin ▴ 20

Hi, I have just assemble a human genome using canu. And I mapped the Illumina data to the draft assembly using bwa mem and I got a bam file called bwa_sort.bam. Then I used pilon to polish the draft assembly , but I found the final result didn't make a change in the original sequence at all. Here is the command I'm running:

  java -Xmx60G -jar pilon-1.22.jar \
 --genome test.fasta \
 --bam $in_dir/bwa_sort.bam \
 --fix "bases" \
 --threads 15 \
 --output test \
 --outdir $out_dir

I even tried to polish single contig, but I got the same result. Here is the log of pilon:

Pilon version 1.22 Wed Mar 15 16:38:30 2017 -0400
Genome: test.fasta
Fixing snps, indels
Input genome size: 1022840
Processing tig00000488|arrow:1-779472
Processing tig00000361|arrow:1-243368
tig00000361|arrow:1-243368 log:
unpaired /home/xiatian/polish/Illumina_correct_pacbio/2_samtools_output/bwa_sort.bam: coverage 0
Total Reads: 199348, Coverage: 0, minDepth: 5
Confirmed 0 of 243368 bases (0.00%)
Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
Finished processing tig00000361|arrow:1-243368
tig00000488|arrow:1-779472 log:
unpaired /home/xiatian/polish/Illumina_correct_pacbio/2_samtools_output/bwa_sort.bam: coverage 0
Total Reads: 765929, Coverage: 0, minDepth: 5
Confirmed 0 of 779472 bases (0.00%)
Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
Finished processing tig00000488|arrow:1-779472
Writing updated tig00000361|arrow|pilon to /home/xiatian/polish/Illumina_correct_pacbio/4_pilon_output/test.fasta
Writing updated tig00000488|arrow|pilon to /home/xiatian/polish/Illumina_correct_pacbio/4_pilon_output/test.fasta
Mean unpaired coverage: 0
Mean total coverage: 0

It shows thar coverage is always 0, but I can't figure out why this happened. Can you help me ?

Assembly sequencing genome • 5.9k views
ADD COMMENT
0
Entering edit mode

Hi, did you checked your bam file ? You can run "samtools flagstat" on it to check the number of reads mapped to the reference.

ADD REPLY
0
Entering edit mode

I have just used haplotypecaller to call variants in the contig tig00000488|arrow , and the result contains many variants. It means that there are a lot of reads mapped to the contig indeed, opposite to the coverage 0 showed in pilon log. So , I think my bam file is ok. And I will also run "samtools flagstat" on it to check the number of reads mapped to the reference then.

ADD REPLY
0
Entering edit mode

I run "samtools flagstat" on my bam file just now. The output is as below:

2542264173 + 0 in total (QC-passed reads + QC-failed reads)
9669786 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
2535108804 + 0 mapped (99.72% : N/A)
2532594387 + 0 paired in sequencing
1264739823 + 0 read1
1267854564 + 0 read2
0 + 0 properly paired (0.00% : N/A)
2522453600 + 0 with itself and mate mapped
2985418 + 0 singletons (0.12% : N/A)
162547448 + 0 with mate mapped to a different chr
28767110 + 0 with mate mapped to a different chr (mapQ>=5)
ADD REPLY
1
Entering edit mode
7.0 years ago
chenyufanxin ▴ 20

I think I have found the reason. I changed the parameter of bwa when mapping Illumina data to the draft assembly and this time pilon worked.

The original command I used to map:

bwa mem -t 15  -M -P -R '@RG\tID:normal\tSM:normal\tLB:normalLib\tPU:runname\tCN:GenePlus\tPL:illumina' ren.arrow.fasta ${in_dir}/HAP1_264_DHG15556-S_1.fq.gz ${in_dir}/HAP1_264_DHG15556-S_2.fq.gz > ${out_dir}/bwa.sam

The new command I used to map:

bwa mem -t 15 -R '@RG\tID:normal\tSM:normal\tLB:normalLib\tPU:runname\tCN:GenePlus\tPL:illumina' ren.arrow.fasta ${in_dir}/HAP1_264_DHG15556-S_1.fq.gz ${in_dir}/HAP1_264_DHG15556-S_2.fq.gz > ${out_dir}/bwa.sam

You can see that I removed the parameter -M and -P from the command. M means marking shorter split hits as secondary (for Picard compatibility), and P means in the paired-end mode, performing SW to rescue missing hits only but do not try to find hits that fit a proper pair.

Though I have solved the problem, but I can't figure out why this happened. Any one can help me ?

ADD COMMENT
0
Entering edit mode

If you have time you can try to map with the -M option only, and then with the -P option only, to see which one of the two (or the two together) is causing the issue.

I remember that for pilon the bams need to be "sorted in coordinate order and indexed" (https://github.com/broadinstitute/pilon/wiki/Requirements-&-Usage), but I guess pilon would just throw an error instead of giving 0 coverage for unsorted bams.

ADD REPLY
0
Entering edit mode

Thank you for your reply. Following your advice , I map with the -M option only, and then with the -P option only, and I found that -P is causing the issue.

ADD REPLY

Login before adding your answer.

Traffic: 2077 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6