Hi all,
I assembled paired-end genome sequences using SOAPdenovo, and got some scaffold sequences.
I found the GapCloser program, which is designed to close the gaps that emerge during SOAPdenovo's scaffolding step, and ran it on my scaffold sequences.
However, I found that GapCloser only trimmed the 'N's in the scaffold sequences. Should I remove the 'N' sequence from the scaffolds? Is that a mandatory step?
Thanks for your comment in advance.
Are you saying that GapCloser is a mandatory step for PE reads because a PE assembly can include gap sequence?
Most gaps are either repeats or regions with little or no sequence coverage, so their coverage depth is either higher or lower than normal. A repeat generates too many bubbles in the graph, and a low-coverage region does not supply enough k-mers to build contigs effectively. In both cases the sequence in the gap cannot be constructed by the de Bruijn graph algorithm. GapCloser is a corrective tool: it allows low-confidence reads to be mapped to the 'reference' sequence, and it masks exact repeats identified from the shotgun data. Its algorithm is similar to that of "RePS", which was also developed by BGI.
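To make the coverage argument concrete, here is a rough Python sketch that counts k-mers in a handful of toy reads and flags the ones that occur unusually rarely or unusually often. This is only an illustration of the idea, not how SOAPdenovo or GapCloser is implemented: the function names, the toy reads, and the `low`/`high` thresholds are all made up for the example.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def classify_kmers(counts, low, high):
    """Split k-mers by count.

    `low` and `high` are hypothetical cutoffs chosen for this toy example:
    k-mers seen fewer than `low` times are treated as under-covered,
    k-mers seen more than `high` times as repeat-like.
    """
    under_covered = sorted(kmer for kmer, n in counts.items() if n < low)
    repeat_like = sorted(kmer for kmer, n in counts.items() if n > high)
    return under_covered, repeat_like


# Toy reads: three overlapping reads plus one read from a poorly covered region.
reads = ["ACGTAC", "CGTACG", "GTACGT", "TTTTGA"]
counts = kmer_counts(reads, k=4)
under_covered, repeat_like = classify_kmers(counts, low=2, high=2)
print("under-covered k-mers:", under_covered)
print("repeat-like k-mers:", repeat_like)
```

With real data the k-mers in either list are the ones a de Bruijn graph assembler has trouble extending through, which is roughly where the 'N' gaps in the scaffolds come from.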
Thank you for your kind reply.