Why does the number of sequences increase after using Hi-C to mount chromosomes?
26 days ago
xinguok794 • 0

I used hifiasm to assemble a human genome using HiFi and ONT data. The initial assembly produced 159 sequences. However, after using YaHS to scaffold the chromosomes with Hi-C data, the number of sequences increased to 179 in the resulting FASTA file. This seems unusual: shouldn't the number of sequences decrease after scaffolding with Hi-C data? I'd like to understand where the issue lies. I would be grateful for any advice you could provide!

YAHS hifiasm assembly Hi-C Gene • 503 views
26 days ago
Corentin ▴ 660

Hi,

It is not unusual for assemblies to have unplaced scaffolds (sequences that could not be assigned to a chromosome). You should check not only the number of sequences but also their lengths (e.g. N50, length of the largest sequence, etc.). You can use QUAST to compute QC stats for your assembly.
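If you just want a quick look before running a full QC tool, the stats mentioned above (sequence count, largest sequence, N50) can be computed with a few lines of Python. This is only a minimal sketch, not a replacement for a proper QC report; the function names are made up for illustration, and it assumes a plain uncompressed FASTA file.

```python
def read_fasta_lengths(lines):
    """Return {sequence_name: length} from an iterable of FASTA lines."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def summarize(lengths_by_name):
    """Collect the basic stats worth comparing before and after scaffolding."""
    values = list(lengths_by_name.values())
    return {
        "count": len(values),
        "total": sum(values),
        "largest": max(values, default=0),
        "n50": n50(values),
    }
```

To use it, open each FASTA (the hifiasm contigs and the YaHS scaffolds) and call `summarize(read_fasta_lengths(fh))` on the file handle. If the count went up but N50 and the largest sequence went up too, the extra sequences are probably small pieces rather than a sign of a failed run.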

For Hi-C you can also plot the contact map and check whether you have "large squares" corresponding to your chromosomes. This could give you an idea of what is happening with your assembly. You can use Juicebox for this.

Since you are working with human data, you have access to a reference genome. You can align your assembly against a human reference genome to check for any discrepancies (using MUMmer, for example).

25 days ago
shelkmike ★ 1.7k

YaHS not only scaffolds contigs, but also splits them in places that contradict Hi-C contacts. That could be the reason: each split adds one sequence, so if YaHS makes more splits than joins, the total goes up. However, in my experience YaHS has always reduced the number of sequences.
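One way to check whether splitting explains the increase is to look at the AGP file that describes how the scaffolds were built from the input contigs. The sketch below is a rough illustration and rests on some assumptions: that your YaHS run wrote an AGP file next to the scaffold FASTA, that it follows the standard tab-separated AGP layout (component type in column 5, component ID and coordinates in columns 6 to 8), and that a contig listed under more than one coordinate range was split. The function name is made up.

```python
from collections import defaultdict


def summarize_agp(agp_lines):
    """Return (n_scaffolds, n_input_contigs, {contig: n_pieces}) from AGP lines.

    Only contigs that appear as more than one piece are listed in the dict.
    """
    scaffolds = set()
    pieces = defaultdict(set)
    for line in agp_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        scaffolds.add(cols[0])
        if cols[4] in ("N", "U"):
            # Gap lines describe the spacer between components, not a contig.
            continue
        contig, beg, end = cols[5], int(cols[6]), int(cols[7])
        pieces[contig].add((beg, end))
    split = {c: len(p) for c, p in pieces.items() if len(p) > 1}
    return len(scaffolds), len(pieces), split


# Tiny made-up example: ctg2 is broken in two, and one half is joined to ctg1.
example = [
    "scaffold_1\t1\t100\t1\tW\tctg1\t1\t100\t+",
    "scaffold_1\t101\t300\t2\tN\t200\tscaffold\tyes\tproximity_ligation",
    "scaffold_1\t301\t400\t3\tW\tctg2\t1\t100\t-",
    "scaffold_2\t1\t50\t1\tW\tctg2\t101\t150\t+",
    "scaffold_3\t1\t80\t1\tW\tctg3\t1\t80\t+",
]
n_scaffolds, n_contigs, split = summarize_agp(example)
print(n_scaffolds, n_contigs, split)
```

In the made-up example, three input contigs still give three scaffolds because one split cancels one join. On real data, a long list of split contigs together with few multi-contig scaffolds would point to splitting as the cause of the higher count.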
