Hi,
Given a genome with many unordered contigs and some external information that can be used to anchor them to chromosomes or linkage groups, is there a standard file format for specifying the linkage relationships between contigs? Downstream analyses, such as window-based calculations of popgen summary statistics, will rely on this order. For example, I can map the set of linkage markers to the reference using a short-read aligner and determine that a certain set of contigs belongs to linkage group X, in a particular order. Should this simply be represented in a FASTA file with the linkage information encoded in the header?
Thanks!
One commonly used method is to link contigs into scaffolds with an arbitrary number of Ns between them, for example 10 Ns:
ACGTNNNNNNNNNNACGT
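
If it helps, here is a minimal Python sketch of that joining step. It is only an illustration: the function names (`join_contigs`, `reverse_complement`), the contig names, and the `(name, strand)` ordering format are all made up for this example, and the strand handling is an assumption on my part (a linkage map may or may not tell you orientation). It also returns 0-based, half-open coordinates for each contig within the scaffold, so window-based statistics can be traced back to the original contigs.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (A/C/G/T/N only, case preserved)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def join_contigs(contigs, order, gap_len=10):
    """Join contigs into one scaffold, separated by runs of Ns.

    contigs : dict mapping contig name -> sequence
    order   : list of (contig_name, strand) tuples, strand is '+' or '-'
              (hypothetical format, e.g. derived from a linkage map)
    gap_len : arbitrary number of Ns placed between adjacent contigs

    Returns (scaffold_sequence, placements), where placements is a list of
    (contig_name, start, end, strand) with 0-based, half-open coordinates.
    """
    pieces = []
    placements = []
    pos = 0
    for i, (name, strand) in enumerate(order):
        seq = contigs[name]
        if strand == "-":
            seq = reverse_complement(seq)
        if i > 0:
            # Gap of unknown true size, represented by a fixed run of Ns.
            pieces.append("N" * gap_len)
            pos += gap_len
        placements.append((name, pos, pos + len(seq), strand))
        pieces.append(seq)
        pos += len(seq)
    return "".join(pieces), placements


# Toy example: two 4 bp contigs placed on one linkage group.
contigs = {"contig_a": "ACGT", "contig_b": "ACGT"}
scaffold, placements = join_contigs(
    contigs, [("contig_a", "+"), ("contig_b", "+")], gap_len=10
)
print(scaffold)
for row in placements:
    print(*row, sep="\t")
```

The placements table is worth keeping next to the scaffold FASTA, because the gap length is arbitrary and does not reflect real physical distance.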