6.7 years ago
marongiu.luigi
Dear all,
I was wondering how the human reference genome top-level FASTA file is built. I thought it held a single sequence, but I realized it is actually a multi-FASTA. It contains not only the sequences of all the chromosomes (which are also distributed as individual FASTA files) but also several patches and scaffolds. What is the function of these 'extra' sequences? Why are they not included directly in the chromosome files?
Thank you
Typically not all sequences can be assigned to a chromosome. These extra sequences are put into additional files. I guess these are the ones you're referring to here.
There are also alternate contigs, for which the chromosome location is known but there is enough heterogeneity within the population at that location that alternate sequences were deemed necessary.
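To see for yourself what a top-level file contains, you can list its records and group them by type. The following is only a minimal sketch: it assumes the second whitespace-separated field of each header names the kind of record (something like `dna:chromosome` or `dna:scaffold`), which may not hold for every release or provider. The function name `summarize_fasta` and the example records are made up for illustration.

```python
import io
from collections import Counter


def summarize_fasta(handle):
    """Return (lengths, kinds) for a multi-FASTA text stream.

    lengths maps each record name to its sequence length.
    kinds counts records by the second header field (assumed to
    describe the record type); "unknown" is used when it is absent.
    """
    lengths = {}
    kinds = Counter()
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts a new record.
            fields = line[1:].split()
            name = fields[0] if fields else "unnamed"
            lengths[name] = 0
            kinds[fields[1] if len(fields) > 1 else "unknown"] += 1
        elif name is not None:
            # Sequence lines are added to the current record.
            lengths[name] += len(line)
    return lengths, kinds


# Tiny made-up input; names and header fields are illustrative only.
example = io.StringIO(
    ">1 dna:chromosome\nACGTACGT\nACGT\n"
    ">KI270728.1 dna:scaffold\nACGTA\n"
)
lengths, kinds = summarize_fasta(example)
print(lengths)
print(dict(kinds))
```

For a real, compressed file you could pass `gzip.open(path, "rt")` as the handle, where `path` is wherever you saved the download. Records that are not whole chromosomes should then show up as separate entries with their own names and lengths.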