Hi everyone,
I have been reading several benchmarking papers, and I realized how many computational resources and how much time it must have cost to run them on all the chromosomes in the human reference genome. Wouldn't it be a more principled approach to use only the longest and the shortest chromosomes in benchmarking studies of aligners/assemblers? If not, why not?
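If you want to try this, the longest and shortest chromosomes can be read off a FASTA index. Below is a minimal sketch, assuming an index produced by `samtools faidx reference.fa` (tab-separated columns: name, length, offset, linebases, linewidth). The function name `longest_and_shortest` and the `keep` filter are hypothetical illustrations, not part of any tool.

```python
# Minimal sketch (hypothetical helper): pick the longest and shortest sequences
# from the lines of a samtools faidx index (.fai). Only the first two columns
# (sequence name, length in bp) are used.
from typing import Iterable, Optional, Set, Tuple


def longest_and_shortest(
    fai_lines: Iterable[str], keep: Optional[Set[str]] = None
) -> Tuple[str, str]:
    """Return (longest_name, shortest_name) from .fai lines.

    `keep` optionally restricts the search (e.g. to primary chromosomes) so that
    unplaced contigs, alt haplotypes, or chrM do not win as the "shortest".
    """
    lengths = {}
    for line in fai_lines:
        if not line.strip():
            continue  # ignore blank lines
        name, length = line.split("\t")[:2]
        if keep is None or name in keep:
            lengths[name] = int(length)
    if not lengths:
        raise ValueError("no matching sequences in the index")
    longest = max(lengths, key=lengths.get)
    shortest = min(lengths, key=lengths.get)
    return longest, shortest
```

For example, opening a hypothetical `reference.fa.fai` and calling `longest_and_shortest(fh, keep={f"chr{i}" for i in range(1, 23)})` restricts the choice to the autosomes; without `keep`, the mitochondrial genome or a small contig would be returned as the shortest sequence.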
If you only had sequence data from those two chromosomes, then yes. Otherwise, you could subset the data artificially by retrieving only the reads that align to those two chromosomes (after aligning to the full genome) and then use them.
Keep in mind this would only be appropriate for the synthetic purpose of benchmarking.
No, I have WGS data, so it covers the entire genome.
Correct, but you can pull out the reads from that data that align only to those two chromosomes and use them for benchmarking, as described above.
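With a coordinate-sorted and indexed BAM, `samtools view -b aligned.bam chr1 chr21 > subset.bam` performs this extraction directly. The Python below is only a minimal sketch of the same filter applied to SAM text (e.g. the output of `samtools view -h aligned.bam`), to show the logic. The chromosome names and the helper name `filter_sam` are assumptions for illustration. It keeps header lines and records whose reference (RNAME, column 3) is one of the chosen chromosomes. For paired-end data it optionally also requires the mate (RNEXT, column 7) to lie on a chosen chromosome, so that pairs are not split across the subset boundary.

```python
# Minimal sketch (hypothetical helper): keep SAM records aligned to a chosen
# set of chromosomes. Input is SAM text, one record per line.
from typing import Iterable, Iterator, Set

# Hypothetical choice, e.g. the longest and shortest autosomes of the reference.
CHOSEN = {"chr1", "chr21"}


def filter_sam(
    lines: Iterable[str], chosen: Set[str], require_mate: bool = True
) -> Iterator[str]:
    """Yield header lines plus records whose RNAME is in `chosen`.

    With require_mate=True, a record is also dropped when its mate (RNEXT) is
    placed on a chromosome outside `chosen`. RNEXT "=" means the same
    reference as RNAME; "*" means no mate information is available.
    """
    for line in lines:
        if line.startswith("@"):
            # Header lines (@HD, @SQ, @RG, @PG, ...) pass through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # skip malformed records (SAM has 11 mandatory fields)
        rname, rnext = fields[2], fields[6]
        if rname not in chosen:
            continue
        if require_mate and rnext not in ("=", "*") and rnext not in chosen:
            continue
        yield line
```

Used as `filter_sam(open("aligned.sam"), CHOSEN)` on a hypothetical file, it yields a subset that can be written back out and, if needed, turned into reads again (e.g. with `samtools fastq`) for re-alignment. The full-genome alignment still has to happen first; the subset only determines which reads enter the benchmark.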