I have to generate a "toy dataset" consisting of one human chromosome and the short reads that map to its contigs.
Which one would you pick? I would go for chromosome 21 since: 1) it is short, so there is less data, and 2) it has no sex-dependent depth bias, unlike the X/Y chromosomes.
Is this a good pick or would you advise to go for another one?
Are there pre-processed datasets anywhere with contigs and corresponding reads filtered for just the specific chromosome?
Sure, I forgot to state the goal. It is about mapping short reads, not about genes, so this is not a problem.
If the goal is a realistic setting, also consider either reducing your reads to those that map to this chromosome or simulating reads from it. Otherwise, aligning a whole-genome read set against a single chromosome would leave an unreasonably large proportion of reads unaligned.
Sure, I will filter the reads.
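In case it helps, here is a minimal sketch of that filtering step in plain Python, assuming the reads have already been aligned to the full genome and exported as uncompressed SAM text. The function name `filter_sam_by_chromosome`, the contig name `chr21`, and the tiny example records (including the `LN:1000` lengths) are made up for illustration; your reference may name the contig `21` instead.

```python
def filter_sam_by_chromosome(sam_lines, chromosome="chr21"):
    """Yield SAM header lines and mapped alignments for one chromosome.

    Hypothetical helper: works on an iterable of SAM text lines and does
    not look at mate fields, so a pair split across chromosomes loses one mate.
    """
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("@"):
            # Keep only the @SQ entry of the target contig; keep other headers.
            if fields[0] == "@SQ" and f"SN:{chromosome}" not in fields:
                continue
            yield line
            continue
        flag = int(fields[1])   # SAM column 2: bitwise FLAG
        rname = fields[2]       # SAM column 3: reference sequence name
        if flag & 0x4:          # 0x4 is set when the read is unmapped
            continue
        if rname == chromosome:
            yield line


# Made-up toy input: two contigs, one read on each, one unmapped read.
example_sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:chr20\tLN:1000",
    "@SQ\tSN:chr21\tLN:1000",
    "r1\t0\tchr20\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr21\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]

for record in filter_sam_by_chromosome(example_sam):
    print(record)
```

On a sorted and indexed BAM, `samtools view -b in.bam chr21 > chr21.bam` extracts the same region without any custom code (the file names here are placeholders).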