Question

Shred a genome to short reads

8.6 years ago

merwright • 0

I'm working on genome assemblies and want to create a mock metagenome dataset to assemble. My research uses biochemical approaches to enrich eukaryotic DNA, followed by computational enrichment to recover the genome.

Question: I would like to take a small, complete eukaryotic genome (approximately 20-40 Mb), along with several complete bacterial genomes (~5 Mb each), and shred them all into random fragments of 100-250 bp.

That means each genome would be shredded randomly and independently 10-20 times, so that overlapping fragments are available. All the separate files would then be merged into one mock FASTA file simulating an NGS library that has been cleaned and is ready for assembly.
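As a rough sanity check on the size of such a dataset, the fragment count follows from coverage × genome length / mean fragment length. The sketch below assumes fragment lengths are drawn uniformly from 100-250 bp (mean 175 bp); the function name `fragments_needed` is hypothetical and only for illustration.

```python
def fragments_needed(genome_length, coverage, min_len=100, max_len=250):
    """Approximate number of fragments needed for a target coverage.

    Assumption: fragment lengths are uniform on [min_len, max_len],
    so the mean fragment length is their midpoint.
    """
    mean_len = (min_len + max_len) / 2
    return round(genome_length * coverage / mean_len)


# Example sizes taken from the question: ~5 Mb bacterial, ~30 Mb eukaryotic.
bacterial = fragments_needed(5_000_000, coverage=10)
eukaryotic = fragments_needed(30_000_000, coverage=10)
print(bacterial, eukaryotic)
```

At 10x this works out to a few hundred thousand fragments per bacterial genome and under two million for the eukaryote, so the merged file stays manageable.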

I've spent several weeks searching for "genome shredding" and variations on it. Can anyone suggest software that already does this, at least in part, or some kind of framework I could build on to code it myself? This is the process I have thought of so far:

The input file is a complete, assembled genome on a single line.
Each shuffle consists of picking a number between 100 and 250, taking that many nucleotides, and writing them to a new file in FASTA format:

>random1
ATATATATA (sequence)
>random2
GCGCGCG (sequence)

The 10 separate FASTA files for each organism are then all concatenated with cat into mocksequencing.fasta.
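The steps above can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not a replacement for a proper read simulator: fragments are error-free, taken from the forward strand only, and sampled uniformly along a linear sequence. The names `shred` and `to_fasta` are hypothetical. Sampling continues until the requested coverage is reached, which replaces the "10 separate shuffles" with a single pass. Real genomes could be loaded with Biopython's `SeqIO.parse(handle, "fasta")`, shredding each record separately so that fragments never span two chromosomes.

```python
import random


def shred(sequence, coverage=10, min_len=100, max_len=250, rng=None):
    """Yield random fragments until total bases >= coverage * len(sequence).

    Assumptions: linear sequence, forward strand only, no sequencing errors.
    """
    if len(sequence) < max_len:
        raise ValueError("sequence is shorter than the maximum fragment length")
    rng = rng or random.Random()
    target = coverage * len(sequence)
    total = 0
    while total < target:
        length = rng.randint(min_len, max_len)            # inclusive bounds
        start = rng.randint(0, len(sequence) - length)    # fragment fits fully
        yield sequence[start:start + length]
        total += length


def to_fasta(fragments, prefix):
    """Yield FASTA records: a '>' header line, then the sequence line."""
    for i, fragment in enumerate(fragments, start=1):
        yield f">{prefix}_{i}\n{fragment}\n"


# Toy demonstration on a random 2 kb "genome" instead of a real FASTA file.
toy_rng = random.Random(0)
toy_genome = "".join(toy_rng.choice("ACGT") for _ in range(2000))
records = list(to_fasta(shred(toy_genome, coverage=10, rng=random.Random(1)), "toy"))
print(records[0], end="")
```

To build the mock library, the records for every genome can be written to one output handle (for example with `out.writelines(to_fasta(...))`), which makes the final `cat` step unnecessary.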

I feel like this isn't too complicated or unusual for a lot of studies, so writing it myself would be a bit redundant. Is this covered somewhere in the Biopython documentation? Thanks!

genome • 2.2k views

score 0 · Answer 1 · 2016-04-27

8.6 years ago

leekaiinthesky ▴ 180

I believe the term you're looking for is "read simulator".

Take a look at MetaSim. wgsim may also be of interest.

There is also a relevant Biostars post that includes other suggestions: "What Ngs Read Simulators Are Available For Paired-End Data?"

Thank you! Knowing what these are commonly referred to is a huge help. I will look into that.

Reply • 8.6 years ago by merwright • 0