Question

Generating random DNA sequence and paired-end alignment

1

8.3 years ago

ThePresident ▴ 180

Hello Biostars community,

I would like to simulate structural variant calling (i.e. genomic inversions, deletions and insertions) in order to understand some experimental results I am getting.

My idea is:

1. Generate a random DNA sequence of defined length (e.g. 1 Mbp) with equal probability of A/T/C/G and store it as fasta1.
2. Manually create a genomic inversion/deletion/insertion/duplication etc. and store the result as fasta2.
3. Tricky part: use the sequence from fasta2 to generate random paired-end data in fastq format (i.e. a random but unique header, and a sequence of defined length derived from fasta2 with the highest quality). These paired-end "reads" would also need to have a defined insert length (let's say 500 bp with some standard deviation).

Since my knowledge in coding is basic-next-to-nothing, I am not sure if this is actually possible and have no idea if I should use R, Python or...? Any help or existing scripts would be highly appreciated.

Thank you in advance.

R Python Simulation • 3.2k views

ADD COMMENT • link updated 6.3 years ago by Johan Zicola ▴ 70 • written 8.3 years ago by ThePresident ▴ 180

score 2 · Answer 1 · 2016-08-16

2

8.3 years ago

GenoMax 147k

1. You could grab a bacterial genome from GenBank.
2. You will have to do this manually.
3. Use randomreads.sh from BBMap. Guide thread.
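
To illustrate what point 3 involves, here is a minimal Python sketch of paired-end read simulation. It is not how randomreads.sh works internally; the function names, the read-name format and the default values are assumptions made for illustration. "Insert length" is taken here to mean the full fragment length, drawn from a normal distribution, and every base gets quality `I` (Phred 40 in Phred+33 encoding).

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def simulate_pairs(reference, n_pairs, read_len=100,
                   insert_mean=500, insert_sd=50, seed=None):
    """Yield (name, read1, read2) tuples in forward/reverse orientation.

    Hypothetical helper: fragments are sampled uniformly along the
    reference, with lengths drawn from a normal distribution.
    """
    if len(reference) < read_len:
        raise ValueError("reference is shorter than the read length")
    rng = random.Random(seed)
    for i in range(n_pairs):
        # Clamp the fragment so it holds a full read and fits the reference.
        frag_len = int(round(rng.gauss(insert_mean, insert_sd)))
        frag_len = max(read_len, min(frag_len, len(reference)))
        start = rng.randint(0, len(reference) - frag_len)
        fragment = reference[start:start + frag_len]
        # Sample either strand with equal probability.
        if rng.random() < 0.5:
            fragment = reverse_complement(fragment)
        read1 = fragment[:read_len]
        read2 = reverse_complement(fragment[-read_len:])
        # The running index keeps every header unique.
        yield f"sim_{i + 1}_{start}", read1, read2


def write_fastq(pairs, path1, path2, quality_char="I"):
    """Write the pairs to two FASTQ files with a constant quality string."""
    with open(path1, "w") as out1, open(path2, "w") as out2:
        for name, read1, read2 in pairs:
            out1.write(f"@{name}/1\n{read1}\n+\n{quality_char * len(read1)}\n")
            out2.write(f"@{name}/2\n{read2}\n+\n{quality_char * len(read2)}\n")
```

With the fasta2 sequence loaded into a string `seq`, something like `write_fastq(simulate_pairs(seq, 50000), "sim_R1.fastq", "sim_R2.fastq")` would produce the two files. No sequencing errors are modelled, so a dedicated simulator remains the better choice for anything beyond a quick test.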

ADD COMMENT • link 8.3 years ago by GenoMax 147k

0

I am already dealing with bacterial genomes. They frequently have inversions/duplications etc. so I want to generate a random sequence which will (hopefully) be free of such structures. Thanks for No.3 :)

ADD REPLY • link 8.3 years ago by ThePresident ▴ 180

1

If you need a random sequence, then see: Generate Random Dna Sequence Data With Equal Base Frequencies

Or two online sites:

http://users-birc.au.dk/biopv/php/fabox/random_sequence_generator.php
http://www.faculty.ucr.edu/~mmaduro/random.htm
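
For anyone who prefers to do this locally, a minimal Python sketch along the same lines is shown here. The function names and the 60-column line width are illustrative assumptions, not taken from the linked thread or sites.

```python
import random


def random_dna(length, seed=None):
    """Return a random DNA string in which A, C, G and T are equally likely."""
    rng = random.Random(seed)
    return "".join(rng.choices("ACGT", k=length))


def write_fasta(path, name, sequence, width=60):
    """Write a single sequence to a FASTA file, wrapped at `width` columns."""
    with open(path, "w") as handle:
        handle.write(f">{name}\n")
        for i in range(0, len(sequence), width):
            handle.write(sequence[i:i + width] + "\n")
```

For example, `write_fasta("fasta1.fa", "random_1Mbp", random_dna(1_000_000, seed=42))` would write a 1 Mbp sequence; fixing the seed makes the output reproducible.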

ADD REPLY • link 8.3 years ago by GenoMax 147k

0

Everything works well; I didn't think I would pull this off so easily. All the tools are already there :) BTW, is there any chance that an automated generator of inversions/duplications and such (point 2) exists? I am doing it manually, and it's a little time-consuming.
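
The manual edits in point 2 can also be scripted with plain string slicing. The following is a minimal sketch, not an existing tool: the function names are hypothetical, coordinates are assumed to be 0-based and half-open (`start` included, `end` excluded), and the sequence is assumed to contain only upper-case A/C/G/T.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def delete(seq, start, end):
    """Remove seq[start:end]."""
    return seq[:start] + seq[end:]


def invert(seq, start, end):
    """Replace seq[start:end] with its reverse complement."""
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


def insert(seq, pos, fragment):
    """Insert `fragment` immediately before position `pos`."""
    return seq[:pos] + fragment + seq[pos:]


def duplicate(seq, start, end):
    """Add a tandem copy of seq[start:end] right after the original."""
    return seq[:end] + seq[start:end] + seq[end:]


ref = "AAACCGTTT"
print(invert(ref, 3, 6))     # AAACGGTTT
print(delete(ref, 3, 6))     # AAATTT
print(duplicate(ref, 3, 6))  # AAACCGCCGTTT
```

Because each function returns a new string, several variants can be chained. When doing so, apply them from the highest coordinate to the lowest so that earlier edits do not shift the positions of later ones.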

ADD REPLY • link 8.3 years ago by ThePresident ▴ 180

1

BBMap has a recent addition called mutate.sh, which I wrote to test the sensitivity of contaminant removal when the contaminants are bacterial strains of the same species. It creates a mutant variant of a genome. For example:

mutate.sh in=ecoli.fasta out=mutant.fasta id=0.95

This will create a mutant version of the original genome with 95% identity to the original. The mutations are random, with no conserved locations (though I may add that option later), so any duplications or inversions in the original will (probabilistically) not be present in the mutant, since the copies will have received different mutations. However, the general structure will still be similar to that of a real bacterium. If you want to generate synthetic reads from a bacteria-like genome with no repeats or inversions, I suggest you run mutate.sh on a real bacterial genome, then use randomreads.sh on the mutant genome. 95% identity should be sufficiently low (averaging one mutation every 20 bp), though it depends on your specific needs.
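
The "one mutation every 20 bp" figure follows from 1 / (1 - 0.95) = 20. To make the idea concrete, here is a substitution-only Python sketch. It is not how mutate.sh is implemented; the function name is hypothetical, and insertions and deletions are deliberately left out.

```python
import random


def mutate_substitutions(seq, identity, seed=None):
    """Return a copy of `seq` with round(len(seq) * (1 - identity)) substitutions.

    Hypothetical helper: positions are chosen uniformly without replacement,
    and each chosen base is swapped for one of the three other bases.
    """
    rng = random.Random(seed)
    n_mut = round(len(seq) * (1 - identity))
    bases = list(seq)
    for pos in rng.sample(range(len(seq)), n_mut):
        bases[pos] = rng.choice([b for b in "ACGT" if b != bases[pos]])
    return "".join(bases)
```

Since every chosen position is guaranteed to change, the number of mismatches relative to the original equals the requested count exactly, which makes the result easy to verify.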

ADD REPLY • link 8.3 years ago by Brian Bushnell 20k

score 1 · Answer 2 · 2018-07-22

1

6.3 years ago

Johan Zicola ▴ 70

I wrote a Python script with the different functions you would need to test structural variant calling, either on randomly generated fastq files or on fastq files generated from a specified fasta file. You can find the script and documentation at https://github.com/johanzi/fastq_generator

ADD COMMENT • link 6.3 years ago by Johan Zicola ▴ 70

score 0 · Answer 3 · 2016-08-16

0

8.3 years ago

Aerval ▴ 290

A review on various tools: http://www.nature.com/doifinder/10.1038/nrg.2016.57

ADD COMMENT • link 8.3 years ago by Aerval ▴ 290

0

superawesome! Thanks

ADD REPLY • link 8.3 years ago by ThePresident ▴ 180