Question

Wgsim Output Interpretation

11.8 years ago

darxsys ▴ 240

I'm doing a project for which I need output from wgsim (at least, that's what I was told). I downloaded wgsim from its GitHub page and ran it on a bacterial genome found on NCBI's pages. The problem is that I don't understand what its output is or how to interpret it. For example, for a complete genome of length <1500 bases, I got a 4 million line text file with content like this (the read1 and read2 files look almost the same):

@gi|379009891|ref|NC016894.1|:101-145674212561:0:01:0:00/1
GTAGAATGATCGCGACCGCCAAATTCATCACCAATTTTAGGAAGTGATAAATCAGTAATCACACGCGTGA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@gi|379009891|ref|NC016894.1|:101-14564409582:0:00:0:01/1
ATAATCCACTTTTTATTTATGGTGTCGTCGGTTTAGGAAAAACGCATTTAATTCAAGCCATCGGACATTA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@gi|379009891|ref|NC016894.1|:101-1456705261:1:02:0:02/1
AGTTTTAACACCTGGAATTTAAAAATAAAACCGATAAATTACGTCAATAATACTTACTATTTTTTATCTG
+

This is from the read1 file. I'm asking because I couldn't find anything on Google, and the answer may be useful for other people as well.

score 1 · Answer 1 · 2013-03-13

11.8 years ago

Pierre Lindenbaum 164k

The output is a pair of FASTQ files.
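As a rough illustration of what that means: in the common four-line FASTQ layout, each read takes four lines (a name line starting with `@`, the bases, a `+` separator, and one quality character per base). That would be consistent with a 4 million line file if about a million reads were simulated. The sketch below reads such records in Python; the function name `read_fastq` and the read names in the example data are made up for illustration, and multi-line FASTQ variants are not handled.

```python
import io

def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header[1:], seq, qual

# Hypothetical two-record example standing in for a wgsim read1 file.
example = io.StringIO(
    "@read_a/1\n"
    "GTAGAATGAT\n"
    "+\n"
    "2222222222\n"
    "@read_b/1\n"
    "ATAATCCACT\n"
    "+\n"
    "2222222222\n"
)

records = list(read_fastq(example))
for name, seq, qual in records:
    print(name, len(seq), qual[:5])
```

For a real file, the same function could be called on `open("read1.fq")` (the file name here is a placeholder for whatever name was given to wgsim).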

The short sequences in these FASTQ files are simulated reads generated from the reference sequence; depending on the settings, they may contain mutations, structural variants, and errors relative to the original.
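The read names themselves appear to record where each read pair came from and how many changes were introduced. As far as I recall, wgsim names reads as `<contig>_<start>_<end>_<e1>:<s1>:<i1>_<e2>:<s2>:<i2>_<hex counter>/<mate>`, where the two triples count sequencing errors, substitutions, and indels for each mate; the underscores seem to have been lost in the names pasted in the question. Treat that pattern as an assumption and check it against the wgsim README before relying on it. A minimal sketch of decoding such a name, with a hypothetical helper `parse_wgsim_name` and a made-up example name:

```python
import re

# Assumed wgsim naming pattern (not confirmed by this thread):
# <contig>_<start>_<end>_<e1>:<s1>:<i1>_<e2>:<s2>:<i2>_<hex counter>/<mate>
NAME_RE = re.compile(
    r"^(?P<contig>.+)_(?P<start>\d+)_(?P<end>\d+)"
    r"_(?P<err1>\d+):(?P<sub1>\d+):(?P<indel1>\d+)"
    r"_(?P<err2>\d+):(?P<sub2>\d+):(?P<indel2>\d+)"
    r"_(?P<counter>[0-9a-f]+)/(?P<mate>[12])$"
)

INT_FIELDS = ("start", "end", "err1", "sub1", "indel1",
              "err2", "sub2", "indel2", "mate")

def parse_wgsim_name(name):
    """Return a dict of fields from a read name, or None if it does not match."""
    match = NAME_RE.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    for key in INT_FIELDS:
        fields[key] = int(fields[key])
    fields["counter"] = int(fields["counter"], 16)  # pair counter is hexadecimal
    return fields

# Made-up name following the assumed pattern (no leading '@').
info = parse_wgsim_name("chrDemo_742_1256_1:0:0_1:0:0_0/1")
print(info)
```

Names that do not follow the assumed pattern return `None`, so the helper can be run over a whole file without raising on unexpected input.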

11.8 years ago by Istvan Albert 102k