I'm working on a project for which I need output from wgsim (at least, that's what I was told). I downloaded wgsim from its GitHub page and ran it on a bacterial genome from NCBI. The problem is that I don't understand what its output is or how to interpret it. For example, for a complete genome of length <1500 bases, I got a 4-million-line text file with content like this (the read1 and read2 files look almost the same):
@gi|379009891|ref|NC016894.1|:101-145674212561:0:01:0:00/1
GTAGAATGATCGCGACCGCCAAATTCATCACCAATTTTAGGAAGTGATAAATCAGTAATCACACGCGTGA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@gi|379009891|ref|NC016894.1|:101-14564409582:0:00:0:01/1
ATAATCCACTTTTTATTTATGGTGTCGTCGGTTTAGGAAAAACGCATTTAATTCAAGCCATCGGACATTA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@gi|379009891|ref|NC016894.1|:101-1456705261:1:02:0:02/1
AGTTTTAACACCTGGAATTTAAAAATAAAACCGATAAATTACGTCAATAATACTTACTATTTTTTATCTG
+
This is from the read1 file. I'm asking here because I couldn't find anything by searching, and the answer may be useful to other people as well.
The small snippets of sequence that you get in these FASTQ files were generated from the reference sequence and, depending on the settings, may contain mutations, structural variants, and errors relative to the original.
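To make the layout concrete: each read in a FASTQ file takes four lines (a name line starting with `@`, the bases, a `+` separator, and one quality character per base), so a 4-million-line file holds about 1 million reads. Below is a minimal Python sketch of how you could read such a file. The names `FastqRecord`, `read_fastq`, and `phred_scores` are made up for illustration and are not part of wgsim, the sample record is shortened and hypothetical, and I'm assuming the usual Phred+33 quality encoding.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    """One read: name (without the leading '@'), bases, quality string."""
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a simple four-lines-per-read FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Convert quality characters to Phred scores (assumes Phred+33)."""
    return [ord(char) - offset for char in quality]


# Hypothetical, shortened record in the same shape as the output above.
sample = io.StringIO("@example_read/1\nGTAGAATGAT\n+\n2222222222\n")
records = list(read_fastq(sample))
print(records[0].name, len(records[0].sequence))
print(phred_scores(records[0].quality)[0])  # '2' is ASCII 50, so 50 - 33 = 17
```

Under that assumption, the rows of `2` in your file are just a constant placeholder quality (Phred 17) attached to every simulated base, which is why they look identical for every read.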