Hello, I am trying to work with simulated Illumina metagenome reads to test a pipeline.
The normal way to do that is to use simulator tools like ART.
To save time, I am thinking of just downloading several SRA projects (sequenced on Illumina), each originating from a single organism, and then mixing their reads to produce the simulated Illumina metagenome reads.
I would like to hear your feedback: does this work as a true simulated metagenome dataset? The SRA files can be contaminated with other species, so I won't truly know the composition of my simulated dataset.
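To make the idea concrete, here is a minimal sketch of the mixing step in Python. It assumes the reads of each SRA run have already been converted to FASTQ text; the function names (`parse_fastq`, `mix_reads`), the source labels and the proportions are hypothetical and only for illustration. Each read is tagged with the run it came from, which records the nominal composition but says nothing about contaminants inside a run.

```python
import random
from typing import Dict, List, Tuple

# (header without '@', sequence, quality string)
FastqRecord = Tuple[str, str, str]


def parse_fastq(text: str) -> List[FastqRecord]:
    """Parse plain 4-line FASTQ text into a list of records."""
    lines = [ln for ln in text.splitlines() if ln]
    records = []
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        records.append((header[1:], seq, qual))
    return records


def mix_reads(
    sources: Dict[str, List[FastqRecord]],
    proportions: Dict[str, float],
    total: int,
    seed: int = 0,
) -> List[FastqRecord]:
    """Subsample each source in proportion to its weight and shuffle.

    The source label is prepended to every header, so the run of origin
    stays traceable. This is the nominal label only: it is an assumption
    that every read in a run belongs to the named organism.
    """
    rng = random.Random(seed)
    weight_sum = sum(proportions[label] for label in sources)
    mixed: List[FastqRecord] = []
    for label, reads in sources.items():
        n = round(total * proportions[label] / weight_sum)
        if n > len(reads):
            raise ValueError(f"{label}: need {n} reads, only {len(reads)} available")
        # Sample without replacement so no read is duplicated.
        for header, seq, qual in rng.sample(reads, n):
            mixed.append((f"{label}|{header}", seq, qual))
    rng.shuffle(mixed)
    return mixed


def to_fastq(records: List[FastqRecord]) -> str:
    """Serialize records back to 4-line FASTQ text."""
    return "".join(f"@{h}\n{s}\n+\n{q}\n" for h, s, q in records)
```

For example, `mix_reads(sources, {"organism_A": 3, "organism_B": 1}, total=8)` would draw 6 reads from the first run and 2 from the second. The resulting reads still carry whatever contamination and run-specific biases were present in the original runs.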
Thanks in advance
Have a good day!
If you combine independent datasets, then all you're "simulating" is batch effects between the datasets rather than a metagenomic dataset. I am not a metagenome person, so I cannot recommend specific tools, but I can say with confidence that combining datasets is not going to be meaningful.