Hi, I am trying to use ART Illumina to create simulated miRNA data sets, but I am having several problems understanding the manual. I managed to create an ART profile from a FASTQ sample. I assume I have to combine it with a reference file in FASTA format to get FASTQ output. I ran ART as follows:
art_illumina -i testmature.fa -l 10 -sp -f 10 -o simulated_SRAnew
The output FASTQ file is either empty or contains only headers. I also tried running:
art_illumina -i testmature.fa -1 miRNAseq_SRR1035644.txt -l 10 -sp -f 10 -o simulated_SRAnew
where miRNAseq_SRR1035644.txt is the profile I created with art_profiler, but again I get an empty FASTQ file.
What kind of FASTA reference should I use? My testmature.fa file looks like this:
>cel-let-7-5p MIMAT0000001 Caenorhabditis elegans let-7-5p
UGAGGUAGUAGGUUGUAUAGUU
>cel-let-7-3p MIMAT0015091 Caenorhabditis elegans let-7-3p
CUAUGCAAUUUUCUACCUUACC
>cel-lin-4-5p MIMAT0000002 Caenorhabditis elegans lin-4-5p
UCCCUGAGACCUCAAGUGUGA
>cel-lin-4-3p MIMAT0015092 Caenorhabditis elegans lin-4-3p
ACACCUGGGCUCUCCGGGUACC
>cel-miR-1-5p MIMAT0020301 Caenorhabditis elegans miR-1-5p
CAUACUUCCUUACAUGCCCAUA
>cel-miR-1-3p MIMAT0000003 Caenorhabditis elegans miR-1-3p
UGGAAUGUAAAGAAGUAUGUA
Do you have any idea where the problem might be, or whether there is another way to create simulated miRNA-seq data? What kind of reference and parameters should I use?
Thank you in advance.
Sequencers don't read uracil: the RNA is reverse-transcribed to cDNA before sequencing, so reads contain T rather than U. You should therefore replace those U's with T's. I also assume the reference is expected to be a genome (not short small-RNA sequences like yours), which may be another reason why this is not working.
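A minimal sketch of the U-to-T conversion in Python, assuming a standard FASTA file. Only sequence lines are changed and header lines starting with ">" are passed through untouched. The output file name in the usage line is hypothetical.

```python
import io


def rna_to_dna_fasta(lines):
    """Yield FASTA lines with U/u replaced by T/t in sequence lines only.

    Header lines (starting with '>') are passed through unchanged so that
    IDs and descriptions are never altered.
    """
    table = str.maketrans("Uu", "Tt")
    for line in lines:
        if line.startswith(">"):
            yield line
        else:
            yield line.translate(table)


def convert_file(in_path, out_path):
    """Write a DNA-alphabet copy of an RNA FASTA file."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in rna_to_dna_fasta(src):
            dst.write(line)


if __name__ == "__main__":
    # Small in-memory example using the first record of testmature.fa.
    example = io.StringIO(
        ">cel-let-7-5p MIMAT0000001 Caenorhabditis elegans let-7-5p\n"
        "UGAGGUAGUAGGUUGUAUAGUU\n"
    )
    print("".join(rna_to_dna_fasta(example)), end="")
```

On a real file this would be called as convert_file("testmature.fa", "testmature_dna.fa"). The converted file could then be passed to art_illumina with -i. Converting alone may not fix the empty output, as the next reply reports.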
Even if I replace the U's with T's, the result is the same. ART outputs a .fq file that contains only headers, like this:
So you are suggesting that I use a genome (say, the human genome), create profiles from my miRNA FASTQ files, and give both as input to ART? And what about adapters? My first goal is to get ART working and produce a simple data set. Then I need to understand how to use it to create miRNA-seq data with adapters. That is the simulated data I need!
Do you absolutely need to simulate the reads? Could you not start with one of the available public datasets? The miRNA data I worked with had a specific adapter on the 3' end and 4 extra bases at the beginning of each read (due to the kit used). miRNA reads were therefore identified by the presence of that adapter sequence. Removing the adapter and the 4 bases at the start of the read left a 22-25 bp final product. So it is not a straightforward thing to simulate.
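The read structure described above can be sketched in Python. Each read is built as 4 random bases + mature miRNA + 3' adapter and cut to a fixed read length. A matching trimming step removes the adapter, allowing it to be truncated at the read end, and drops the 4 leading bases. This is a toy under stated assumptions, not ART's method or the skewer fork's. The adapter, the 4-base prefix, the 50 bp read length and the function names are hypothetical placeholders to be replaced with your kit's values. Reads are error-free and have a constant quality score.

```python
import random

# ASSUMPTION: placeholder 3' adapter (commonly cited as the Illumina TruSeq
# small-RNA 3' adapter); replace it with the adapter from your kit.
ADAPTER_3P = "TGGAATTCTCGGGTGCCAAGG"
N_RANDOM_5P = 4    # hypothetical: random bases at the start of each read
READ_LENGTH = 50   # hypothetical sequencing read length


def build_read(mirna, rng, read_length=READ_LENGTH,
               adapter=ADAPTER_3P, n_random=N_RANDOM_5P):
    """Return one error-free read: random 5' bases + miRNA + 3' adapter.

    The construct is cut to read_length; if it is shorter, it is padded
    with random bases (a stand-in for whatever follows the adapter).
    """
    prefix = "".join(rng.choice("ACGT") for _ in range(n_random))
    read = prefix + mirna + adapter
    if len(read) < read_length:
        read += "".join(rng.choice("ACGT") for _ in range(read_length - len(read)))
    return read[:read_length]


def trim_read(read, adapter=ADAPTER_3P, n_random=N_RANDOM_5P, min_overlap=8):
    """Remove the 3' adapter and the 5' random bases; return the insert.

    The adapter is matched exactly, either in full or as a prefix of at
    least min_overlap bases at the very end of the read. Returns None if
    no adapter is found (such reads would be discarded).
    """
    pos = read.find(adapter)
    if pos == -1:
        # Adapter may be cut off by the read end: try its longest prefix first.
        for k in range(len(adapter) - 1, min_overlap - 1, -1):
            if read.endswith(adapter[:k]):
                pos = len(read) - k
                break
    if pos == -1:
        return None
    return read[n_random:pos]


def to_fastq(name, seq, qual_char="I"):
    """Format one FASTQ record with a constant quality ('I' = Phred 40, Phred+33)."""
    return f"@{name}\n{seq}\n+\n{qual_char * len(seq)}\n"


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is reproducible
    mirnas = {  # two sequences from testmature.fa, already converted U -> T
        "cel-let-7-5p": "TGAGGTAGTAGGTTGTATAGTT",
        "cel-lin-4-5p": "TCCCTGAGACCTCAAGTGTGA",
    }
    for i, (name, seq) in enumerate(mirnas.items(), start=1):
        read = build_read(seq, rng)
        print(to_fastq(f"sim_{i} {name}", read), end="")
```

Because these reads have no sequencing errors, they mainly exercise the adapter-detection and trimming steps of a pipeline. Realistic base-call errors would still require something like ART's empirical quality profiles.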
I would like to create simulated miRNA reads to test a pipeline that I have already run on 400 miRNA data sets. I found that ART is the most efficient tool for simulating reads. The skewer team also provides a modified version ( https://sourceforge.net/projects/skewer/files/Simulator/ ). In this version (art_illumina_src151-adapter-enabled.tar.gz), they changed the code slightly so that it can simulate miRNA reads with adapters. However, that is all the information I could find; there is no further documentation. All I could do was follow the ART instructions, which, as you can see above, did not work.