Question

Use ART illumina to create miRNA simulated data

0

6.1 years ago

dzisis1986 ▴ 70

Hi, I am trying to use ART Illumina to create simulated miRNA data sets, but I am having trouble understanding the manual. I managed to create an ART profile from a FASTQ sample, and I suppose I have to use it together with a reference file in FASTA format to get FASTQ output. I ran ART as follows:

art_illumina -i testmature.fa -l 10 -sp -f 10 -o simulated_SRAnew

and the output FASTQ file is either empty or contains only headers. I also tried running:

art_illumina -i testmature.fa -1 miRNAseq_SRR1035644.txt -l 10 -sp -f 10 -o simulated_SRAnew

where miRNAseq_SRR1035644.txt is the file I created with art_profiler, but again I get an empty FASTQ file.

What kind of FASTA reference should I use? My testmature.fa file looks like this:

>cel-let-7-5p MIMAT0000001 Caenorhabditis elegans let-7-5p
UGAGGUAGUAGGUUGUAUAGUU
>cel-let-7-3p MIMAT0015091 Caenorhabditis elegans let-7-3p
CUAUGCAAUUUUCUACCUUACC
>cel-lin-4-5p MIMAT0000002 Caenorhabditis elegans lin-4-5p
UCCCUGAGACCUCAAGUGUGA
>cel-lin-4-3p MIMAT0015092 Caenorhabditis elegans lin-4-3p
ACACCUGGGCUCUCCGGGUACC
>cel-miR-1-5p MIMAT0020301 Caenorhabditis elegans miR-1-5p
CAUACUUCCUUACAUGCCCAUA
>cel-miR-1-3p MIMAT0000003 Caenorhabditis elegans miR-1-3p
UGGAAUGUAAAGAAGUAUGUA

Do you have any idea where the problem might be, or whether there is another way to create simulated miRNA-seq data? What kind of reference and parameters should I use?

Thank you in advance

miRNA art_illumina simulate • 1.8k views

ADD COMMENT • link 6.1 years ago by dzisis1986 ▴ 70

0

Sequencers don't report uracil, so you should replace those U's with T's. I assume the reference is expected to be a genome (and not small RNAs like what you have), so that may be one reason why this is not working.
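
The U-to-T replacement suggested above can be sketched in a few lines of Python. This is only an illustration: the function names `read_fasta`, `rna_to_dna`, and `convert_fasta` are hypothetical helpers, not part of ART, and the sketch assumes a plain FASTA file such as the testmature.fa shown in the question.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def rna_to_dna(seq: str) -> str:
    """Replace U/u with T/t and leave every other character untouched."""
    return seq.translate(str.maketrans("Uu", "Tt"))


def convert_fasta(lines: Iterable[str]) -> str:
    """Return FASTA text in which every sequence uses the DNA alphabet."""
    out = []
    for header, seq in read_fasta(lines):
        out.append(">" + header)  # header is kept unchanged
        out.append(rna_to_dna(seq))
    return "\n".join(out) + "\n"
```

To convert a file, something like `open("testmature_dna.fa", "w").write(convert_fasta(open("testmature.fa")))` should do; the output filename is just an example. As the follow-up in this thread shows, this conversion alone did not fix the empty reads, so treat it as a necessary cleanup step rather than the full solution.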

ADD REPLY • link 6.1 years ago by GenoMax 147k

0

Even if I replace the U's with T's, the result is the same. The output of ART is a .fq file with only headers, like this:

@cel-let-7-5p-10

+

@cel-let-7-5p-9

+

@cel-let-7-5p-8

+

@cel-let-7-5p-7

+

@cel-let-7-5p-6

+
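
A quick way to confirm that every record really has an empty sequence (rather than eyeballing the file) is to tally read lengths. The sketch below is a hypothetical helper, not an ART feature, and it assumes strict 4-line FASTQ records with no blank lines between records.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from strict 4-line FASTQ records."""
    buf = [line.rstrip("\r\n") for line in lines]
    if len(buf) % 4 != 0:
        raise ValueError("line count is not a multiple of 4")
    for i in range(0, len(buf), 4):
        name, seq, plus, qual = buf[i:i + 4]
        if not name.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        yield name[1:], seq, qual


def summarize_fastq(lines: Iterable[str]) -> Dict[str, object]:
    """Count records, empty sequences, and the distribution of read lengths."""
    total = 0
    empty = 0
    lengths: Counter = Counter()
    for _name, seq, _qual in read_fastq(lines):
        total += 1
        if not seq:
            empty += 1
        lengths[len(seq)] += 1
    return {"total": total, "empty": empty, "lengths": dict(lengths)}
```

Calling `summarize_fastq(open("simulated_SRAnew.fq"))` (the filename is assumed from the `-o` prefix used above) would show at a glance whether all reads have length 0 or only some of them.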

ADD REPLY • link 6.1 years ago by dzisis1986 ▴ 70

0

So you suggest that I use a genome, let's say the human genome, create profiles from my miRNA .fq files, and give those two as input to ART? And what about adapters? My first problem is getting ART to create a simple data set at all; after that I need to understand how to use it to create miRNA-seq data with adapters. That is the simulated data I need!

ADD REPLY • link 6.1 years ago by dzisis1986 ▴ 70

0

Do you absolutely need to simulate the reads? Can you not start with one of the available datasets out there? The miRNA data I worked with had a specific adapter on the 3' end and 4 bases at the beginning of the read (due to the kit used). miRNA reads were thus identified by the presence of that adapter sequence (then removing it and the 4 bases at the beginning of the read), leaving a 22-25 bp final product. So it is not a straightforward thing to simulate.
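
To make the read layout described above concrete, here is a minimal sketch that builds one error-free read as 4 leading bases + miRNA + 3' adapter, and then reverses the process. Everything named here is an assumption for illustration: `ADAPTER_3P` is a placeholder string to be replaced with the adapter of your kit, the 4 leading bases are drawn at random, and `simulate_read` / `trim_read` are hypothetical helpers. It does not model sequencing errors or quality scores, which is the part ART would normally provide.

```python
import random
from typing import Optional

# ASSUMPTION: placeholder 3' adapter; substitute the sequence used by your kit.
ADAPTER_3P = "TGGAATTCTCGGGTGCCAAGG"


def simulate_read(mirna: str, read_len: int = 50, n_prefix: int = 4,
                  adapter: str = ADAPTER_3P,
                  rng: Optional[random.Random] = None) -> str:
    """Build prefix + miRNA + adapter, padded or cut to read_len bases."""
    rng = rng or random.Random()
    insert = mirna.upper().replace("U", "T")  # reads use the DNA alphabet
    prefix = "".join(rng.choice("ACGT") for _ in range(n_prefix))
    read = prefix + insert + adapter
    while len(read) < read_len:  # pad short constructs with random bases
        read += rng.choice("ACGT")
    return read[:read_len]


def trim_read(read: str, n_prefix: int = 4,
              adapter: str = ADAPTER_3P) -> Optional[str]:
    """Return the insert between the prefix and the adapter, or None."""
    idx = read.find(adapter)
    if idx == -1:
        return None  # adapter missing or truncated: not treated as miRNA
    return read[n_prefix:idx]
```

With the defaults, `trim_read(simulate_read("UGAGGUAGUAGGUUGUAUAGUU"))` should give back the let-7-5p sequence in DNA letters. Note that with a short `read_len` the adapter gets truncated and this exact-match trimming returns `None`, which is one reason real pipelines allow partial adapter matches.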

ADD REPLY • link 6.1 years ago by GenoMax 147k

0

I would like to create simulated miRNA reads in order to test a pipeline I have already tested on 400 miRNA data sets. I found ART to be the most efficient tool for simulating reads, and there is also a modified version available from the skewer team ( https://sourceforge.net/projects/skewer/files/Simulator/ ). In this version (art_illumina_src151-adapter-enabled.tar.gz) they changed the code a bit in order to create miRNA reads with adapters, but this is all I could find; there is no further documentation. All I could do was follow the ART instructions, but as you can see above, without success.

ADD REPLY • link 6.1 years ago by dzisis1986 ▴ 70

score 0 · Answer 1 · 2018-10-10

0

6.1 years ago

dzisis1986 ▴ 70

So the conclusion is that no one understands how ART works for simulated miRNA data? Has no one used this specific version ( https://sourceforge.net/projects/skewer/files/Simulator/ ) of ART to produce reads that contain adapter contaminants?

ADD COMMENT • link 6.1 years ago by dzisis1986 ▴ 70

1

Is that a question or a comment? I don't think the ART author claims anywhere that it can simulate miRNA sequence data. As we have discussed in the past, this is a rather special use case.

ADD REPLY • link 6.1 years ago by GenoMax 147k