Question

quality value with art fastq simulator

0

6.7 years ago

marongiu.luigi ▴ 730

Dear all,

I am trying to generate simulated FASTQ files from a FASTA reference using ART. Following the manual, I entered the following:

art_illumina -ss HS25 -i ./input.fa -p -l 50 -f 20 -m 200 -s 10 -o ./output

In this case, the simulated instrument is an Illumina HiSeq 2500, paired-end reads are generated, the length is 50 bp with a mean of 200 (I am not sure what the difference between the two is), and the coverage is 20. I then checked the quality of the output with FastQC: the reads are 50 bp long, but the quality scores are all skewed towards the maximum of 38: [image] I therefore provided values for the maximum and minimum quality score:

art_illumina -ss HS25 -i ./input -p -l 36 -f 30 -m 50 -s 10 -qU 30 -qL 25 -o ./output

but in this case the quality score was not merely skewed; it was uniform, with a single value of 30: [image]

How can I obtain something more like the following plot? [image] Thank you.

next-gen sequencing RNA-Seq • 2.7k views

updated 6.6 years ago by h.mon 35k • written 6.7 years ago by marongiu.luigi ▴ 730

1

"Since the read length for HiSeq2500 is 36"

No, it is not. You can run sequencing at whatever read length you want, long or short. The maximum length for a HiSeq 2500 rapid run can be 2 x 250 bp. To get sufficiently specific mapping, you probably don't want to go much below 36 bp (for a human-sized genome).

6.7 years ago by GenoMax 148k

0

OK, I took the lower end of the scale. But how would I set a good range of quality scores? And what is the relation between -l and -m? Thanks.

6.7 years ago by marongiu.luigi ▴ 730

0

You may want to look at the in-line help/manual for the specific options of ART.

Realistically, if your libraries are good, you are rarely going to see Q scores below 30 across the board, so anything between 30 and 40 would be fine. If you are deliberately trying to simulate a different range, you can choose those numbers.

6.7 years ago by GenoMax 148k

0

I am trying to artificially create some libraries that look like THIS. Based on the readme file included with ART, I therefore provided the options -qL (--minQ, the minimum base quality score) and -qU (--maxQ, the maximum base quality score), but the values were not randomly sampled between these boundaries; they were fixed at 30.

6.7 years ago by marongiu.luigi ▴ 730

0

I think I understand the difference between -l and -m: the former is the length of the read in the FASTQ file, the latter is the length of the DNA/RNA fragment being sequenced; therefore -m needs to be longer than -l.
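
The relation between the two lengths can be made concrete with a little arithmetic. For paired-end reads, each mate covers the read length from one end of the fragment, so the mean gap between the mates is roughly the mean fragment length minus twice the read length. The sketch below applies that to the two commands in the question; `inner_distance` is a hypothetical helper name used only for illustration, not part of ART.

```shell
#!/bin/sh
# inner_distance READ_LEN MEAN_FRAGMENT_LEN
# Hypothetical helper: prints the mean number of unsequenced bases
# between the two mates of a pair (a negative value means the mates overlap).
inner_distance() {
    echo $(( $2 - 2 * $1 ))
}

inner_distance 50 200   # first command  (-l 50 -m 200): prints 100
inner_distance 36 50    # second command (-l 36 -m 50):  prints -22
```

With -l 36 -m 50 the fragment is still longer than a single read, but the two mates overlap by about 22 bases on average.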

6.7 years ago by marongiu.luigi ▴ 730

0

For better focus, I removed the part of the post dealing with the read length.

6.6 years ago by marongiu.luigi ▴ 730

score 0 · Answer 1 · 2018-05-08

0

6.6 years ago

h.mon 35k

Grab a set of reads with the intended quality profile, create a profile with art_profiler_illumina (the profiler script shipped with ART), then simulate reads with the same profile using art_illumina and the parameters:

    -1   --qprof1   the first-read quality profile
    -2   --qprof2   the second-read quality profile
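
A minimal sketch of that two-step workflow might look like the following. Several things here are assumptions rather than facts from this thread: the profiler is taken to be the `art_profiler_illumina` script with the argument order `<profile_prefix> <fastq_dir> <fastq_extension>`, the resulting profile files are assumed to be named `<prefix>R1.txt` and `<prefix>R2.txt`, and all paths are hypothetical. Please check the README of your ART version before relying on any of it. The script defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch only: directory and file names below are hypothetical.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# 1. Build a quality profile from real reads.
#    Assumed usage: art_profiler_illumina <profile_prefix> <fastq_dir> <fastq_extension>
build_profile() {
    run art_profiler_illumina "$1" "$2" "$3"
}

# 2. Simulate paired-end reads with the custom profiles
#    (-1 / -2 as listed above; other options copied from the question).
simulate_reads() {
    run art_illumina -1 "$1" -2 "$2" -i "$3" -p -l 50 -f 20 -m 200 -s 10 -o "$4"
}

build_profile ./my_profile ./real_reads fq
# Profile file names are an assumption; use whatever the profiler actually wrote.
simulate_reads ./my_profileR1.txt ./my_profileR2.txt ./input.fa ./output
```

Setting `DRY_RUN=0` runs the commands for real, which requires both tools to be on the `PATH`. The source reads should be at least as long as the simulated read length so that the profile covers every position.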

6.6 years ago by h.mon 35k