FASTQ validation error in files generated with ART
5.1 years ago

hello

I have created a pair of fastq files with ART. I created a pair of fastq files for each human chromosome and then concatenated them. I have inserted some mutations in the chromosomes and generated two files for each chromosome in order to simulate two alleles:

cat ch_01.allA.fa ch_01.allB.fa > ch_01.fa
art_illumina -1 -p -f 100 -l 140 -m 300 -s 10 -i ch_01.fa -o ch-01_
gzip ch-01_1.fq ch-01_2.fq
[repeat for all chromosomes]
cat ch-01_1.fq.gz ... > file_1.fq.gz

I used FastQValidator to check the consistency of the files but I get:

$ fastQValidator --file file_1.fq.gz
ERROR on Line 329301201: Repeated Sequence Identifier: 1-164667300/2 at Lines 1 and 329301201
ERROR on Line 329301205: Repeated Sequence Identifier: 1-164667298/2 at Lines 5 and 329301205
ERROR on Line 329301209: Repeated Sequence Identifier: 1-164667296/2 at Lines 9 and 329301209
ERROR on Line 329301213: Repeated Sequence Identifier: 1-164667294/2 at Lines 13 and 329301213
ERROR on Line 329301217: Repeated Sequence Identifier: 1-164667292/2 at Lines 17 and 329301217
...

The same happens for file_2.fq.gz.

What could be the cause? Can I fix these files?

fastq fastqvalidator art • 1.1k views

Perhaps this is some internal limitation of ART. I see that you have simulated about 300 million reads? If ART does have such a limitation, you could try mutate.sh from the BBMap suite as an alternative.


Why 300 M reads? I set:

-f read coverage = 100
-l length of reads = 140
-m mean size of DNA/RNA fragments for paired-end simulations = 300
-s standard deviation of the fragment length = 10
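
For context, fold coverage is usually related to the read count by coverage × reference length / read length, so the fragment size settings (-m and -s) do not change the number of reads, while a large reference at -f 100 quickly reaches hundreds of millions of reads. A minimal Python sketch of that arithmetic follows. The function name and the 1.4 Mbp reference length are hypothetical, and whether ART applies exactly this formula per sequence is an assumption.

```python
def expected_read_count(fold_coverage, read_length, reference_length):
    """Approximate number of reads needed to reach the given fold coverage."""
    return fold_coverage * reference_length / read_length


# Hypothetical 1.4 Mbp reference with -f 100 and -l 140 as in the post.
n = expected_read_count(fold_coverage=100, read_length=140, reference_length=1_400_000)
print(round(n))
```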

Which version of ART are you using?


You are right, there are 666,136,502 reads.
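
A count like this, together with a list of repeated identifiers, can be obtained by walking the FASTQ records. The following Python sketch assumes strict four-line records (no wrapped sequences); the function names and the toy reads are hypothetical and only mimic the identifier style shown in the validator output.

```python
import io
from collections import Counter


def read_ids(handle):
    """Yield the identifier of each 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 0:
            # Drop the leading '@' and anything after the first whitespace.
            fields = line[1:].split()
            yield fields[0] if fields else ""


def summarize(handle):
    """Return (total number of reads, sorted list of identifiers seen more than once)."""
    counts = Counter(read_ids(handle))
    total = sum(counts.values())
    duplicated = sorted(name for name, n in counts.items() if n > 1)
    return total, duplicated


# Toy input with one repeated identifier.
toy = io.StringIO(
    "@1-4/1\nACGT\n+\nIIII\n"
    "@1-2/1\nACGA\n+\nIIII\n"
    "@1-4/1\nACGT\n+\nIIII\n"
)
total, duplicated = summarize(toy)
print(total, duplicated)
```

For a real file the handle could come from `gzip.open("file_1.fq.gz", "rt")`. Keeping every identifier in memory is costly at hundreds of millions of reads, so this is better suited to a subsample or a single chromosome.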


I created a pair of fastq files for each human chromosome and then concatenated them.

Why not generate the reads for the entire genome in one run? Since you generated them piecemeal, the read headers seem to have been duplicated.
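
One plausible reading of the validator output, offered as an assumption rather than a confirmed diagnosis: the repeated identifiers all start with `1-`, which would be consistent with both allele FASTA files using the same sequence name, so that reads simulated from either copy receive the same header. If that is the case, giving each allele a distinct sequence name before simulation should avoid the clash. A minimal Python sketch follows; the function name `tag_fasta_headers`, the `_allA`/`_allB` suffixes, and the toy sequences are hypothetical.

```python
def tag_fasta_headers(lines, tag):
    """Yield FASTA lines with `tag` appended to the sequence name of each header."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Separate the sequence name from any description after the first whitespace.
            parts = line[1:].split(None, 1)
            name = parts[0] if parts else ""
            rest = " " + parts[1] if len(parts) > 1 else ""
            yield f">{name}{tag}{rest}"
        else:
            yield line


# Toy input: both alleles use the sequence name "1" (assumed, not taken from the real files).
allele_a = [">1 allele A", "ACGTACGT"]
allele_b = [">1 allele B", "ACGAACGT"]

combined = list(tag_fasta_headers(allele_a, "_allA")) + list(tag_fasta_headers(allele_b, "_allB"))
headers = [line for line in combined if line.startswith(">")]
print(headers)
```

With distinct names in the combined FASTA, reads from the two alleles should no longer share identifiers, and the allele of origin stays visible in each read name.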
