5.3 years ago
marongiu.luigi
Hello, I have created some mutated human sequences by modifying the GRCh38 FASTA files. I then concatenated the files and generated the FASTQ files with
$ art -1 .../art/Illumina_profiles/custom/HiSeq2k_0m1.txt -2 ...art/Illumina_profiles/custom/HiSeq2k_0m2.txt -p -f 100 -l 140 -m 300 -s 10 -i humanMut.fa -o Mut
This command worked for smaller genomes with coverage of 30-50; HiSeq2k_0m1|2.txt are the quality profiles. When I check the quality though:
$ fastqc Mut_1.fq
...
Approx 95% complete for sismi2N_1.fq
Failed to process file sismi2N_1.fq
uk.ac.babraham.FastQC.Sequence.SequenceFormatException: Ran out of data in the middle of a fastq entry. Your file is probably truncated
at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:179)
at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:125)
at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:76)
at java.lang.Thread.run(Thread.java:745)
What could be the problem here? Why was the protocol working before? Is the file too large? Is the coverage too high? Thanks
The error is quite clear, isn't it?
Check the fq file for truncated sequences.
That is exactly what I don't understand: how can the FASTQ be truncated? I followed the same strategy, that is, converting a FASTA into a FASTQ, and I did not touch the FASTQ, so how did it get truncated? And how can I check which sequences are truncated?
I cannot tell you why this happened; it could be a memory shortage, a premature kill of the job, a bug in the code...
I would start with validating the fastq files, e.g. with https://genome.sph.umich.edu/wiki/FastQValidator or a simple awk command that checks whether SEQ and QUAL have the same length for all entries. As fastqc complained at > 95% complete, maybe tail your.fastq would be a good start, as it could be the last entry that is odd. repair.sh from the BBMap suite might be worth looking at as well. Also check whether running fastqc with maximum verbosity helps narrow down the problem.
Thanks, I'll try that...
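The quick check suggested above (look at the end of the file first) could be sketched roughly as follows. This is only an illustration, not something posted in the thread: the function name fastq_tail_check is made up, and it assumes an uncompressed FASTQ with exactly four lines per record (no wrapped sequences).

```shell
# Hypothetical helper (not from the thread): quick truncation check.
# Assumes uncompressed FASTQ with exactly 4 lines per record.
fastq_tail_check() {
    ftc_file="$1"
    # Count lines with awk so that a final line lacking a newline is counted too.
    ftc_lines=$(awk 'END { print NR }' "$ftc_file")
    if [ $((ftc_lines % 4)) -ne 0 ]; then
        echo "TRUNCATED: $ftc_lines lines is not a multiple of 4"
        return 1
    fi
    # Line count looks fine, so inspect the last record: header, sequence, +, quality.
    tail -n 4 "$ftc_file" | awk '
        NR == 1 && substr($0, 1, 1) != "@" { print "LAST RECORD BAD: header lacks @"; bad = 1; exit 1 }
        NR == 2 { seq = length($0) }
        NR == 4 { qual = length($0) }
        END {
            if (bad) exit 1
            if (seq != qual) { print "LAST RECORD BAD: seq=" seq " qual=" qual; exit 1 }
            print "OK"
        }'
}
```

It would be called as fastq_tail_check Mut_1.fq. It only looks at the total line count and the final record, so it says nothing about damage earlier in the file.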
Hi, I tried FastqValidator but it only told me the obvious:
Is there a way to pick out the entry that gave the error, and to see what the error is? Running the test, I got, as expected:
So it is worrisome that I did not get anything from my file...
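One possible way to pick out the offending entry, along the lines of the awk suggestion earlier in the thread, is to scan the whole file and stop at the first record that breaks the four-line layout. The sketch below is an assumption-laden illustration: fastq_first_bad is a made-up name, the input is assumed to be uncompressed, and sequences are assumed not to be wrapped over several lines.

```shell
# Hypothetical helper (not from the thread): report the first malformed record.
# Assumes uncompressed FASTQ with exactly 4 lines per record.
fastq_first_bad() {
    awk '
        function bad(msg) {
            printf "record %d (line %d, %s): %s\n", rec, start, name, msg
            failed = 1
            exit 1
        }
        { pos = NR % 4 }    # 1 = header, 2 = sequence, 3 = separator, 0 = quality
        pos == 1 {
            rec++; start = NR; name = $0
            if (substr($0, 1, 1) != "@") bad("header does not start with @")
        }
        pos == 2 { seqlen = length($0) }
        pos == 3 && substr($0, 1, 1) != "+" { bad("separator line does not start with +") }
        pos == 0 && length($0) != seqlen {
            bad("SEQ length " seqlen " != QUAL length " length($0))
        }
        END {
            # Reached only without an earlier failure message if failed is unset.
            if (!failed && NR % 4 != 0) bad("file ends in the middle of a record")
            if (!failed) print "OK: " rec + 0 " records"
        }
    ' "$1"
}
```

For example, fastq_first_bad Mut_1.fq prints the record number, the line where that record starts, its header, and the reason, then returns a non-zero status. Once one record is off, everything after it is out of phase, so only the first report is meaningful.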