Fastqc genome reads
2.9 years ago
Ak ▴ 60

Hi! I'm trying to check the quality of my raw genome reads using FastQC, but I'm running into this issue. Does anyone know what I can do to proceed with my analysis? Thanks!

$ fastqc EtenNg5_ACAGTG_L004_R1_001.fastq.gz
Started analysis of EtenNg5_ACAGTG_L004_R1_001.fastq.gz
Failed to process file EtenNg5_ACAGTG_L004_R1_001.fastq.gz
uk.ac.babraham.FastQC.Sequence.SequenceFormatException: ID line didn't start with '@'
    at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:158)
    at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:125)
    at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:76)
    at java.base/java.lang.Thread.run(Thread.java:832)
genome fastqc • 2.4k views

Did you look at this error: uk.ac.babraham.FastQC.Sequence.SequenceFormatException: ID line didn't start with '@'?

Try two things:

  1. Run seqkit seq -n <failed.fastq.gz> and see whether it works and prints all the headers. It should fail with the error above unless seqkit tolerates malformed header (ID) lines.
  2. Run gunzip failed.fastq.gz and then sed -nr '1~4p' failed.fastq | grep -v "@". This should print the headers that lack an @, and you can then add the @ to those headers. You can also run zcat failed.fq.gz| sed -nr '1~4p' | grep "@" to list the headers/IDs without @.
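As a rough sketch of the header check in step 2, the snippet below streams the compressed file and reports every record-header line (lines 1, 5, 9, ...) that does not begin with @, together with its line number. It assumes four-line FASTQ records. The function name bad_fastq_headers and the demo file are made up for illustration, and gzip -cd is used in place of zcat only because zcat on some systems expects .Z files.

```shell
# Hypothetical helper: print each record-header line (1, 5, 9, ...) of a
# gzipped FASTQ file that does not start with '@', prefixed by its line number.
bad_fastq_headers() {
    gzip -cd -- "$1" | awk 'NR % 4 == 1 && substr($0, 1, 1) != "@" { print NR ": " $0 }'
}

# Demo on a small synthetic file: the second record is missing its '@'.
demo_dir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\nread2\nACGT\n+\nIIII\n' | gzip -c > "$demo_dir/demo.fastq.gz"

bad_headers=$(bad_fastq_headers "$demo_dir/demo.fastq.gz")
echo "$bad_headers"
```

Anchoring the test to the first character avoids a false match when a header happens to contain an @ somewhere else in the line. If gzip itself reports an error while streaming, the problem is the compressed file rather than the headers.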

For the 1st one, I got this at the end of the list: [screenshot: 1st]

And for the 2nd method, I wasn't able to gunzip the file to run sed: [screenshot: 2nd]

So I tried the zcat command instead and got this: [screenshot: 3rd]


It's only the read headers; something is very wrong here. How did you obtain these files?


I got it from the sequencing company


Somewhere along the way these data files appear to have become corrupted. If you are able to, download a new copy.


Ah, my bad 😅


Both of the commands I asked them to run print only headers, which is probably why the screenshots show only headers. I was checking whether the file has any headers without @. However, it seems the file is corrupted.


Sorry, there was a typo: zcat failed.fq.gz| sed -nr '1~4p' | grep "@" should be zcat failed.fq.gz| sed -nr '1~4p' | grep -v "@". However, your CRC error seems to stem from a corrupt file. Could you ask your core for the MD5 checksums of the files you have? Generate MD5 sums for your copies and compare them with the ones provided by the core. If they do not match, request the data from the core again.
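A minimal sketch of that comparison, assuming GNU md5sum is installed (macOS ships md5 instead). Here checksums.md5 stands in for the list the core would send, sample_R1.fastq.gz is a synthetic stand-in for a real read file, and check_md5 is a hypothetical helper name.

```shell
# Hypothetical helper: verify every file listed in a checksum list.
# $1 = directory holding the files, $2 = checksum list (paths relative to $1)
check_md5() {
    if (cd "$1" && md5sum -c "$2" >/dev/null 2>&1); then
        echo match
    else
        echo mismatch
    fi
}

# Demo: build a stand-in file and the checksum list the core would provide.
demo_dir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n' | gzip -c > "$demo_dir/sample_R1.fastq.gz"
(cd "$demo_dir" && md5sum sample_R1.fastq.gz > checksums.md5)

md5_before=$(check_md5 "$demo_dir" checksums.md5)

# Simulate a damaged transfer by appending a stray byte, then re-check.
printf 'x' >> "$demo_dir/sample_R1.fastq.gz"
md5_after=$(check_md5 "$demo_dir" checksums.md5)

echo "before: $md5_before, after: $md5_after"
```

With real data you would skip the demo section and run md5sum -c on the core's checksum list from the directory that holds the downloaded files.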


Or FastQC can't take gzipped files as input, or your FASTQ files are not correctly formatted, it seems.


Most probably it's an issue with the FASTQ file, because I ran read 2 for this genome the same way and it was fine. So it seems there's no other way to resolve this issue?


Output of file EtenNg5_ACAGTG_L004_R1_001.fastq.gz?


Oh, I usually just run the command without stating an output file, only 'fastqc' and its input file.


Please just type file EtenNg5_ACAGTG_L004_R1_001.fastq.gz into your terminal and paste the output here. This command checks whether the file is compressed or not.


Oh sorry I misunderstood you. I got this:

EtenNg5_ACAGTG_L004_R1_001.fastq.gz: gzip compressed data, max speed
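One caveat that may be worth sketching here: file only inspects the first bytes of a file, so a truncated or damaged download can still be reported as gzip compressed data. gzip -t instead decompresses the whole stream and checks its integrity, which is closer to what the CRC error above is about. The snippet below is an illustration on synthetic data; gzip_status and the demo file names are made up.

```shell
# Hypothetical helper: report whether a gzip file passes gzip's integrity test.
gzip_status() {
    if gzip -t -- "$1" 2>/dev/null; then
        echo intact
    else
        echo damaged
    fi
}

# Demo: make a valid gzip file, then a truncated copy of it.
demo_dir=$(mktemp -d)
seq 1 20000 | gzip -c > "$demo_dir/good.gz"
head -c 1000 "$demo_dir/good.gz" > "$demo_dir/truncated.gz"

# If 'file' is installed, show what it reports for the truncated copy.
if command -v file >/dev/null 2>&1; then
    file "$demo_dir/truncated.gz"
fi

good_status=$(gzip_status "$demo_dir/good.gz")
trunc_status=$(gzip_status "$demo_dir/truncated.gz")
echo "good.gz: $good_status, truncated.gz: $trunc_status"
```

On the real data the equivalent check would be gzip -t EtenNg5_ACAGTG_L004_R1_001.fastq.gz, which prints nothing and exits with status 0 when the file is intact.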