Question

Fastqc genome reads

0

3.5 years ago

Ak ▴ 60

Hi! I'm trying to check the quality of my raw genome reads using FastQC, but I'm encountering this issue. Does anyone know what I can do to progress with my analysis? Thanks!

$ fastqc EtenNg5_ACAGTG_L004_R1_001.fastq.gz
Started analysis of EtenNg5_ACAGTG_L004_R1_001.fastq.gz
Failed to process file EtenNg5_ACAGTG_L004_R1_001.fastq.gz
uk.ac.babraham.FastQC.Sequence.SequenceFormatException: ID line didn't start with '@'
at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:158)
at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:125)
at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:76)
at java.base/java.lang.Thread.run(Thread.java:832)

genome fastqc • 3.1k views

ADD COMMENT • link updated 3.4 years ago by ATpoint 88k • written 3.5 years ago by Ak ▴ 60

2

Did you look at this error: uk.ac.babraham.FastQC.Sequence.SequenceFormatException: ID line didn't start with '@'?

Try two things:

seqkit seq -n <failed.fastq.gz>. See whether this runs and prints all the headers. It should fail with the error above unless seqkit tolerates malformed headers (IDs).

gunzip failed.fastq.gz and run sed -nr '1~4p' failed.fastq | grep -v "@". This should print the headers that lack @, and you can then add @ to those headers. You can also do zcat failed.fq.gz| sed -nr '1~4p' | grep "@" to list the headers/IDs without @.
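
A more portable variant of the header check, as a minimal sketch: it streams the compressed file through awk instead of GNU sed, so no separate gunzip step is needed. The function name bad_headers is hypothetical, and the check assumes strict four-line FASTQ records. Unlike grep -v "@", it tests only the first character of each header, which is the condition the FastQC error reports.

```shell
# bad_headers (hypothetical helper): print every record header line
# (lines 1, 5, 9, ...) that does not begin with '@', prefixed with its
# line number. Assumes four-line FASTQ records in a gzip-compressed file.
bad_headers() {
    gzip -dc "$1" | awk 'NR % 4 == 1 && substr($0, 1, 1) != "@" { print NR ": " $0 }'
}

# Usage: bad_headers failed.fastq.gz
```

If the file itself is damaged, gzip will print its own error on stderr while awk reports whatever it managed to read.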

ADD REPLY • link 3.5 years ago by cpad0112 21k

0

For the 1st one, I got this at the end of the list: [screenshot: 1st]

For the 2nd method, I wasn't able to gunzip the file to run sed: [screenshot: 2nd]

So I tried the zcat command instead and got this: [screenshot: 3rd]

ADD REPLY • link 3.5 years ago by Ak ▴ 60

0

It's only the read headers, something is very wrong here. How did you obtain these files?

ADD REPLY • link 3.5 years ago by ATpoint 88k

0

I got it from the sequencing company

ADD REPLY • link 3.5 years ago by Ak ▴ 60

3

Somewhere along the way these data files appear to have become corrupt. If you are able to, download a new copy.
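
One way to confirm corruption before downloading again, as a sketch: gzip -t reads through the whole compressed stream and reports CRC or truncation errors without writing any output. The function name check_gzip is hypothetical.

```shell
# check_gzip (hypothetical helper): test gzip integrity without extracting.
# gzip -t exits non-zero on a CRC mismatch, truncation, or non-gzip data.
check_gzip() {
    if gzip -t "$1" 2>/dev/null; then
        echo "OK: $1"
    else
        echo "CORRUPT: $1"
        return 1
    fi
}

# Usage: check_gzip EtenNg5_ACAGTG_L004_R1_001.fastq.gz
```

Running it on both the R1 and R2 files shows quickly whether only one of the pair is affected.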

ADD REPLY • link 3.5 years ago by GenoMax 152k

0

Ah, my bad 😅

ADD REPLY • link 3.4 years ago by ATpoint 88k

0

Both commands I asked him/her to run print only headers, which is probably why the screenshots show only headers. I was checking whether the file has any headers without @. However, it seems the file is corrupted.

ADD REPLY • link 3.5 years ago by cpad0112 21k

0

Sorry, there was a typo: zcat failed.fq.gz| sed -nr '1~4p' | grep "@" should be zcat failed.fq.gz| sed -nr '1~4p' | grep -v "@". However, your CRC error seems to stem from a corrupt file. Could you ask your core for the MD5 checksums of the files you received? Generate MD5 sums for your copies and compare them with the ones provided by the core. If they do not match, request the data from the core again.
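
The checksum comparison could look roughly like this. It is a sketch that assumes GNU coreutils md5sum is available; the function name verify_md5 and the file name core_checksums.md5 are placeholders, not anything the core is known to provide.

```shell
# verify_md5 (hypothetical helper): compare a file's MD5 sum with an
# expected value, e.g. one sent by the sequencing core.
verify_md5() {
    path=$1
    expected=$2
    actual=$(md5sum "$path" | awk '{ print $1 }')
    if [ "$actual" = "$expected" ]; then
        echo "MATCH: $path"
    else
        echo "MISMATCH: $path (expected $expected, got $actual)"
        return 1
    fi
}

# Usage with a single known checksum:
#   verify_md5 EtenNg5_ACAGTG_L004_R1_001.fastq.gz <md5 from the core>
# If the core supplies a list in md5sum format (placeholder file name):
#   md5sum -c core_checksums.md5
```

A mismatch means the local copy differs from what the core produced, so a fresh transfer is the next step.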

ADD REPLY • link 3.5 years ago by cpad0112 21k

0

Or it seems that either FastQC can't take gzipped files as input, or your FASTQ files are not correctly formatted.
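
As far as I know, FastQC reads gzip-compressed FASTQ directly, which leaves the formatting as the thing to test. A minimal sketch of such a check follows; the function name validate_fastq is hypothetical, and it assumes strict four-line records with no wrapped sequence lines.

```shell
# validate_fastq (hypothetical helper): check the four-line record
# structure of a gzip-compressed FASTQ and stop at the first problem.
# The exit status is awk's: 0 if no problem was seen, 1 otherwise.
validate_fastq() {
    gzip -dc "$1" | awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" { print "line " NR ": header lacks @"; bad = 1; exit }
        NR % 4 == 2 { seqlen = length($0) }
        NR % 4 == 3 && substr($0, 1, 1) != "+" { print "line " NR ": separator lacks +"; bad = 1; exit }
        NR % 4 == 0 && length($0) != seqlen { print "line " NR ": quality length differs from sequence"; bad = 1; exit }
        END {
            if (!bad && NR % 4 != 0) { print "truncated record at end of file"; bad = 1 }
            exit bad + 0
        }'
}

# Usage: validate_fastq EtenNg5_ACAGTG_L004_R1_001.fastq.gz
```

This only inspects what gzip manages to decompress, so a gzip error on stderr still points to a damaged archive rather than a formatting problem.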

ADD REPLY • link 3.5 years ago by lieven.sterck 15k

0

It's most probably an issue with the FASTQ file, because I ran read 2 for this genome the same way and it was fine. So it seems there's no other way to resolve this issue?

ADD REPLY • link 3.5 years ago by Ak ▴ 60

0

Output of file EtenNg5_ACAGTG_L004_R1_001.fastq.gz?

ADD REPLY • link 3.5 years ago by ATpoint 88k

0

Oh, I usually just run the command without specifying an output file. Only 'fastqc' and its 'input file'.

ADD REPLY • link 3.5 years ago by Ak ▴ 60

0

Please just type file EtenNg5_ACAGTG_L004_R1_001.fastq.gz into your terminal and paste the output here. This command checks whether the file is compressed or not.
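
If file is not installed, roughly the same check can be done by hand, since gzip data starts with the two magic bytes 0x1f 0x8b. A small sketch (the function name is_gzip is hypothetical):

```shell
# is_gzip (hypothetical helper): succeed if the file begins with the
# gzip magic bytes 0x1f 0x8b, the signature that identifies gzip data.
is_gzip() {
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Usage:
#   if is_gzip EtenNg5_ACAGTG_L004_R1_001.fastq.gz; then
#       echo "gzip compressed"
#   else
#       echo "not gzip"
#   fi
```

Note that this looks only at the header: a file can be reported as gzip compressed and still be damaged further in.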

ADD REPLY • link 3.5 years ago by ATpoint 88k

0

Oh sorry I misunderstood you. I got this:

EtenNg5_ACAGTG_L004_R1_001.fastq.gz: gzip compressed data, max speed

ADD REPLY • link 3.5 years ago by Ak ▴ 60