Question

What is the meaning of the file names "*._f1.fq.gz" and "*._r2.fa.gz"?

2

4.6 years ago

yueli7 ▴ 250

Hello,

I downloaded the files from: https://bigd.big.ac.cn/gsa/

The file names end with "*._f1.fq.gz" and "*._r2.fa.gz".

Is it single-end or paired-end sequencing?
If it is paired-end sequencing, I would expect the files to be named "*._r1.fq.gz" and "*._r2.fa.gz", not "*._f1.fq.gz".

Thanks in advance for any help!

Best,

Yue

RNA-Seq • 2.8k views

ADD COMMENT • link updated 14 months ago by yhdist ▴ 70 • written 4.6 years ago by yueli7 ▴ 250

0

How about this command, which is much shorter:

zcat filename | head

ADD REPLY • link 4.3 years ago by wulj2 ▴ 50

0

Doesn't work on all operating systems though.

ADD REPLY • link 4.3 years ago by WouterDeCoster 47k

0

You are right. zcat is just a shell script; what it actually depends on is gzip.

ADD REPLY • link 4.3 years ago by wulj2 ▴ 50

0

Hello, I came across the same problem. I downloaded a single-cell RNA-seq dataset from https://bigd.big.ac.cn/gsa/, and its file names end with "_f1.fq.gz" and "r2.fa.gz". The data came from the 10x Genomics platform; however, Cell Ranger can't recognize the "fq.gz" files. Maybe it only recognizes "fastq.gz" files. So I'd like to ask: how should I process the fq.gz files for downstream analysis? I would appreciate it if you could help me.

ADD REPLY • link 14 months ago by linzhujay • 0

0

14 months ago

yhdist ▴ 70

Those usually stand for forward and reverse reads, respectively, in paired-end sequencing. However, I do recall a few cases from SRA where I stumbled upon single-end sequencing files that used this convention to distinguish replicates.

As suggested previously, unless you're doing this in the context of an automated pipeline, you are better off checking the files afterwards. Usually you can tell just by the headers alone.

ADD COMMENT • link 14 months ago by yhdist ▴ 70

score 4 · Accepted Answer · 2020-05-10

4

4.6 years ago

Mensur Dlakic ★ 28k

> If it is paired-end sequencing, the file should be: "*._r1.fq.gz" and "*._r2.fa.gz", not "*._f1.fq.gz".

There are no hard rules regarding the labeling of paired files, which is most likely what yours are. And if there are rules, they are not followed by everyone. In your case, these files are likely forward (r1) and reverse (r2). It is not difficult to verify this after you unpack the files and type:

head *_??.fq

If both files have similar headers except where one of them has 1 the other has 2, they are paired-end files.
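The header comparison described above can also be scripted. The following is only a minimal sketch, not part of the original answer: the function name `first_read_id` and the file names are hypothetical. It assumes gzip-compressed FASTQ files whose first line is a header, with the mate number given either as a trailing `/1` or `/2` or in a comment after the first space.

```shell
#!/bin/sh
# Print the read ID from the first header of a gzipped FASTQ file.
# The leading "@", anything after the first whitespace, and a trailing
# /1 or /2 mate suffix are removed so that mates compare as equal.
first_read_id() {
    gzip -dc "$1" 2>/dev/null | head -n 1 |
        sed -e 's/^@//' -e 's/[[:space:]].*$//' -e 's|/[12]$||'
}

# Hypothetical file names; substitute your own.
fwd="sample_f1.fq.gz"
rev="sample_r2.fq.gz"

if [ -f "$fwd" ] && [ -f "$rev" ]; then
    if [ "$(first_read_id "$fwd")" = "$(first_read_id "$rev")" ]; then
        echo "first read IDs match: probably paired-end"
    else
        echo "first read IDs differ: probably not a pair"
    fi
fi
```

This checks only the first record of each file, so it is a quick sanity check and not proof that every read has a mate.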

ADD COMMENT • link 4.3 years ago by Mensur Dlakic ★ 28k

1

gzip -dc filename | head

No need to unpack the whole file.
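To look at both files in one go, the same idea can be wrapped in a small loop. This is a sketch under assumptions: `peek_fastq` is a hypothetical helper name, the file names in the example call are placeholders, and the files are assumed to be FASTQ, where one record is four lines.

```shell
#!/bin/sh
# Print the first FASTQ record (4 lines) of each gzipped file given as
# an argument, streaming from gzip so nothing is unpacked to disk.
peek_fastq() {
    for f in "$@"; do
        echo "== $f"
        # head exits after 4 lines; gzip's broken-pipe message is silenced.
        gzip -dc "$f" 2>/dev/null | head -n 4
    done
}

# Example call with hypothetical file names:
# peek_fastq sample_f1.fq.gz sample_r2.fq.gz
```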

ADD REPLY • link 4.3 years ago by cschu181 ★ 2.8k

0

Hello, Mensur Dlakic,

Thank you so much for your response!

You are correct! These two files are paired-end files. They have the same headers.

Thank you again!

Best,

Yue

ADD REPLY • link 4.6 years ago by yueli7 ▴ 250