What do the file names "*._f1.fq.gz" and "*._r2.fa.gz" mean?
4.6 years ago
yueli7 ▴ 250

Hello,

I downloaded the files from: https://bigd.big.ac.cn/gsa/

The file names end with "*._f1.fq.gz" and "*._r2.fa.gz".

  1. Is it single-end or paired-end sequencing?

  2. If it is paired-end sequencing, shouldn't the files be named "*._r1.fq.gz" and "*._r2.fa.gz" rather than "*._f1.fq.gz"?

Thanks in advance for any help!

Best,

Yue

RNA-Seq • 2.8k views

How about this command, which is much shorter?

zcat filename | head

Doesn't work on all operating systems though.


You are right. zcat is just a shell script; what it actually depends on is gzip.


Hello, I came across the same problem. I downloaded a single-cell RNA-seq dataset from https://bigd.big.ac.cn/gsa/ whose files end with "_f1.fq.gz" and "r2.fa.gz". The data came from the 10x Genomics platform, but Cell Ranger can't identify the "fq.gz" files. Maybe it can only identify "fastq.gz" files. So I'd like to ask: how should I process the fq.gz files for the next analysis? I would appreciate it if you could help me.
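One possible way around the naming problem is to leave the downloads untouched and expose them through symbolic links that follow the pattern Cell Ranger looks for, [Sample Name]_S1_L00[Lane Number]_[Read Type]_001.fastq.gz. The sketch below is only an illustration: the function name link_for_cellranger and the sample name are hypothetical, both inputs are assumed to be gzipped FASTQ, and it assumes (without having checked this dataset) that the "f1" file holds read 1 and the "r2" file holds read 2.

```shell
# Hypothetical helper: expose a pair of downloads under Cell Ranger style names.
# Arguments: sample name, read 1 file, read 2 file, output directory.
# The body runs in a subshell so its variables do not leak into the caller.
link_for_cellranger() (
    sample=$1; r1=$2; r2=$3; outdir=$4
    mkdir -p "$outdir" || exit 1
    # Resolve absolute paths so the links stay valid from any directory.
    r1_abs="$(cd "$(dirname "$r1")" && pwd)/$(basename "$r1")"
    r2_abs="$(cd "$(dirname "$r2")" && pwd)/$(basename "$r2")"
    # Assumption: the f1 file is read 1 and the r2 file is read 2, on one lane.
    ln -sf "$r1_abs" "$outdir/${sample}_S1_L001_R1_001.fastq.gz" || exit 1
    ln -sf "$r2_abs" "$outdir/${sample}_S1_L001_R2_001.fastq.gz" || exit 1
)
```

After something like link_for_cellranger mysample run_f1.fq.gz run_r2.fq.gz fastqs, the fastqs directory could be passed to cellranger count with --fastqs and the sample name with --sample. It is worth checking the read lengths first to confirm which file really carries the barcode read.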

4.6 years ago
Mensur Dlakic ★ 28k

If it is paired-end sequencing, the file should be: "*._r1.fq.gz" and "*._r2.fa.gz", not "*._f1.fq.gz".

There are no hard rules regarding the labeling of paired files, which is most likely what yours are. And if there are rules, they are not followed by everyone. In your case, these files are likely forward (f1) and reverse (r2). It is not difficult to verify this after you unpack the files and type:

head *_??.fq

If both files have similar headers except where one of them has 1 the other has 2, they are paired-end files.
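The header comparison described above can also be automated. The following is a minimal sketch, not a definitive check: the function names read_ids and looks_paired are made up for illustration, and it assumes standard 4-line FASTQ records in which mates share a read ID apart from an optional trailing /1 or /2 or a comment after the first space.

```shell
# Hypothetical helper: print the first N read IDs of a gzipped FASTQ file,
# with the leading @, any comment, and any /1 or /2 mate suffix removed.
read_ids() {
    gzip -dc "$1" | awk -v n="${2:-1000}" '
        NR % 4 == 1 {                 # header line of each 4-line record
            id = $1                   # drop anything after the first space
            sub(/^@/, "", id)
            sub(/\/[12]$/, "", id)    # drop old-style /1 or /2 suffix
            print id
            if (++count >= n) exit
        }'
}

# Report whether two files look like mates of the same paired-end run.
# Arguments: first file, second file, optional number of reads to compare.
looks_paired() (
    a=$(read_ids "$1" "${3:-1000}")
    b=$(read_ids "$2" "${3:-1000}")
    if [ -n "$a" ] && [ "$a" = "$b" ]; then
        echo "paired"
    else
        echo "not paired"
    fi
)
```

For example, looks_paired sample_f1.fq.gz sample_r2.fq.gz compares the first 1000 read IDs of the two files. Matching IDs in the same order are strong evidence of paired-end data, though only a sample of the reads is inspected.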

gzip -dc filename | head

No need to unpack the whole file.
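To look at several compressed files in one go, the same idea can be wrapped in a small loop. This is just a sketch; peek_fastq is a hypothetical name, and it assumes 4-line FASTQ records so that four lines correspond to one read.

```shell
# Hypothetical helper: show the first record of each gzipped FASTQ file given.
peek_fastq() {
    for f in "$@"; do
        echo "== $f"
        # gzip -dc streams the decompressed data; head stops after one record.
        gzip -dc "$f" | head -n 4
    done
}
```

Called as peek_fastq *_??.fq.gz, it prints the first read of every matching file, which is usually enough to compare the headers by eye.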


Hello, Mensur Dlakic,

Thank you so much for your response!

You are correct! These two files are paired-end files. They have the same headers.

Thank you again!

Best,

Yue

14 months ago
yhdist ▴ 70

Those usually stand for forward and reverse reads, respectively, in paired-end sequencing. However, I do recall a few cases from SRA where I stumbled upon single-end sequencing files that used this convention to point to different replicates.

As suggested previously, unless you're doing this in the context of an automated pipeline, you are better off checking the files afterwards. Usually you can tell just by the headers alone.
