Question

Question about the type of SRA data (SRR9713131: single- or paired-end?)

0


4.8 years ago

tujuchuanli ▴ 130

Hi,

I am planning to re-analyze scRNA-seq data from SRA (SRR9713131, https://www.ncbi.nlm.nih.gov/sra/?term=SRR9713131). According to the description, the data are paired-end and the first read contains the UMI sequence. However, the “Reads” tab of the “Run Browser” page shows the reads as single-end. I downloaded the run (as an .sra file), converted it to FASTQ with the SRA Toolkit, and confirmed that it is indeed single-end.

My questions are below:

Is this really single-end rather than paired-end data? Did I make a mistake somewhere?
How and why did this data pass the quality check if it is single-end, given that single-end scRNA-seq data is of no use?

scRNA-seq • 2.7k views

ADD COMMENT • link 4.8 years ago by tujuchuanli ▴ 130

score 1 · Answer 1 · 2020-01-27

1


4.8 years ago

ale_abd ▴ 50

Hello,

SRA stores paired-end data in a single .sra file, so when you convert the .sra file to FASTQ you have to specify that you want the reads split:

fastq-dump --split-files SRR9713131
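To check what the command above actually produced, it can help to list the output files and count the reads in each. The following is only a minimal sketch: the helper names `count_reads` and `summarize_run` are made up for illustration, and it assumes uncompressed or gzipped FASTQ with exactly four lines per record.

```shell
# Count reads in a FASTQ file (plain or gzipped).
# Assumes exactly 4 lines per record; hypothetical helper, not part of SRA Toolkit.
count_reads() {
    fastq="$1"
    case "$fastq" in
        *.gz) gzip -dc "$fastq" ;;
        *)    cat "$fastq" ;;
    esac | awk 'END { printf "%d\n", NR / 4 }'
}

# Print "file<TAB>read count" for every FASTQ file found for a run accession
# in the current directory (RUN_1.fastq, RUN_2.fastq, ... or RUN.fastq).
summarize_run() {
    run="$1"
    for f in "${run}"_*.fastq "${run}".fastq; do
        [ -e "$f" ] || continue   # skip patterns that matched nothing
        printf '%s\t%s\n' "$f" "$(count_reads "$f")"
    done
}
```

Running `summarize_run SRR9713131` in the output directory after the dump should show one line per file. A single line suggests the run was stored with one read per spot; two or more lines with equal counts suggest the mates (and possibly an index read) were split as expected.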

If you look at the "Data access" tab of this link, you can also see that the uploaded raw data is paired-end, not single-end.

Hope this helps!

ADD COMMENT • link 4.8 years ago by ale_abd ▴ 50

0


Hi, ale_abd,

Thank you for your reply.

Actually, I did convert it from SRA format to FASTQ using fastq-dump with --split-files, but the output was still single-end. If you click the “Reads” tab on https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR9713131, you will see the first 10 reads of this run. You can then compare them with another run from the same dataset, such as https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR9713123#, and you will see the difference.

However, it does seem possible to download the raw FASTQ files from the "Data access" tab, as you mentioned, and I am trying that now. I suspect the difference is that this run has three files, whereas the other runs in the same dataset have two.

Do you agree with me after trying them?

Thanks

ADD REPLY • link 4.8 years ago by tujuchuanli ▴ 130

0


Hi,

I agree with you. The sample you mentioned has three files, although other runs from the same project have only two. Sometimes this third file contains unpaired reads, and it can also be the index reads (some old related posts: here and here). If you look at the third file, all the sequences are 8 bases long and they match the index, so you can just discard that file...
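One way to see which of the three files holds the index reads is to tabulate the read lengths in each file. This is a rough sketch only: `read_length_histogram` is a hypothetical helper name, it samples the first 10000 reads by default, and it assumes four-line FASTQ records.

```shell
# Print "read length<TAB>number of reads" for the first N reads of a FASTQ file.
# Hypothetical helper; assumes 4 lines per record, plain or gzipped input.
read_length_histogram() {
    fastq="$1"
    max_reads="${2:-10000}"   # how many reads to sample (default 10000)
    case "$fastq" in
        *.gz) gzip -dc "$fastq" ;;
        *)    cat "$fastq" ;;
    esac |
        head -n "$((max_reads * 4))" |
        awk 'NR % 4 == 2 { n[length($0)]++ }
             END { for (len in n) printf "%d\t%d\n", len, n[len] }' |
        sort -n
}
```

A file whose reads are all 8 bases long would then show up as a single line starting with `8`, which fits the index-read explanation above.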

Cheers.

ADD REPLY • link 4.8 years ago by ale_abd ▴ 50

0


This is 10x data, so you need to download all three FASTQ files from the "Data access" tab (index reads end up in a separate file when the 10x cellranger software does the demultiplexing). Use the original-format data links rather than the SRA ones.
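If the downloaded files do not already follow the bcl2fastq-style names that Cell Ranger looks for (`<sample>_S1_L001_<I1|R1|R2>_001.fastq.gz`), they may need to be renamed or linked first. Below is a small sketch under several assumptions: the helper names are invented, the sample number and lane are fixed to `S1` and `L001`, the input is already gzipped, and deciding which downloaded file is I1, R1 or R2 is left to you.

```shell
# Build a Cell Ranger-style FASTQ file name for a sample and read type.
# Hypothetical helper; sample number S1 and lane L001 are assumed.
cellranger_fastq_name() {
    sample="$1"
    read_type="$2"
    case "$read_type" in
        I1|R1|R2) printf '%s_S1_L001_%s_001.fastq.gz\n' "$sample" "$read_type" ;;
        *) echo "unknown read type: $read_type" >&2; return 1 ;;
    esac
}

# Symlink a gzipped FASTQ into an output directory under the expected name.
link_for_cellranger() {
    src="$1"; sample="$2"; read_type="$3"; outdir="$4"
    name="$(cellranger_fastq_name "$sample" "$read_type")" || return 1
    mkdir -p "$outdir" || return 1
    # Use an absolute path so the link stays valid from any directory.
    src_abs="$(cd "$(dirname "$src")" && pwd)/$(basename "$src")"
    ln -s "$src_abs" "$outdir/$name"
}
```

For example, `link_for_cellranger downloaded_index.fastq.gz SRR9713131 I1 fastqs` (with `downloaded_index.fastq.gz` standing in for whatever the real file is called) would create `fastqs/SRR9713131_S1_L001_I1_001.fastq.gz`.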

ADD REPLY • link 4.8 years ago by GenoMax 147k

0


Do you mean that I need all three files to run "cellranger count"?

Other runs in the same dataset, such as "https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR9713123#", contain only two files, and cellranger count runs fine with just those two.

Why should I use all three files for this run?

The third file contains only the index, which has already been added to the header of each read in the other two files. I think it is useless. Is that right?

ADD REPLY • link 4.8 years ago by tujuchuanli ▴ 130