Hi,
I am planning to re-analyze the scRNA-seq data on SRA (SRR9713131, https://www.ncbi.nlm.nih.gov/sra/?term=SRR9713131). According to the description, this is paired-end data and the first read contains the UMI sequence. However, the “Reads” tab of the “Run Browser” page shows it as single-end. I downloaded the SRA-format file, converted it to FASTQ with SRA Toolkit, and confirmed that it is indeed single-end.
My questions are below:
Is it really single-end data rather than paired-end data? Did I make a mistake somewhere?
How and why did this data pass the quality check if it is single-end, since single-end scRNA-seq data is of no use?
Hi, ale_abd,
Thank you for your reply.
Actually, I tried to convert it from SRA format to FASTQ format using fastq-dump with --split-files. However, it came out as single-end. Next, if you click the “Reads” tab on the page https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR9713131, you will see the first 10 reads of this run. You can then compare them with another run in the same dataset, such as https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR9713123#, and you will see the difference.
However, I found that it seems to be permitted to download the raw FASTQ files from the "Data access" tab, as you mentioned. I am trying to download them. I suspect the difference is that this run has three files, not two like the other runs in the same dataset.
Do you agree with me after trying them?
Thanks
Hi,
I agree with you. In the sample that you mentioned, there are three files. Although you can find other runs from the same project with only two files, sometimes this third file contains either unpaired data or the index reads (some old related posts: here and here). If you look at the third file, all the sequences are 8 bases long and they match the index, so you can just discard that file...
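To check for yourself that every sequence in the third file is 8 bases long, you could tabulate the read lengths in each FASTQ file. Below is a minimal sketch, assuming standard 4-line FASTQ records; the function names `length_counts` and `fastq_length_counts` are made up for this illustration and are not part of any toolkit.

```python
import gzip
from collections import Counter
from itertools import islice


def length_counts(lines):
    """Count sequence lengths, assuming 4-line FASTQ records."""
    counts = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the second line of each record is the sequence
            counts[len(line.strip())] += 1
    return counts


def fastq_length_counts(path, max_reads=10000):
    """Length counts for the first max_reads records of a plain or gzipped FASTQ."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        # Only the first max_reads records are read, which keeps the check fast.
        return length_counts(islice(handle, 4 * max_reads))
```

Calling `fastq_length_counts` on each downloaded file (whatever the files are actually named on your disk) should show a single length of 8 for the index file, if it really is the index.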
Cheers.
This is 10x data, so you would need to download all three FASTQ files from the "Data access" tab (index reads are written to a separate file when the 10x cellranger software does the demultiplexing). Use the original-format data links rather than the SRA ones.
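If the downloaded files do not already carry 10x-style names, one way to sort them out is by typical read length. Here is a rough sketch under stated assumptions: an 8-base file is treated as the sample index (I1), a 26- or 28-base file as the barcode + UMI read (R1, as in 10x v2 and v3 chemistry), and anything longer as the cDNA read (R2). The helper names are hypothetical, the `_S1` and lane values are placeholders, and the length rules are only a heuristic that may not hold for other chemistries.

```python
def guess_10x_role(read_length):
    """Heuristic guess of the 10x read type from a typical read length.

    Assumption: 8 bases = sample index, 26 or 28 bases = barcode + UMI,
    longer reads = cDNA. Returns None when the length fits none of these.
    """
    if read_length == 8:
        return "I1"
    if read_length in (26, 28):
        return "R1"
    if read_length > 28:
        return "R2"
    return None


def cellranger_style_name(sample, role, lane=1):
    """Build a bcl2fastq-style file name: Sample_S1_L00X_Role_001.fastq.gz."""
    return f"{sample}_S1_L{lane:03d}_{role}_001.fastq.gz"
```

For example, `cellranger_style_name("SRR9713131", guess_10x_role(8))` gives the name you would use for the index file of this run.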
Do you mean that I need all three files to run "cellranger count"?
The other runs in the same dataset, such as "https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR9713123#", contain only two files, and cellranger count runs just fine with those two files.
Why should I use all three files for this run?
The third file contains only the index, which is already included in the header of each read in the other two files. I think it is useless. Is that right?