Question

One SRR (one fastq) file for paired-end layout

3 months ago

Vojtěch ▴ 10

Hello,

GSE147528 has 20 samples, which are listed as having a paired-end layout. In my experience (I'm a newbie in RNA-seq analysis), paired-end runs come as two files per sample (*_R1.fastq and *_R2.fastq, or *_1.fastq and *_2.fastq), and all of my pipelines expect these names. However, the experiment mentioned above has only one SRR/fastq file per sample, despite being paired-end.

Is there a way to convert the single fastq file into the usual two separate files, so I don't have to edit my pipelines?
Is this normal for snRNA-seq, or is it specific to this experiment?

Thanks in advance.

SRR paired-end SRA • 384 views


score 1 · Answer 1 · 2024-08-05

3 months ago

GenoMax 147k

This is a 10x dataset, and it should be possible to get three files per run: R1 (file _1) is just the Illumina index read, so it is not required; R2 (file _2) is the cell barcode + UMI; and R3 (file _3) is the cDNA read.

fastq-dump --split-files SRR11422700

results in

::::::::::::::
SRR11422700_1.fastq
::::::::::::::
@SRR11422700.1
CTCGTCAC
+SRR11422700.1
FFFFFF,,

::::::::::::::
SRR11422700_2.fastq
::::::::::::::
@SRR11422700.1
CNAATCCGTCGCGTGTACGATGATGC
+SRR11422700.1
F#,FFFF:FFFFF,FFFF,FFFFFFF

::::::::::::::
SRR11422700_3.fastq
::::::::::::::
@SRR11422700.1
NAGCAGTGGTATCAACGCAGAGTACATGGGTGTTGGGAAGGGAAGATATTCTATAATTTTATTATTAAATCTCATTATTTTAACAGGGAG
+SRR11422700.1
#FF:FFFFF,FFFFFFFFFFFFFFFF,F:FFFF:FF:F:FFFFFFFFF:FFF::FFFFFFFFFFFFFF:FFFFFFFFFF:FFFFFFF:,F
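
To make the layout of the second file concrete, here is a minimal Python sketch that parses the record shown above and cuts the 26-base read into a cell barcode and a UMI. The 16 + 10 split is an assumption (it matches the 10x Chromium 3' v2 layout and the 26-base read length seen here, but the chemistry is not stated in this thread), and the names `parse_fastq`, `split_barcode_umi`, `CB_LEN` and `UMI_LEN` are made up for illustration.

```python
from typing import Iterator, NamedTuple, Tuple

# ASSUMPTION: 16-base cell barcode followed by a 10-base UMI (10x 3' v2 layout).
# Check the chemistry of the actual dataset before relying on these lengths.
CB_LEN = 16
UMI_LEN = 10


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def parse_fastq(text: str) -> Iterator[FastqRecord]:
    """Yield records from FASTQ text in the plain 4-lines-per-record layout."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("expected a multiple of 4 non-empty lines")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record at non-empty line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        yield FastqRecord(header[1:], seq, qual)


def split_barcode_umi(seq: str, cb_len: int = CB_LEN,
                      umi_len: int = UMI_LEN) -> Tuple[str, str]:
    """Return (cell_barcode, umi) taken from the start of a barcode read."""
    if len(seq) < cb_len + umi_len:
        raise ValueError("read is too short to hold barcode + UMI")
    return seq[:cb_len], seq[cb_len:cb_len + umi_len]


# First record of SRR11422700_2.fastq, copied from the output above.
EXAMPLE = """@SRR11422700.1
CNAATCCGTCGCGTGTACGATGATGC
+SRR11422700.1
F#,FFFF:FFFFF,FFFF,FFFFFFF
"""

if __name__ == "__main__":
    for record in parse_fastq(EXAMPLE):
        barcode, umi = split_barcode_umi(record.sequence)
        print(record.name, barcode, umi)
```

The 8-base read in the first file is too short for this split, which is one quick way to tell the index file apart from the barcode file.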


Thank you for your answer. If I may, I have two follow-up questions:

1) I run most of my analysis in R, in RStudio. I have a script that downloads my files from the ENA browser. Is there some other, non-Linux way to get these files, or is fastq-dump the only go-to way?

2) In my normal situation, where there are 2 fastq files, they both contain cDNA sequences in FASTQ format: one forward, one reverse. Right? So if you are saying that, out of these 3 fastq files, only the 3rd contains cDNA, how does this fit into a pipeline that works with the classical _1.fastq / _2.fastq format?

I am sorry if these questions seem too trivial, but I would like to understand it. I appreciate your time, thank you.

ADD REPLY • link 3 months ago by Vojtěch ▴ 10

"fastq-dump the only go-to way"

That is one way to get the data. If you don't intend to re-analyze the data, you may be able to get count files from that GEO record, but that would not fit your standard script either.

"In my normal situation, where there are 2 fastq files, they both contain cDNA sequences in FASTQ format."

Correct, for bulk RNA-seq that would be the case. These data, on the other hand, are single-nucleus RNA-seq, which is a different technology, so you will not be able to use your standard analysis methods with them. You will need single-cell-specific tools such as Seurat, STARsolo, etc.
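
As a rough illustration of how the two useful files would enter a single-cell pipeline, here is a small Python sketch that assembles a STARsolo command line without running it. STARsolo takes the cDNA read first and the barcode read second in `--readFilesIn`. Everything else is an assumption for illustration: the helper name `build_starsolo_command`, the genome index and whitelist paths, the 16 + 10 barcode/UMI lengths (10x 3' v2 layout, not confirmed in this thread), and the choice of `--soloFeatures GeneFull`, which is often used for nuclei because it also counts intronic reads.

```python
import shlex
from typing import List


def build_starsolo_command(genome_dir: str, whitelist: str,
                           cdna_fastq: str, barcode_fastq: str,
                           cb_len: int = 16, umi_len: int = 10,
                           threads: int = 4) -> List[str]:
    """Assemble (but do not run) a STARsolo command for droplet-style data.

    ASSUMPTION: the barcode read starts with a cb_len-base cell barcode that
    is immediately followed by a umi_len-base UMI.
    """
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--soloType", "CB_UMI_Simple",
        "--soloCBwhitelist", whitelist,
        "--soloCBstart", "1",
        "--soloCBlen", str(cb_len),
        "--soloUMIstart", str(cb_len + 1),
        "--soloUMIlen", str(umi_len),
        # Counts reads over the full gene body, introns included.
        "--soloFeatures", "GeneFull",
        # Order matters: cDNA read first, barcode + UMI read second.
        "--readFilesIn", cdna_fastq, barcode_fastq,
    ]


if __name__ == "__main__":
    # Hypothetical paths; replace with a real STAR index and barcode whitelist.
    cmd = build_starsolo_command(
        genome_dir="star_index",
        whitelist="barcode_whitelist.txt",
        cdna_fastq="SRR11422700_3.fastq",
        barcode_fastq="SRR11422700_2.fastq",
    )
    print(shlex.join(cmd))
```

The index file (SRR11422700_1.fastq) does not appear anywhere in the command, which matches the point above that it is not required.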