Question

scRNA-seq: Kallisto processing of bioproject data (fastq-dump)

0

5.6 years ago

bsmith030465 ▴ 240

Hi,

I am trying to get started with scRNA-seq analysis. I downloaded a test dataset from an NCBI BioProject. However, each sample comes as three FASTQ files (e.g. SRR123_1.fastq.gz, SRR123_2.fastq.gz, SRR123_3.fastq.gz).

How do I process these in kallisto? Do I need to combine all of these files (and if so, how), or split each file into forward and reverse reads before combining?

Otherwise, what fastq-dump command do I need to download the forward and reverse reads as separate files? My current command is:

fastq-dump -I --split-files SRR123

Thanks!

scRNA-seq RNA-Seq kallisto bioproject fastq-dump • 2.1k views

ADD COMMENT • link 5.6 years ago by bsmith030465 ▴ 240

0

Could you post the output of these commands:

gzip -dc SRR123_1.fastq.gz | head -n 4
gzip -dc SRR123_2.fastq.gz | head -n 4
gzip -dc SRR123_3.fastq.gz | head -n 4

ADD REPLY • link 5.6 years ago by Nicolas Rosewick 11k

0

gzip -dc SRR123_1.fastq.gz | head -n 4
@NB501328:163:HK2GVBGX5:2:11101:14937:1054
ACGAGCCANTGTACCTGTGATGGAAC
+NB501328:163:HK2GVBGX5:2:11101:14937:1054
AAAAAEEE#EEEEEEEEEEEEEEEEE

gzip -dc SRR123_2.fastq.gz | head -n 4
@NB501328:163:HK2GVBGX5:2:11101:14937:1054
TATCTAAAATNAANGTNGTNAAAAGTTATNTNNCTGTGTTNTTACNNTNNTTAANANTGTNNNATTNNNNTCCNNCANTNNTNANNNNTNNNNNNNAT
+NB501328:163:HK2GVBGX5:2:11101:14937:1054
AAAAAAAEEE#6E#EE#EE#EEEEEEEEE#E##EEEEEEE#EEEA##E##EEAE#6#EEE###/EE####EE<##/A#/##/#<####/#######/<

gzip -dc SRR123_3.fastq.gz | head -n 4
@NB501328:163:HK2GVBGX5:2:11101:14937:1054
GAAACCCT
+NB501328:163:HK2GVBGX5:2:11101:14937:1054
AAAAAEEE

ADD REPLY • link 5.6 years ago by bsmith030465 ▴ 240

0

OK, so I guess that SRR123_3.fastq.gz holds the index (barcode) reads used to multiplex several samples. Could you post the link to the BioProject, please?
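
The read lengths in the output posted above (26, 98 and 8 nt) are what point to this: an 8-nt read is far too short to be cDNA. A minimal sketch for checking this on your own files, assuming gzip and awk are available; the helper name first_read_length is made up for illustration:

```shell
# Print the length of the first read in a gzipped FASTQ file.
# Line 2 of a FASTQ record holds the sequence.
first_read_length() {
    gzip -dc "$1" | awk 'NR == 2 { print length($0); exit }'
}

# Report one length per file; files that are not present are skipped.
for f in SRR123_1.fastq.gz SRR123_2.fastq.gz SRR123_3.fastq.gz; do
    if [ -f "$f" ]; then
        printf '%s\t%s\n' "$f" "$(first_read_length "$f")"
    fi
done
```

This only looks at the first record, so it is a quick sanity check rather than a full length distribution.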

ADD REPLY • link 5.6 years ago by Nicolas Rosewick 11k

0

I got the fastq files by executing: fastq-dump --split-files --gzip SRR8611970

ADD REPLY • link 5.6 years ago by bsmith030465 ▴ 240

2

Looking at the SRA run page, https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR8611970, it seems that _1 = R1, _2 = R2, and _3 = the sample index.

See here for more details: https://bioinformatics.stackexchange.com/questions/5178/what-is-the-index-fastq-file-sample-i-fastq-gz-generated-when-demultiplexing
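
With that mapping, the files probably do not need to be combined or split further: kallisto bus takes R1 and R2 as a pair, and the _3 index file is simply left out. A rough sketch of what the call might look like. The 10xv2 technology string is an assumption (a 26-nt R1 would fit a 16-nt cell barcode plus a 10-nt UMI, but nothing in this thread confirms the chemistry), and the index file name, output directory and thread count are placeholders:

```shell
# Build a kallisto bus command line for one run.
# Arguments: run accession, kallisto index, technology string.
# The _3 (sample index) file is deliberately not passed to kallisto.
kallisto_bus_cmd() {
    run=$1
    index=$2   # hypothetical index, built beforehand with `kallisto index`
    tech=$3    # ASSUMPTION: check the study's methods for the real chemistry
    echo "kallisto bus -i $index -o ${run}_bus -x $tech -t 4 ${run}_1.fastq.gz ${run}_2.fastq.gz"
}

cmd=$(kallisto_bus_cmd SRR8611970 transcriptome.idx 10xv2)
echo "$cmd"
# To run it for real (needs kallisto and the index on disk): eval "$cmd"
```

The sketch only prints the command so you can inspect it first. Running `kallisto bus --list` shows the technology strings your kallisto version supports.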

ADD REPLY • link 5.6 years ago by Nicolas Rosewick 11k