Question

How to analyze scRNA-seq FASTQ files from NCBI

2.9 years ago

aimanbarki ▴ 20

Hello Everyone,

Is there a tutorial for the following:

How to download a FASTQ file from NCBI?
How to check the file quality (what should it look like)?
How to use cellranger count on a FASTQ file?
How to understand the output of the count?

I want to work with the healthy data set from the following website:

BioProject_NCBI

I downloaded the FASTQ files using the following command:

fastq-dump --split-files --gzip SRR10134390

I downloaded the reference from GENCODE and built the reference for cellranger count using the following command:

cellranger mkref --genome=GRCh38.p13 --fasta=GRCh38.primary_assembly.genome.fa --genes=gencode.v39.primary_assembly.annotation.gtf

I ran the cellranger count using the following command:

cellranger count --id=Healthy_aortic_valve2 --fastqs=/healthy1 --transcriptome=GRCh38.p13 --chemistry SC3Pv2

The command ran and created several folders, but the output does not seem right because I cannot find the matrix files or the BAM files.

Can someone tell me how I can find out the problem?

Thanks

SRAtool Cellranger NCBI • 3.3k views

ADD COMMENT • link updated 18 months ago by Ram 44k • written 2.9 years ago by aimanbarki ▴ 20

Posting an error message or such is probably a good start. Beyond that, you'd probably get a lot out of the OSCA book in terms of understanding and performing scRNA-seq analysis.

ADD REPLY • link 2.9 years ago by jared.andrews07 ★ 18k

Are you looking in the correct path for the output files? If cellranger count ran successfully, it should write the output to Healthy_aortic_valve2/outs according to the documentation.
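A minimal sketch of that check, assuming the run directory sits in the current working directory. The function name check_run is made up for illustration, and the file names _errors and _log are assumptions about what Cell Ranger leaves in a run directory, so verify them against your version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a cellranger run directory has outs/.
check_run() {
    local run_dir="$1"
    if [ ! -d "$run_dir" ]; then
        echo "no such directory: $run_dir"
        return 1
    fi
    if [ -d "$run_dir/outs" ]; then
        echo "outs found"
        ls "$run_dir/outs"
    else
        echo "outs missing"
        # Assumption: failed stages leave files named _errors under the run directory.
        find "$run_dir" -type f -name "_errors"
        # Assumption: the top-level pipeline log is a file named _log.
        if [ -f "$run_dir/_log" ]; then
            tail -n 20 "$run_dir/_log"
        fi
    fi
}
```

Calling check_run Healthy_aortic_valve2 from the directory where cellranger count was launched would either list the contents of outs/ or print the paths of any error files plus the tail of the log, which is usually the quickest way to get an error message worth posting.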

ADD REPLY • link 2.9 years ago by jv ★ 1.8k

@jv The run created Healthy_aortic_valve2/, but it does not include the outs/ directory. Where should I start to find the issue?

Thanks in advance

ADD REPLY • link 2.9 years ago by aimanbarki ▴ 20

Input files for cellranger need to be in a specific format with the index sequences in separate files. You can find more information about the types and names of files here: https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/fastq-input

Simply splitting the SRA data may not give you the correct input files. Unfortunately, these submitters appear not to have submitted the original cellranger BAM file, which would have allowed you to recreate the FASTQ files easily.

ADD REPLY • link 2.9 years ago by GenoMax 148k

The index sequence doesn't have to be present anymore. It's just a legacy thing that cellranger's mkfastq still produces it. (What does matter is that the FASTQ files are named exactly according to the Illumina standard.)
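As a rough sketch of what that renaming could look like: the pattern below is the usual bcl2fastq-style name, <sample>_S1_L001_<read>_001.fastq.gz. The helper names illumina_name and rename_read are hypothetical, and which fastq-dump output file (_1, _2, ...) corresponds to R1 or R2 is an assumption that has to be checked per run.

```shell
#!/usr/bin/env bash
# Build a file name in the pattern <sample>_S1_L001_<read>_001.fastq.gz.
illumina_name() {
    local sample="$1" read_type="$2"
    printf '%s_S1_L001_%s_001.fastq.gz\n' "$sample" "$read_type"
}

# Hypothetical helper: rename one fastq-dump output file in place.
rename_read() {
    local src="$1" sample="$2" read_type="$3"
    mv -- "$src" "$(dirname "$src")/$(illumina_name "$sample" "$read_type")"
}

# Example (assumption: _1 holds the barcode/UMI read and _2 the cDNA read;
# confirm this from the read lengths before renaming):
# rename_read /healthy1/SRR10134390_1.fastq.gz SRR10134390 R1
# rename_read /healthy1/SRR10134390_2.fastq.gz SRR10134390 R2
```

With names like these, the sample name is everything before _S1, so there should be no stray underscore between the sample name and S1.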

ADD REPLY • link 2.9 years ago by swbarnes2 14k

Good to know. We generally demux using cellranger, so we have the files.

ADD REPLY • link 2.9 years ago by GenoMax 148k

@GenoMax and @swbarnes2 I changed the names of the files, and I am attaching a picture of what the FASTQ file looks like. I don't think the "+" line is supposed to look like that, or is it fine?

ADD REPLY • link 2.9 years ago by aimanbarki ▴ 20

That should be fine. If you had used -F (the original-format option) when dumping the reads, the headers might look like normal Illumina FASTQ headers (depending on how the submitters sent the data in). cellranger is supposed to use only 26 or 28 bp of read 1, depending on the chemistry.

Do you have an extra _ in the file names before S1? You should remove that.
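One way to see which dumped file is read 1 is to tally read lengths in each file and look for the one at 26 or 28 bp. This is only a sketch: the function name read_lengths is made up, and it samples the first 1000 reads by default rather than scanning the whole file.

```shell
#!/usr/bin/env bash
# Tally sequence lengths for the first N reads (default 1000) of a gzipped FASTQ.
# Output is one "<length> <count>" line per distinct length.
read_lengths() {
    local fq="$1" n="${2:-1000}"
    # Line 2 of every 4-line FASTQ record is the sequence.
    gzip -dc "$fq" | head -n "$((n * 4))" \
        | awk 'NR % 4 == 2 { print length($0) }' \
        | sort -n | uniq -c | awk '{ print $2, $1 }'
}

# Example (hypothetical path):
# read_lengths /healthy1/SRR10134390_1.fastq.gz
```

A file whose reads are all 26 bp would be consistent with the barcode plus UMI read of the SC3Pv2 chemistry, while the cDNA read is normally much longer.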

ADD REPLY • link 2.9 years ago by GenoMax 148k

That's how the folders are named when cellranger has yet to finish running properly.

ADD REPLY • link 2.9 years ago by swbarnes2 14k