Mapping scRNA-seq FASTQ files to a reference genome

Asked 17 months ago by Researcher

I am trying to analyze publicly available data from SRA, for which I have to map the FASTQ files to the reference genome. When I did this before, I downloaded the FASTQ files from Google Cloud and could map them to the reference genome using CellRanger. However, here my only option is to download them from EBI or with the SRA Toolkit, but then if I try CellRanger I will get a naming convention error. What other tools besides CellRanger can map single-cell RNA-seq reads to the reference genome using FASTQ files downloaded from EBI?

Tags: star, EBI, cellranger, scRNA-seq, SRA

Edited 17 months ago by Ram

alevin-fry and kallisto bustools are popular alternatives.

Reply by rpolicastro, 17 months ago

I second salmon and kallisto. Both work directly with FASTQ files.

Reply by Ming Tommy Tang, 17 months ago

"but then if I try CellRanger I will get a naming convention error"

Rename the files. Or better, use softlinks. What's the problem with that?
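A minimal sketch of the softlink approach in Python, assuming the runs were dumped with the SRA Toolkit into files named `<run>_1.fastq.gz`, `<run>_2.fastq.gz` and `<run>_3.fastq.gz`. Cell Ranger looks for names of the form `<sample>_S1_L001_R1_001.fastq.gz`. The `_1`/`_2`/`_3` to I1/R1/R2 mapping below is an assumption for typical 10x dumps and should be checked against each file's read length; `link_for_cellranger` and the sample name are illustrative, not part of any tool.

```python
from pathlib import Path

# ASSUMPTION: which SRA output file holds which 10x read type.
# Verify with the read lengths of each file before relying on it.
SRA_TO_10X = {"_1": "I1", "_2": "R1", "_3": "R2"}


def link_for_cellranger(fastq_dir, run_id, sample, out_dir, lane=1):
    """Create Cell Ranger-style symlinks for one SRA run (hypothetical helper).

    Produces names like <sample>_S1_L001_R1_001.fastq.gz pointing at the
    original SRA-named files, so nothing is copied or renamed in place.
    """
    fastq_dir = Path(fastq_dir)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for suffix, read_type in SRA_TO_10X.items():
        src = fastq_dir / f"{run_id}{suffix}.fastq.gz"
        if not src.exists():
            continue  # e.g. no index-read file was dumped for this run
        dst = out_dir / f"{sample}_S1_L{lane:03d}_{read_type}_001.fastq.gz"
        if dst.is_symlink() or dst.exists():
            dst.unlink()  # replace a stale link from an earlier attempt
        # Absolute target so the link resolves from any working directory.
        dst.symlink_to(src.resolve())
        links.append(dst)
    return links
```

For example, `link_for_cellranger("fastq", "SRR12654354", "SRR12654354", "cr_fastq")` would create the links, and `cellranger count` could then be pointed at them with `--fastqs=cr_fastq --sample=SRR12654354`. Renaming only addresses the naming convention; it does not change the read content of the files.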

Reply by Ram, 17 months ago

I tried that before and it did not work. I still got a header mismatch error.

Reply by Researcher, 17 months ago

Can you show me an example entry where you ran into this error?

Reply by Ram, 17 months ago

Yes, here is the error:

Log message:

FASTQ header mismatch detected at line 4 of input files "fastq/sample-Barcode/sample-Barcode_S4_L001_R1_001.fastq.gz" and "fastq/sample-Barcode/sample-Barcode_S4_L001_R2_001.fastq.gz": file: "fastq/sample-Barcode/sample-Barcode_S4_L001_R1_001.fastq.gz", line: 4

Reply by Researcher, 17 months ago (edited by Ram)

Please share the GEO/EBI ID of these FASTQ files.

Reply by Ram, 17 months ago

I am analyzing the scRNA-seq data from PRJNA657088; the SRA accessions are SRR12654354, SRR12654355, SRR12654356, SRR12654367, SRR12654378 and SRR12654379. I tried to get them from AWS, which requires creating a bucket and granting it permissions. I did that, but the "create a data delivery order" step on the NCBI website fails with an error saying the bucket permissions were not granted.
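Since the SRA Toolkit route is available, here is a hedged sketch of dumping these runs from Python. `fasterq-dump --split-files` writes each read of a spot to its own numbered file, and `--include-technical` keeps reads that SRA flags as technical, which for 10x submissions can include the barcode/UMI read. This assumes the SRA Toolkit is installed and on `PATH`; the helper names, output directory and thread count are illustrative.

```python
import shutil
import subprocess

# Accessions listed in the post above.
RUNS = ["SRR12654354", "SRR12654355", "SRR12654356",
        "SRR12654367", "SRR12654378", "SRR12654379"]


def fasterq_dump_cmd(run_id, outdir="fastq", threads=4):
    """Build a fasterq-dump command line for one run (hypothetical helper).

    --split-files writes <run>_1.fastq, <run>_2.fastq, ...; without
    --include-technical, reads SRA marks as technical are skipped.
    """
    return ["fasterq-dump", run_id,
            "--split-files", "--include-technical",
            "--outdir", outdir, "--threads", str(threads)]


def dump_all(runs=RUNS, outdir="fastq"):
    """Run fasterq-dump for every accession, stopping on the first failure."""
    if shutil.which("fasterq-dump") is None:
        raise RuntimeError("fasterq-dump (SRA Toolkit) is not on PATH")
    for run in runs:
        subprocess.run(fasterq_dump_cmd(run, outdir), check=True)
```

fasterq-dump writes uncompressed `.fastq` files, so they would need compressing (for example with `gzip`) before being linked under `.fastq.gz` names.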

Reply by Researcher, 17 months ago

After dumping a couple of reads from one of these runs, I don't see any header mismatches.

$ head -4 SRR12654354*
==> SRR12654354_1.fastq <==
@K00162:326:HWJNNBBXX:8:1101:1103:1191
NTTACATG
+K00162:326:HWJNNBBXX:8:1101:1103:1191
#AAFFJJJ

==> SRR12654354_2.fastq <==
@K00162:326:HWJNNBBXX:8:1101:1103:1191
NATGAAAAGAGTTGGCGGTTGCACTT
+K00162:326:HWJNNBBXX:8:1101:1103:1191
#AAFFJJJJJJJJJJJJJJJJJJJJJ

==> SRR12654354_3.fastq <==
@K00162:326:HWJNNBBXX:8:1101:1103:1191
NGTGGGGAGCAGAGAATTCTCTTATCCAACATCAACATCTTGGTCAGATTTGAACTCATCAATCTCTTGCACTCAAAGCTTGTTAAGATAGTTAAGCG
+K00162:326:HWJNNBBXX:8:1101:1103:1191
#<<<F<7FFJFJAJJJFJ7F-AAJF7-AJJJFAJJ7JA-7AJJ77F-J7-A<FJF-7-7FJFFJJJJF-FFFJJJJFFJ7FA-<AFFJJJF<JJJJJF
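In the dump above, the three files hold reads of 8, 26 and 98 nt. Assuming this is 10x Chromium 3' v2 data, that pattern would match the sample index read (8 nt), the cell barcode plus UMI read (16 + 10 = 26 nt) and the cDNA read, which matters when deciding which file Cell Ranger should see as R1 and which as R2. A small sketch for checking the first-read length of each file; the function names are illustrative.

```python
import gzip
from pathlib import Path


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file as text."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "r")


def first_read_lengths(paths):
    """Return {file name: sequence length of the first record}."""
    lengths = {}
    for p in paths:
        with open_fastq(p) as fh:
            fh.readline()              # header line (@...)
            seq = fh.readline().strip()  # sequence line
        lengths[Path(p).name] = len(seq)
    return lengths
```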

Reply by GenoMax, 17 months ago

Please also share the exact CellRanger command you're using.

Reply by Ram, 17 months ago

"I still got a header mismatch error."

This and the naming convention error are two separate problems.

If you are getting a header mismatch, the reads in your R1/R2 files are likely out of sync.
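One way to confirm this is to compare read names record by record. A minimal sketch, assuming 4-line FASTQ records (plain or gzipped) and that mates share the same name up to the first whitespace, optionally ending in `/1` or `/2`; `first_mismatch` is a hypothetical helper, not part of Cell Ranger.

```python
import gzip
from itertools import zip_longest
from pathlib import Path


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file as text."""
    path = Path(path)
    return gzip.open(path, "rt") if path.suffix == ".gz" else open(path)


def read_ids(path):
    """Yield each record's read name (header up to the first whitespace,
    with a trailing /1 or /2 removed)."""
    with open_fastq(path) as fh:
        for i, line in enumerate(fh):
            if i % 4 != 0:
                continue  # skip sequence, '+' and quality lines
            if not line.startswith("@"):
                raise ValueError(f"{path}: line {i + 1} is not a FASTQ header")
            rid = (line[1:].split() or [""])[0]
            if rid.endswith(("/1", "/2")):
                rid = rid[:-2]
            yield rid


def first_mismatch(r1, r2):
    """Return None if both files list the same read names in the same order,
    else (record_number, name_in_r1, name_in_r2); None marks a file that ran out."""
    pairs = zip_longest(read_ids(r1), read_ids(r2))
    for n, (a, b) in enumerate(pairs, start=1):
        if a != b:
            return n, a, b
    return None
```

Calling it on the R1 and R2 files named in the error message would return `None` if they are in sync, or the first record number where the names diverge.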

Reply by GenoMax, 17 months ago

I am getting this error message:

Log message: FASTQ header mismatch detected at line 4 of input files "fastq/sample-Barcode/sample-Barcode_S4_L001_R1_001.fastq.gz" and "fastq/sample-Barcode/sample-Barcode_S4_L001_R2_001.fastq.gz": file: "fastq/sample-Barcode/sample-Barcode_S4_L001_R1_001.fastq.gz", line: 4

If it is because the reads are out of sync in R1/R2, how can I fix this?

Reply by Researcher, 17 months ago

You can use repair.sh from the BBMap suite to bring the reads back in sync. For an example command line, see the post "How to resync paired-end data?"
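To illustrate what re-syncing means, here is a toy Python sketch of the same idea. It is not how repair.sh is implemented, and it holds all of R2 in memory, so it only suits small files; for real data, use repair.sh. It writes mates found in both files in R1 order and sends reads without a mate to a singletons file. The function and file names are hypothetical.

```python
import gzip
from pathlib import Path


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file as text."""
    path = Path(path)
    return gzip.open(path, "rt") if path.suffix == ".gz" else open(path)


def parse_fastq(path):
    """Yield (read_name, full 4-line record) for each FASTQ record."""
    with open_fastq(path) as fh:
        while True:
            header = fh.readline()
            if not header:
                return  # end of file
            seq, plus, qual = fh.readline(), fh.readline(), fh.readline()
            if not header.startswith("@") or not qual:
                raise ValueError(f"{path}: truncated or malformed record")
            if not qual.endswith("\n"):
                qual += "\n"  # keep records separated when re-written
            rid = (header[1:].split() or [""])[0]
            if rid.endswith(("/1", "/2")):
                rid = rid[:-2]
            yield rid, header + seq + plus + qual


def repair_pairs(r1, r2, out1, out2, singletons):
    """Write synced pairs to out1/out2 and unpaired reads to singletons.

    Returns the number of pairs written.
    """
    r2_records = dict(parse_fastq(r2))  # name -> record, in R2 file order
    n_pairs = 0
    with open(out1, "w") as o1, open(out2, "w") as o2, \
            open(singletons, "w") as single_fh:
        for rid, rec in parse_fastq(r1):
            mate = r2_records.pop(rid, None)
            if mate is None:
                single_fh.write(rec)  # R1 read with no R2 mate
            else:
                o1.write(rec)
                o2.write(mate)
                n_pairs += 1
        for rec in r2_records.values():
            single_fh.write(rec)  # R2 reads with no R1 mate
    return n_pairs
```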

Reply by GenoMax, 17 months ago