Question

Paired layout, but one fastq file

0

21 months ago

Andy ▴ 120

Good morning everyone,

I found a dataset on GEO, GSE115469, and the authors state that it has a paired FASTQ layout. However, I can only find one FASTQ file. How should I handle this file? I need to re-run the Cell Ranger pipeline on it.

Thanks, Andy

fastq • 2.1k views

ADD COMMENT • link updated 9 months ago by GenoMax 148k • written 21 months ago by Andy ▴ 120

0

If you're dealing with bulk data (though this is technically possible with single-cell data too), paired-end reads can also be interleaved and stored in a single FASTQ file. Always examine the file content; that's part of the sanity check.
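For anyone who wants to automate that check, here is a minimal Python sketch. It assumes plain four-line FASTQ records (optionally gzipped) and that mates share a read name, either with a `/1` and `/2` suffix or with the mate number after a space. The function names are my own and not part of any library.

```python
import gzip
import itertools


def _open_text(path):
    """Open a plain or gzipped FASTQ file in text mode."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def read_names(path, limit=1000):
    """Return the names of up to `limit` reads, assuming 4 lines per record."""
    names = []
    with _open_text(path) as fh:
        # Every 4th line, starting with the first, is a header line.
        for header in itertools.islice(fh, 0, limit * 4, 4):
            fields = header[1:].split()  # drop the leading '@', cut at whitespace
            if fields:
                names.append(fields[0])
    return names


def _strip_mate(name):
    """Remove a trailing /1 or /2 mate suffix, if present."""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def looks_interleaved(names):
    """True if every consecutive pair of reads shares the same base name."""
    if len(names) < 2:
        return False
    pairs = zip(names[0::2], names[1::2])  # an unpaired trailing name is ignored
    return all(_strip_mate(a) == _strip_mate(b) for a, b in pairs)
```

Calling `looks_interleaved(read_names("reads.fastq.gz"))` (the file name is a placeholder) gives a quick yes/no hint. It is only a heuristic, so it is still worth looking at the first few records by eye.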

ADD REPLY • link 21 months ago by Ram 44k

0

That would be unusual, since this is single-cell data.

ADD REPLY • link 21 months ago by GenoMax 148k

1

OP's post does not mention single-cell, which is why I suggested this possibility. For future users who might run into this problem, I think it's important to understand that the number of files may not be the most reliable indicator of whether the sequencing was SE or PE.

ADD REPLY • link 21 months ago by Ram 44k

1

The GEO accession listed in the original post is for a single-cell dataset.

ADD REPLY • link 21 months ago by GenoMax 148k

0

I initially didn't bother looking up the GEO entry. I assumed it was bulk, so the single-cell option did not cross my mind. I'll move my answer to a comment. I wish OP had mentioned it: if I assumed bulk, so might other people who actually run into this problem in the bulk context.

ADD REPLY • link 21 months ago by Ram 44k

score 0 · Answer 1 · 2023-03-06

0

21 months ago

Andy ▴ 120

I understand why now: the author only shared a BAM file.

ADD COMMENT • link 21 months ago by Andy ▴ 120

1

I've moved your comment to an answer and added an answer of my own. Please accept your answer and optionally mine too to mark your post as resolved.

ADD REPLY • link 21 months ago by Ram 44k

1

You can recreate the FASTQ files using a tool provided by 10x Genomics called bamtofastq (LINK). This will properly recreate the R1 file (CB+UMI) and the R2 file (RNA read).
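A rough sketch of how the conversion could be scripted from Python. It assumes the 10x release binary has been renamed to `bamtofastq` and is on your PATH; the helper names and the default thread count are my own choices. As far as I know, bamtofastq refuses to write into an output directory that already exists, so the sketch checks for that first.

```python
import shutil
import subprocess
from pathlib import Path


def build_bamtofastq_cmd(bam, out_dir, threads=4):
    """Return the argument list: bamtofastq --nthreads=N <bam> <output-path>."""
    return ["bamtofastq", f"--nthreads={threads}", str(bam), str(out_dir)]


def run_bamtofastq(bam, out_dir, threads=4):
    """Run bamtofastq, failing early on the most common setup problems."""
    # Assumption: the binary is installed under the name 'bamtofastq'.
    if shutil.which("bamtofastq") is None:
        raise FileNotFoundError("bamtofastq not found on PATH")
    # Assumption: the tool wants to create the output directory itself.
    if Path(out_dir).exists():
        raise FileExistsError(f"{out_dir} already exists; choose a new output path")
    subprocess.run(build_bamtofastq_cmd(bam, out_dir, threads), check=True)
```

For example, `run_bamtofastq("possorted_genome_bam.bam", "fastq_out", threads=8)` (both paths are placeholders) should leave the regenerated FASTQ files under `fastq_out`, ready to be passed to Cell Ranger.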

ADD REPLY • link 21 months ago by GenoMax 148k

0

Yes, bamtofastq does solve the problem, and it gives me the correct FASTQ files.

ADD REPLY • link 21 months ago by Andy ▴ 120

0

I am having a similar issue: there is only one FASTQ file, but no BAM file is available, so from my understanding I can't use bamtofastq. Is there another solution? Here is the link to the dataset I am trying to download: https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR14667226&display=metadata.

I have tried the following:

fasterq-dump SRR14667226 --include-technical -S
fasterq-dump SRR14667226
fastq-dump SRR14667226 --split-3 --skip-technical

ADD REPLY • link 9 months ago by Sky ▴ 10

0

There is a BAM file available here: https://sra-pub-src-2.s3.amazonaws.com/SRR14667226/CTRL_possorted_genome_bam.bam.1

Please download it with curl/wget and convert it using the 10x utility (bamtofastq).

You need to look for BAM files under the Data Access tab (for the file above: https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR14667226&display=metadata). They may not always be available; 10x data submissions at SRA can be hit or miss.
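If you would rather script the download than use curl/wget, something along these lines should work with the Python standard library alone. The helper names are made up for illustration, and stripping the trailing `.1` from the SRA file name is purely cosmetic on my part, not something the 10x utility is known to require.

```python
import posixpath
import re
import shutil
import urllib.parse
import urllib.request


def local_bam_name(url):
    """Derive a local file name from the URL, dropping a numeric suffix after .bam."""
    name = posixpath.basename(urllib.parse.urlparse(url).path)
    return re.sub(r"\.bam\.\d+$", ".bam", name)


def download(url, dest):
    """Stream the remote file to `dest` without loading it all into memory."""
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest
```

With the BAM URL given above, `download(url, local_bam_name(url))` would save the file as `CTRL_possorted_genome_bam.bam` in the current directory. These files can be large, so check your free disk space first.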

ADD REPLY • link 9 months ago by GenoMax 148k