Question

How do I get separate ADT / CITE-seq FASTQs from a single SRA / BAM file (originally generated by cellranger)?

1

2.2 years ago

msn ▴ 130

Hello all. I am trying to pre-process some single-cell RNA and ADT (TotalSeq-C) data from a GEO SRA entry, but I am having trouble getting separate FASTQs for the "CITE-seq" (ADT) library and the transcriptome (10x) library.

First I tried sra-tools 3.0.0, running prefetch and then fasterq-dump with the --split-files flag. This produced three FASTQ files: SRRXXX.fastq, SRRXXX_1 and SRRXXX_2. I assume SRRXXX_1 and SRRXXX_2 are R1 and R2 with all the lanes of the run combined. Normally that would be completely fine, but I don't know how to separate out the ADT reads and recapitulate the original cellranger counts, because with that pipeline you normally have separate files for each library after bcl2fastq. Maybe the data is all in there, maybe it's not, but I am open to ideas on how to split it.

Next I found this page from 10x, https://support.10xgenomics.com/docs/bamtofastq, which implies that FASTQs converted from SRA can be missing tags produced by cellranger. The original data was processed with cellranger, so I downloaded the originally uploaded bam.1 file over HTTP with wget and ran the bamtofastq that ships with cellranger. It generated a folder with the correct experiment and sample name, much as I would expect from bcl2fastq, and recapitulated R1 and R2 files (L001, L002, etc.) for all four lanes of the S4 flowcell that was used, which I am happy about. But I am stuck again: to validate the findings and repeat the steps in the paper, I need to separate the ADT-barcoded reads from the transcriptome reads.
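A possible way to tell the libraries apart after bamtofastq, offered as a sketch and not as a confirmed recipe. It assumes (this is not shown in the thread) that the BAM was written by Cell Ranger 3.0 or later and that its header records each library as a "@CO library_info:{...}" comment containing "library_id" and "library_type" keys, and that bamtofastq writes each library to its own output folder whose name includes that library id, as I understand the 10x documentation. The helper name list_libraries is hypothetical.

```shell
# Hypothetical helper: read a SAM header on stdin and print
# "library_id<TAB>library_type" for every @CO library_info comment.
# ASSUMPTION: the header carries lines such as
#   @CO<TAB>library_info:{"library_id":1,"library_type":"Antibody Capture",...}
list_libraries() {
    awk '/^@CO\tlibrary_info:/ {
        # Pull out the numeric library_id.
        id = $0
        sub(/.*"library_id":/, "", id)
        sub(/[^0-9].*/, "", id)
        # Pull out the quoted library_type string.
        type = $0
        sub(/.*"library_type":"/, "", type)
        sub(/".*/, "", type)
        print id "\t" type
    }'
}

# Intended use (requires samtools and a real BAM, so it is left commented out):
#   samtools view -H input.bam | list_libraries
```

If nothing is printed, the header does not carry this information and the approach does not apply to that BAM.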

I feel like I am so close and am missing something simple to separate out these last new FASTQs.

Thanks for any help you can offer!

cellranger scRNA SRA CITE-Seq sra-tools • 1.5k views

updated 2.0 years ago by GenoMax 147k • written 2.2 years ago by msn ▴ 130

0

Try this:

/YOUR_CellRanger_PATH/external/anaconda/bin/samtools view -h input.bam | sed -e "s/BC:Z:.\{9\}QT:Z:.\{9\}//g" | /YOUR_CellRanger_PATH/external/anaconda/bin/samtools view -b - > output.bam

That is, I delete the "BC:Z:" and "QT:Z:" tags, which I guess is where the ADT information was stored. I hope this helps, but I am not 100% sure it will work.
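To make the effect of that command easier to see, here is a minimal sketch of the same idea on SAM text. It is a variation, not the command above: instead of matching a fixed 9-character width, it drops any optional field that starts with BC:Z: or QT:Z:. In the SAM tag specification these two tags hold the sample barcode sequence and its quality string. The helper name strip_bc_qt is hypothetical.

```shell
# Hypothetical helper: remove BC:Z: and QT:Z: optional fields from SAM text
# read on stdin, leaving header lines and all other fields unchanged.
strip_bc_qt() {
    awk 'BEGIN { FS = OFS = "\t" }
    /^@/ { print; next }   # header lines pass through untouched
    {
        out = $1
        for (i = 2; i <= NF; i++) {
            # Fields 1-11 are mandatory; only optional tags (12+) are filtered.
            if (i > 11 && ($i ~ /^BC:Z:/ || $i ~ /^QT:Z:/)) continue
            out = out OFS $i
        }
        print out
    }'
}

# Intended use (requires samtools, so it is left commented out):
#   samtools view -h input.bam | strip_bc_qt | samtools view -b - > output.bam
```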

Student from Central South University, China

2.2 years ago by duanmingwu • 0

score 2 · Accepted Answer · 2022-11-15

2

2.0 years ago

msn ▴ 130

Updated solution: the files uploaded to GEO were corrupted.

2.0 years ago by msn ▴ 130

0

Is that your conclusion or did SRA support confirm this?

2.0 years ago by GenoMax 147k

2

Yes, this is confirmed. It wasn't even on my radar; I assumed someone in the chain of publication would have validated the upload. I sent my error traceback to support; they independently tried the same thing, got the same error and crash, and reported the file as incomplete.
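For anyone who wants to rule out a truncated download or upload before debugging further, a quick first check is whether the BAM ends with the 28-byte BGZF end-of-file block defined in the SAM/BAM specification. The sketch below does only that; it cannot detect other kinds of corruption, and nothing in this thread says which check SRA support used. samtools quickcheck performs a similar test if samtools is available. The helper name bam_has_eof is hypothetical.

```shell
# The 28-byte empty BGZF block that terminates a complete BAM file,
# written as hex in 4-byte groups so it is easy to compare with the spec.
BGZF_EOF_HEX="1f8b0804""00000000""00ff0600""42430200""1b000300""00000000""00000000"

# Hypothetical helper: succeed (exit 0) if the file ends with the BGZF EOF block.
bam_has_eof() {
    # Hex-dump the last 28 bytes and strip whitespace so they can be compared.
    tail_hex=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \n')
    [ "$tail_hex" = "$BGZF_EOF_HEX" ]
}

# Intended use:
#   bam_has_eof input.bam && echo "EOF block present" || echo "file looks truncated"
```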