Hello all. I am trying to pre-process some single-cell RNA and ADT (TotalSeq-C) data from a GEO/SRA entry, but I am having trouble getting separate FASTQ files for the "CITE-seq" (ADT) library and the transcriptome (10X) library.
First I tried sra-tools 3.0.0, using prefetch first and then fasterq-dump with the --split-files flag. This produced three FASTQ files: SRRXXX.fastq, SRRXXX_1, and SRRXXX_2, which I assume are the R1 and R2 of all the lanes combined for the run. Normally that is completely fine, but I don't know how to separate out the ADT reads and recapitulate the original cellranger counts from them, since with that pipeline you normally have separate files after the bcl2fastq pipeline. Maybe the data is all in there, maybe it's not, but I am open to ideas on how to split it.
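One way to check the assumption about which file is which is to look at the read lengths in each file. Below is a minimal Python sketch (the function name `read_length_counts` is my own, not part of sra-tools). It assumes plain or gzipped four-line FASTQ records. In many 10x libraries the barcode/UMI read is short (roughly 26 to 28 bases) while the insert read is much longer, so the length distribution usually hints at which file plays which role.

```python
import gzip
from collections import Counter
from itertools import islice


def read_length_counts(path, max_reads=10000):
    """Count read lengths among the first `max_reads` records of a FASTQ file."""
    # Assumption: standard 4-line FASTQ records, optionally gzip-compressed.
    opener = gzip.open if str(path).endswith(".gz") else open
    counts = Counter()
    with opener(path, "rt") as handle:
        # The sequence is the 2nd line of every 4-line record (index 1, 5, 9, ...).
        for seq in islice(handle, 1, max_reads * 4, 4):
            counts[len(seq.rstrip("\r\n"))] += 1
    return counts


# Example call (file names are the placeholders used in this post):
# for name in ("SRRXXX.fastq", "SRRXXX_1.fastq", "SRRXXX_2.fastq"):
#     print(name, read_length_counts(name).most_common(3))
```

This only tells you which file holds barcodes and which holds inserts; it does not by itself separate ADT reads from transcriptome reads. Also note that fasterq-dump skips reads flagged as technical unless you pass `--include-technical`, which may matter if the barcode read was uploaded that way.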
Next I found this post from 10X, https://support.10xgenomics.com/docs/bamtofastq, which implies that SRA-converted FASTQs can be missing tags produced by cellranger. The original data was processed with cellranger, so I downloaded the originally uploaded bam.1 file over HTTP with wget. I ran the bamtofastq that is built into cellranger; it generated a folder with the correct experiment and sample name, much as I would expect from bcl2fastq, and recapitulated R1 and R2 files for all four lanes of the S4 flowcell that was used (L001, L002, etc., with R1 and R2 for each), which I am happy about. BUT I am stuck again: to validate the findings and repeat the steps in the paper, I need to separate the ADT-barcoded reads from the transcriptome reads.
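One thing that may help here is checking whether the BAM header records more than one library. The sketch below assumes the BAM was written by a Cell Ranger version with Feature Barcode support, whose headers (as far as I know) carry `@CO` comment lines of the form `library_info:{...json...}` with a `library_type` field such as "Gene Expression" or "Antibody Capture". That header layout is an assumption, and the function name `libraries_from_header` is my own. If no such lines exist, the function simply returns an empty list.

```python
import json


def libraries_from_header(header_text):
    """Return the library_info records found in @CO lines of a SAM header."""
    libraries = []
    for line in header_text.splitlines():
        if not line.startswith("@CO"):
            continue
        # A comment line is "@CO<TAB>free text"; keep only the text part.
        _, _, comment = line.partition("\t")
        prefix = "library_info:"
        if comment.startswith(prefix):
            # Assumption: everything after the prefix is one JSON object.
            libraries.append(json.loads(comment[len(prefix):]))
    return libraries


# Example use, after dumping the header with `samtools view -H file.bam > header.txt`
# (header.txt is a hypothetical file name):
# with open("header.txt") as fh:
#     for lib in libraries_from_header(fh.read()):
#         print(lib.get("library_id"), lib.get("library_type"))
```

If two libraries show up, the bamtofastq output folders may already be split per library, in which case the ADT FASTQs would be the ones in the folder matching the antibody library's id.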
I feel like I am so close and am missing something simple to separate out these last new FASTQs.
Thanks for any help you can offer!
Try this:
Namely, I deleted the two parts "BC:Z:" and "QT:Z:", in which, I guess, the ADT information was stored. I hope this helps, though I am not 100% sure it will work.
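A minimal sketch of what stripping those two tags could look like in Python, assuming SAM-format text lines (for example from `samtools view -h`). This is an illustration only, not necessarily the exact command used above, and the function name `strip_tags` is my own. As far as I know, in Cell Ranger BAMs the BC and QT tags hold the sample index read and its quality, so whether removing them affects the ADT split is something to verify on your data.

```python
def strip_tags(sam_line, tags=("BC", "QT")):
    """Drop the named optional tags from one SAM line; header lines pass through."""
    if sam_line.startswith("@"):
        return sam_line
    newline = "\n" if sam_line.endswith("\n") else ""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 columns of a SAM record are mandatory; tags follow as TAG:TYPE:VALUE.
    mandatory, optional = fields[:11], fields[11:]
    kept = [f for f in optional if f.split(":", 1)[0] not in tags]
    return "\t".join(mandatory + kept) + newline


# Example use on a stream (file names are hypothetical):
#   samtools view -h in.bam | python strip_tags_script.py | samtools view -b -o out.bam -
# where the script does:
#   import sys
#   for line in sys.stdin:
#       sys.stdout.write(strip_tags(line))
```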
Student from Central South University, China