I'm trying to get FASTQ (or fastq.gz) files to use as input for cellranger count.
I found 7 SRA entries.
For example, in https://www.ncbi.nlm.nih.gov/sra/SRX12574893[accn],
the original FASTQ files appear to span multiple lanes and to be paired-end (16 files):
patient_A-1_S1_L003_R1_001.fastq.gz.1
patient_A-1_S1_L003_R2_001.fastq.gz.1
patient_A-2_S1_L004_R1_001.fastq.gz.1
patient_A-2_S1_L004_R2_001.fastq.gz.1
...
However, after downloading the SRA file and splitting it into paired-end reads with fastq-dump --split-files, I got only two files:
- SRX12574893_1.fastq
- SRX12574893_2.fastq
Is there a way to split it correctly?
2. Running cellranger on these multiple files
2-1) If I use SRX12574893_1.fastq, SRX12574893_2.fastq, ... and rename them as follows:
SRX12574893_1.fastq -> SAMPLE_L001_R1_001.fastq
SRX12574893_2.fastq -> SAMPLE_L001_R2_001.fastq
SRX12574894_1.fastq -> SAMPLE_L002_R1_001.fastq
SRX12574894_2.fastq -> SAMPLE_L002_R2_001.fastq
SRX12574895_1.fastq -> SAMPLE_L003_R1_001.fastq
SRX12574895_2.fastq -> SAMPLE_L003_R2_001.fastq
Is this the right way?
2-2) If there is a way to split the data back into the original files:
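A minimal Python sketch of the renaming proposed in 2-1, assuming every accession belongs to the same sample and each accession is treated as its own lane. The helpers cellranger_names and rename_all are hypothetical. The sketch inserts an _S1_ field because the naming pattern described in the Cell Ranger documentation is SampleName_S1_L00N_R1_001.fastq.gz; check the documentation for your Cell Ranger version. By default it only prints the plan.

```python
from pathlib import Path


def cellranger_names(accessions, sample="SAMPLE", ext=".fastq"):
    """Map fastq-dump outputs (ACC_1 / ACC_2) to Cell Ranger-style names.

    Assumption: all accessions belong to one sample, and each accession
    becomes its own lane (L001, L002, ...), mirroring the scheme in 2-1.
    """
    mapping = {}
    for lane, acc in enumerate(accessions, start=1):
        for read in (1, 2):
            src = f"{acc}_{read}{ext}"
            dst = f"{sample}_S1_L{lane:03d}_R{read}_001{ext}"
            mapping[src] = dst
    return mapping


def rename_all(mapping, directory=".", dry_run=True):
    """Rename files in `directory`; with dry_run=True only print the plan."""
    base = Path(directory)
    for src, dst in mapping.items():
        print(f"{src} -> {dst}")
        if not dry_run:
            (base / src).rename(base / dst)


if __name__ == "__main__":
    accs = ["SRX12574893", "SRX12574894", "SRX12574895"]
    rename_all(cellranger_names(accs), dry_run=True)
```

If the accessions are actually different patients rather than lanes of one library, each would need its own sample name instead of a shared SAMPLE prefix.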
patient_A-1_S1_L003_R1_001.fastq.gz
patient_A-1_S1_L003_R2_001.fastq.gz
patient_A-2_S1_L004_R1_001.fastq.gz
patient_A-2_S1_L004_R2_001.fastq.gz
...
patient_B-1_S1_L003_R1_001.fastq.gz
patient_B-1_S1_L003_R2_001.fastq.gz
...
patient_C-2_S1_L004_R1_001.fastq.gz
patient_C-2_S1_L004_R2_001.fastq.gz
...
how should I rename these files?
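The original file names listed on the SRA page already look like the bcl2fastq-style names Cell Ranger expects, apart from the trailing ".1". Below is a minimal Python sketch, assuming that suffix can simply be dropped, that strips it, checks each name against the pattern, and groups files by sample prefix. The regex and the helpers normalize, parse and group_by_sample are illustrative, not part of any tool.

```python
import re

# Assumed bcl2fastq-style pattern:
# <sample>_S<n>_L<lane>_<R1|R2|I1|I2>_001.fastq[.gz]
CR_PATTERN = re.compile(
    r"^(?P<sample>.+)_S(?P<snum>\d+)_L(?P<lane>\d{3})"
    r"_(?P<read>[RI][12])_001\.fastq(?:\.gz)?$"
)


def normalize(name):
    """Drop a trailing numeric suffix such as '.1' after .fastq/.fastq.gz."""
    return re.sub(r"(\.fastq(?:\.gz)?)\.\d+$", r"\1", name)


def parse(name):
    """Return (clean_name, fields), or raise ValueError if the name doesn't fit."""
    clean = normalize(name)
    m = CR_PATTERN.match(clean)
    if m is None:
        raise ValueError(f"not a Cell Ranger-style name: {clean}")
    return clean, m.groupdict()


def group_by_sample(names):
    """Group cleaned file names by their sample prefix."""
    groups = {}
    for n in names:
        clean, fields = parse(n)
        groups.setdefault(fields["sample"], []).append(clean)
    return groups
```

The sample prefix it extracts (for example patient_A-1) is the part of the name that Cell Ranger matches FASTQ files against.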
Unfortunately, NCBI's handling of 10x data submissions is something of a wild west: submitters can upload the data however they like. This appears to be 10x scRNA-seq data, which means read 1 (cell barcode + UMI) should be only 26 or 28 bp long. Some submitters provide the original Cell Ranger BAM files under the Data Access tab, and these can be used to reconstitute the original data. That does not seem to be the case here. If the data from the different lanes has simply been concatenated, that is not a problem at all.
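Since read 1 should be 26 or 28 bp, one way to check which of the files from fastq-dump --split-files holds the barcode read is to tabulate read lengths. A minimal Python sketch, assuming standard 4-line FASTQ records; read_length_counts is a hypothetical helper that samples only the first max_reads records.

```python
import gzip
from collections import Counter


def read_length_counts(path, max_reads=10000):
    """Count sequence lengths in the first `max_reads` records of a FASTQ file.

    Opens gzip-compressed files when the name ends in .gz, plain text
    otherwise, and assumes every record spans exactly 4 lines.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    counts = Counter()
    with opener(path, "rt") as fh:
        for i, line in enumerate(fh):
            if i // 4 >= max_reads:
                break
            if i % 4 == 1:  # second line of each record is the sequence
                counts[len(line.rstrip("\n"))] += 1
    return counts
```

For example, if read_length_counts("SRX12574893_1.fastq") is dominated by 26 or 28 bp reads, that file is likely read 1, and the file with longer reads is the cDNA read (R2).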