Hello, I am trying to download FASTQ files (SRR12273024) with fasterq-dump/fastq-dump from sra-tools. I have tried the --split-files and -s flags; however, I only get one FASTQ file.
@SRR12273024.1 SN7001050R:482:HYKG3BCXX:1:1101:1163:2092 length=109
GATGTANAGAACGCGACTTCCACAAACCTGGATTTTTTATGTACAACCCTGACCCNGACCGTTTGCTATATTCCTTTTTCTATGAAATAATGTGAATGATAATAAAACA
+SRR12273024.1 SN7001050R:482:HYKG3BCXX:1:1101:1163:2092 length=109
DDDDDI#<<EHIIIHIIIIIIIIIHEHIIIIFHHHIIIIIHHIIIIIIHHIIHHH#<<DGHHIHIHIEHEHHHFHHIIIIIIIH?EEHH@HIIHIIIIFEHDDHHHHHH
@SRR12273024.2 SN7001050R:482:HYKG3BCXX:1:1101:1096:2166 length=109
AAGGTACCTGGGTTCAACTAAAGCGCCAGCCTGCTCCACCCAGAGAAGCACACTTTGTGAGAACCAATGGGAAGGAGCCTGAGCTGCTGGAACCTATTCCCTATGAATT
+SRR12273024.2 SN7001050R:482:HYKG3BCXX:1:1101:1096:2166 length=109
DDDDDIHIIIHIFHIIIIIIIIIIGIHIIIIIIIIIHIHHHIGHIHIHI?GHHG?GFHHDH@FG<<CHGHIGHHIHHHEHH1FHIIIIIIIGHEHHIIHGHDGHHHHGI
@SRR12273024.3 SN7001050R:482:HYKG3BCXX:1:1101:1086:2183 length=109
CAGCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
I have tried various ways to de-interleave the FASTQ file, including the methods outlined in gist.github.com/nathanhaigh/3521724 and biostars.org/p/19446/. However, none of these methods produce FASTQ files that are compatible with the Cell Ranger pipeline.
When I try to run cellranger count on the file, I get this error:
Log message: The read lengths are incompatible with all the chemistries for Sample SRR12273024 in ./
- read1 median length = 109
- read2 median length = 0
- index1 median length = 0
The minimum read length for different chemistries are:
SC5P-R2 - read1: 26, read2: 25, index1: 0
SC5P-PE - read1: 81, read2: 25, index1: 0
SC3Pv1 - read1: 25, read2: 10, index1: 14
SC3Pv2 - read1: 26, read2: 25, index1: 0
SC3Pv3 - read1: 26, read2: 25, index1: 0
We expect that at least 50% of the reads exceed the minimum length.
I've looked into this error and it seems like the dataset is paired-end, which is why I have been trying to split the files using sra-tools to no avail.
Any help is appreciated!
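For anyone hitting the same message, the length check in the log above can be approximated before running cellranger. This is only a rough sketch in Python, not cellranger's own code: the minimum lengths are copied from the log message, and the helper names (read_lengths, median_read_length, compatible_chemistries) are made up for illustration. It treats "median at or above the minimum" as a stand-in for the 50% rule quoted in the log.

```python
import gzip
import statistics

# Minimum lengths (read1, read2, index1), copied from the log message above.
MIN_LENGTHS = {
    "SC5P-R2": (26, 25, 0),
    "SC5P-PE": (81, 25, 0),
    "SC3Pv1": (25, 10, 14),
    "SC3Pv2": (26, 25, 0),
    "SC3Pv3": (26, 25, 0),
}


def read_lengths(handle):
    """Yield the sequence length of each 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of every record is the sequence
            yield len(line.rstrip("\n"))


def median_read_length(path):
    """Median read length of a FASTQ file (plain or .gz); 0 if it is empty."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        lengths = list(read_lengths(handle))
    return statistics.median(lengths) if lengths else 0


def compatible_chemistries(read1, read2, index1):
    """Chemistries whose minimums are all met by the given median lengths."""
    observed = (read1, read2, index1)
    return [
        name
        for name, minimums in MIN_LENGTHS.items()
        if all(obs >= need for obs, need in zip(observed, minimums))
    ]
```

With the medians from the log (109, 0, 0), compatible_chemistries returns an empty list, because every chemistry listed needs a non-empty read2.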
Thanks! This seems to have worked out. Additionally, following solution (ii) outlined here seems to make the FASTQ files compatible with the cellranger pipeline. I am curious, though: the paper mentions that it is v2 chemistry, and the SRA page indicates that there are 3 reads. Is there any way to configure fasterq-dump to get the 3 files: R1, R2 and I1?
If you look at the Data Access tab of the SRA record for this run, there are four files uploaded. It appears the submitters may have split the UMIs and barcodes into separate files. If you need R1, R2 and I1 files, you will need to rename SRR12273024_1.fastq to SRR12273024_I1.fastq, then merge SRR12273024_3.fastq and SRR12273024_4.fastq to recreate the SRR12273024_R1.fastq file.
Thanks! How would I go about merging the FASTQ files? Additionally, does the number of files, or the way the FASTQ files are uploaded to SRA, have anything to do with the type of Chromium chemistry used (v1, v2, etc.)?
In the case of SRR12273037, which is from the same experiment and thus v2 chemistry, there are only 3 files as opposed to 4. To download this run, would I also need to use the following flags: --split-spot --include-technical --split-files?
I appreciate your help!
I am not sure why the authors uploaded the data this way. I have not worked with v1 10x chemistry, so I don't know if that has some bearing on this. It would be unusual to mix chemistries in one experiment, but ...
As it stands, you will need to write some code to match the fastq headers for each record and then merge files 3 and 4 to get the 24 bp read.
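A minimal sketch of that merge in Python, under a few assumptions that are not confirmed anywhere in this thread: both files list the same spots in the same order, the first word of each header (e.g. SRR12273024.1) identifies the spot, and the sequence from the first file should come before the sequence from the second. Check which of files 3 and 4 holds the barcode and which holds the UMI before choosing the order. The function names are made up for illustration.

```python
from itertools import zip_longest


def parse_fastq(handle):
    """Yield (header, sequence, quality) for each 4-line FASTQ record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def spot_id(header):
    """'@SRR12273024.1 SN7001050R:... length=109' -> 'SRR12273024.1'."""
    return header[1:].split()[0]


def merge_records(first, second):
    """Concatenate matching records; fail loudly if the headers disagree."""
    for a, b in zip_longest(first, second):
        if a is None or b is None:
            raise ValueError("the two files have different numbers of records")
        if spot_id(a[0]) != spot_id(b[0]):
            raise ValueError(f"header mismatch: {a[0]!r} vs {b[0]!r}")
        seq = a[1] + b[1]
        qual = a[2] + b[2]
        yield f"@{spot_id(a[0])} length={len(seq)}\n{seq}\n+\n{qual}\n"


def merge_fastq(path_a, path_b, out_path):
    """Write the merged records of path_a and path_b to out_path."""
    with open(path_a) as fa, open(path_b) as fb, open(out_path, "w") as out:
        out.writelines(merge_records(parse_fastq(fa), parse_fastq(fb)))
```

For example, merge_fastq("SRR12273024_3.fastq", "SRR12273024_4.fastq", "SRR12273024_R1.fastq") would write the combined read, assuming file 3 is the part that belongs first. Swap the first two arguments if it is the other way round.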
https://github.com/ncbi/sra-tools/wiki/HowTo:-fasterq-dump
On this page, --split-spot and --split-files are listed as separate, independent options.
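As a rough sketch, the download could be scripted from Python like this. It assumes fasterq-dump is installed and on the PATH, and it uses only --split-files together with --include-technical, leaving out --split-spot; whether that yields exactly three files for SRR12273037 depends on how the run was submitted and has not been checked here. The function names, thread count and output directory are placeholders.

```python
import shlex
import subprocess


def fasterq_dump_command(accession, outdir=".", threads=4):
    """Build the argument list for a fasterq-dump call.

    --split-files writes one file per read of each spot;
    --include-technical keeps technical reads (e.g. barcode/UMI or index
    reads) that would otherwise be skipped.
    """
    return [
        "fasterq-dump",
        "--split-files",
        "--include-technical",
        "--threads", str(threads),
        "--outdir", outdir,
        accession,
    ]


def download(accession, outdir="."):
    """Run fasterq-dump and raise CalledProcessError if it fails."""
    subprocess.run(fasterq_dump_command(accession, outdir), check=True)


if __name__ == "__main__":
    # Print the command so it can be inspected before anything is downloaded.
    print(shlex.join(fasterq_dump_command("SRR12273037")))
```

Calling download("SRR12273037") would then run the command.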