I am using the SRA Toolkit to download SRA reads from the NCBI web page. I went through the manual and came up with this command: ~/bin/sratoolkit.2.7.0-centos_linux64/bin/fastq-dump --split-files --fasta 60 SRR1981235
I expected it to download the SRA file and split it into two files. It created two FASTA files, fasta.1 and fasta.2 (paired reads), but the aligner reports that the numbers of reads are not equal. I am very new to this and need advice.
Regards!
I had this issue as well. It happens all the time with the SRA Toolkit when downloading SRA files and converting them to FASTQ.
Sometimes you need to download the SRA files all over again. That should work.
If these FASTQ files were generated with fastq-dump, the number of lines has to be a multiple of 4. A line count that is not a multiple of 4 indicates a problem.
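The multiple-of-4 rule can be checked with a few lines of Python. This is only a sketch: the function name `fastq_line_check` is made up for illustration, and it assumes plain four-line FASTQ records (header, sequence, `+`, qualities). It does not apply to FASTA output written with `--fasta 60`, where sequences are wrapped over several lines.

```python
def fastq_line_check(path):
    """Return (line_count, record_count) for a four-line-per-record FASTQ file.

    record_count is None when the line count is not a multiple of 4,
    which usually means the file is truncated or otherwise damaged.
    """
    with open(path, "rb") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        return n_lines, None
    return n_lines, n_lines // 4
```

Calling it on each mate file (for example a hypothetical SRR1981235_1.fastq and SRR1981235_2.fastq) should give the same record count for both; a `None` points at an incomplete file.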
The SRR1981235 reads were obtained from Helicobacter pylori, which has a small chromosome of only about 1,600,000 nucleotides. SRR1981235 has an absurd coverage of more than 1000-fold.
There are more than 500 read sets available for Helicobacter pylori in the Sequence Read Archive. Just take another one.
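For reference, coverage is usually estimated as total sequenced bases divided by genome length. A minimal sketch, where `estimated_coverage` is an illustrative name and the read count and read length in the example are invented numbers, not the actual statistics of SRR1981235:

```python
def estimated_coverage(n_reads, mean_read_length, genome_size):
    """Back-of-the-envelope coverage: total bases divided by genome length."""
    return n_reads * mean_read_length / genome_size


# Hypothetical run: 16 million reads of 100 nt against a 1.6 Mb chromosome.
print(estimated_coverage(16_000_000, 100, 1_600_000))  # 1000.0
```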
Oh, that's a handy link, thank you! However, is the SRA Toolkit pointing out problems in the SRR1981108 data set that should not be ignored just by bypassing the SRA Toolkit? If so, would these be spotted anyway with FastQC?
kwc17@Bioinftop2016-05 ~/Desktop/new $ fastq-dump --split-files --fasta 60 SRR1981108
2016-10-04T20:27:39 fastq-dump.2.6.3 warn: too many reads 33 at spot id 9935, maximum 32 supported, skipped
2016-10-04T20:27:39 fastq-dump.2.6.3 warn: too many reads 43 at spot id 19767, maximum 32 supported, skipped
2016-10-04T20:27:52 fastq-dump.2.6.3 warn: too many reads 77 at spot id 37494, maximum 32 supported, skipped
2016-10-04T20:27:56 fastq-dump.2.6.3 warn: too many reads 65 at spot id 43137, maximum 32 supported, skipped
2016-10-04T20:28:23 fastq-dump.2.6.3 warn: too many reads 69 at spot id 88649, maximum 32 supported, skipped
2016-10-04T20:28:24 fastq-dump.2.6.3 warn: too many reads 41 at spot id 91070, maximum 32 supported, skipped
2016-10-04T20:28:35 fastq-dump.2.6.3 warn: too many reads 117 at spot id 104330, maximum 32 supported, skipped
2016-10-04T20:28:37 fastq-dump.2.6.3 warn: too many reads 37 at spot id 109128, maximum 32 supported, skipped
2016-10-04T20:28:49 fastq-dump.2.6.3 warn: too many reads 47 at spot id 125262, maximum 32 supported, skipped
2016-10-04T20:28:51 fastq-dump.2.6.3 warn: too many reads 99 at spot id 129723, maximum 32 supported, skipped
2016-10-04T20:28:54 fastq-dump.2.6.3 warn: too many reads 41 at spot id 132767, maximum 32 supported, skipped
2016-10-04T20:29:01 fastq-dump.2.6.3 warn: too many reads 55 at spot id 138644, maximum 32 supported, skipped
Rejected 12 SPOTS because of to many READS
Read 163482 spots for SRR1981108
Written 163467 spots for SRR1981108
kwc17@Bioinftop2016-05 ~/Desktop/new $ ls
SRR1981108_1.fasta SRR1981108_19.fasta SRR1981108_3.fasta
SRR1981108_10.fasta SRR1981108_2.fasta SRR1981108_4.fasta
SRR1981108_11.fasta SRR1981108_20.fasta SRR1981108_5.fasta
SRR1981108_12.fasta SRR1981108_21.fasta SRR1981108_6.fasta
SRR1981108_13.fasta SRR1981108_22.fasta SRR1981108_7.fasta
SRR1981108_14.fasta SRR1981108_23.fasta SRR1981108_8.fasta
SRR1981108_15.fasta SRR1981108_24.fasta SRR1981108_9.fasta
SRR1981108_16.fasta SRR1981108_25.fasta SRR1981235_1.fasta
SRR1981108_17.fasta SRR1981108_26.fasta SRR1981235_2.fasta
SRR1981108_18.fasta SRR1981108_27.fasta
You get 27 tiny output files instead of 2 (compare the original SRR1981235_1.fasta + SRR1981235_2.fasta).
I don't know how to handle this problem, apart from removing this SRR from the data set of course. Does it have to be included?
I have a question. When I used --split-3 rather than --split-files, I got three output files. The fastq-dump manual says that the reads satisfying the biological condition are written to _1.fasta and _2.fasta, and that if only one read is present it is written to .fasta. What I do not understand is what fastq-dump considers a biological condition. Can anyone help me with this, please?
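As far as I understand the manual, "biological" refers to the read type stored in the SRA record: each spot can hold several reads, and each read is labelled either biological (actual sample sequence) or technical (adapters, barcodes, linkers and the like). The sketch below is a simplified model of the routing the manual describes for --split-3, not the toolkit's implementation; the function name and the `(read_type, sequence)` tuples are made up for illustration, and filters such as a minimum read length are left out.

```python
def route_spot_split3(reads):
    """Model how one spot is routed under --split-3.

    reads: list of (read_type, sequence) tuples, where read_type is
    "biological" or "technical". Returns a dict mapping an output file
    label to the sequence written there.
    """
    biological = [seq for kind, seq in reads if kind == "biological"]
    if len(biological) >= 2:
        # First two biological reads form the pair; any further ones are ignored.
        return {"_1": biological[0], "_2": biological[1]}
    if len(biological) == 1:
        # A lone biological read goes to the third, unpaired file.
        return {"unpaired": biological[0]}
    return {}
```

Under this model the _1 and _2 files always end up with the same number of reads, which is why --split-3 is often suggested when an aligner complains about unequal mate files.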
Try --split-3 rather than --split-files.
1) What program are you using to do the alignment? TopHat?
2) Did you do any trimming or QC between downloading the fastq files and alignment?
You may not have downloaded the files completely. Can you run:
to give you the line count?
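The command itself did not survive in the post above. Since the files in question are FASTA written with --fasta 60, raw line counts depend on how the sequences are wrapped, so counting header lines (those starting with `>`) is a more direct comparison. A small sketch with hypothetical helper names:

```python
def count_fasta_records(path):
    """Count FASTA records by counting header lines that start with '>'."""
    with open(path, "rb") as handle:
        return sum(1 for line in handle if line.startswith(b">"))


def mates_match(path_1, path_2):
    """True when both mate files contain the same number of records."""
    return count_fasta_records(path_1) == count_fasta_records(path_2)
```

For the files from the question this would be called as mates_match("SRR1981235_1.fasta", "SRR1981235_2.fasta"); a False result confirms what the aligner is complaining about.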