I am trying to map Illumina paired-end reads of the Bacillus subtilus natto genome to the reference genome. Everything is running smoothly, but I am only able to get 50 mapped reads at best. Here are my commands:
./fastq-dump --split-files ~/DRR000001.sra #Gives Forward and Reverse files
./fastq_quality_filter -Q33 -q 30 -p 70 -i ~/DRR000001_1.fastq -o ~/natto1_trim.fastq
./fastq_quality_filter -Q33 -q 30 -p 70 -i ~/DRR000001_2.fastq -o ~/natto2_trim.fastq
bowtie-build -f bsubtilisnatto.fasta natto
bowtie natto -1 natto1_trim.fastq -2 natto2_trim.fastq pair.sam --al pair.align --un pair.unmapped -l 20 -X 300 --fr -q -t --sam -n 3 --solexa-quals -p 8
The filter leaves about 700,000 and 400,000 reads. The reference fasta file is about 4 Mbp. This may be low, but surely it's not 50 mapped reads low. I've tinkered with the bowtie command, using --rf
, --ff
, lowering thresholds, omitting --solexa-quals
, all to no avail.
Your help would be much appreciated!
Does it mean that the reads in the two fastq file is not the same after filtering, i.e. the two files are no longer synchronized? If so, this is definitely wrong. If you remove a read from one fastq file you should remove that read also in the other file. Use an adapter and/or quality trimmer that can handle paired files, like cutadapt. Also, I wouldn't use bowtie, if anything bowtie2 or bwa mem (if read longer than 70bp).