Hello everyone,
Nowadays I am using bowtie to align some small sequences (27 - 31 nucleotides) to a reference sequence. When I write:
grep "^>" Sequences.fa -c
52772
I get 52772 sequences in Sequences.fa However after mapping to the reference sequence and extracting all the reads I obtain a file with fewer reads:
grep "^>" Sequences_after_mapping.fa -c
52503
The commands that I wrote to align Sequences.fa was the following:
bowtie -f -a --best --strata -v 0 Inputs/Reference_sequence -f Sequences.fa --sam Sequences.sam
samtools view -Sb Sequences.sam > Sequences.bam
rm -rf Sequences.sam
samtools bam2fq Sequences.bam | seqtk seq -A - > Sequences_after_mapping.fa
And finally the report that I get from the script it's the following:
# reads processed: 52772
# reads with at least one reported alignment: 827 (1.57%)
# reads that failed to align: 51945 (98.43%)
Reported 827 alignments to 1 output stream(s)
[M::bam2fq_mainloop] discarded 0 singletons
[M::bam2fq_mainloop] processed 52772 reads
I am not able to find out what happened for getting fewer sequences. I would appreciate any help that you can bring me.
PD: The input file is a fasta file, my bowtie version is 1.1.2 and samtools 1.9.
PD2: If I remove the option --strata the number of sequences do not change.
ANSWER: Finally, I noticed that the fasta sequences in Sequences.fa should have different names for each sequence if not samtools extract whatever it wants. Thank you to everyone for all.
I think bowtie is not the issue because 827+51945=52772
To find what cause the problem, how many reads do you get from
samtools bam2fq input.bam > output.fastq
And did you tried:
samtools flagstat aln.sorted.bam
From
samtools bam2fq input.bam > output.fastq
I obtain 52503 reads. And here it's the output of samtools flagstat:How do you get to that 52503 number? what's the command you use to count them?
Then why are you using
samtools bam2fq
? Why notsamtools fasta
?Have you also looked at this:
Because I didn't know that tool. Anyway
samtools fasta
andsamtools fasta -F 0x900 in.bam > all_reads.fa
outputs the same number of reads (52503) that are less than the original fasta (52772).Are you missing reads that did not align from the output? You could try this option to see if you recover the reads in that file?
As gb said before I think that bowtie it is not the problem. Anyway I did what you said:
And outputs 52503 again.
I think you need to provide a file name for the
--un
option to save those reads. something like--un unmapped_reads.fa
. See if reads end up in that file.maybe it is this?
For
samtools fasta
http://www.htslib.org/doc/samtools.html