Hey all,
I used the following script to align my reads to a reference transcriptome (bowtie2), and calculate transcript abundance (RSEM).
$TRINITY_HOME/util/align_and_estimate_abundance.pl --transcripts location_of_data/assembly.fasta --seqType fq --est_method RSEM -aln_method bowtie2 --trinity_mode --thread_count 32 --output_dir location_of_output_dir --left left_reads.fq.gz --right right_reads.fq.gz --output_prefix sample_name
It all ran without any problems. However, when I tried to calculate the percentage of reads that mapped to the reference transcriptome, the output stated that 100% of reads were mapped which is unusual.
I used samtools flagstat to generate statistics for the output .bam file for each sample, and it said 100% of reads were mapped.
I then used the RSEM script rsem-plot-model.pl on the contents of the .stat output directory to generate statistics for the bowtie2 mapping. It said the same. 100% of reads mapped, with ~65% unique and 35 multi-aligned.
Does the .stat output directory and the output .bam file from bowtie2 only contain data for mapped reads? Is it possible that 100% of the reads did map, since the reference transcriptome was generated from those reads to begin with?
How else can I generate the statistic?
Cheers :-)
I've never seen an instance in which 100% of a nontrivial library mapped to anything. But when mapping to an assembly of those reads, it's possible. Normally that would mean you have excellent library-creation and read preprocessing procedures.
That said - you should inspect the command used in "align_and_estimate_abundance.pl". By default bowtie2 does output unmapped reads, but perhaps they are being suppressed or redirected.
Thanks Brian, I thought it was strange but technically not impossible. I'll have a look at the script :-)