I am quite new to the STAR aligner and am confused by the numbers of mapped and unmapped reads it reports.
I would like to know whether the BAM file that STAR outputs when I do not use the argument --outSAMunmapped Within is already filtered for unmapped or duplicate reads. Or do I need to filter it further before variant calling?
The short story is this:
Assuming my file is file.bam, I ran STAR without the argument --outSAMunmapped Within and obtained a BAM file. The Log.final.out for that run reports 83.3% uniquely mapped reads. If I run the command
samtools view -c -f4 file.bam
it reports 0 reads, so the file contains no unmapped reads. Running samtools flagstat file.bam
generated this image
showing no duplicates and no unmapped reads, so the file looks clean. When I reran the alignment with --outSAMunmapped Within added and reran samtools flagstat file.bam,
I got this image
This time the mapped reads appear as a percentage, but the duplicate count is still 0.
Based on that, I assume that:
- the BAM file produced by STAR without the argument --outSAMunmapped Within contains only mapped reads (I am not sure whether these are only uniquely mapped reads or not),
- if you add this argument, you get a BAM file that contains both mapped and unmapped reads. But what about duplicate reads and mismatches?
Which statistics on the output BAM file should be used in a paper or presentation?
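For reference, the two checks used above (samtools view -c -f4 and the duplicates line of samtools flagstat) both come down to testing bits in the FLAG column of each alignment record: 4 for unmapped, 256 for secondary alignments, 1024 for duplicates. Below is a minimal sketch of that logic in plain awk, so it can be tried on any SAM text. The helper names count_flag and count_mapq are made up for this illustration and are not part of samtools or STAR. The MAPQ check assumes STAR's default of writing MAPQ 255 for uniquely mapped reads.

```shell
#!/usr/bin/env bash

# count_flag BIT (hypothetical helper): read SAM text on stdin and count
# alignment records whose FLAG (column 2) has the given bit set.
# BIT must be a single power of two, e.g. 4, 256 or 1024.
count_flag() {
  awk -v bit="$1" -F '\t' '
    /^@/ { next }                      # skip SAM header lines
    int($2 / bit) % 2 == 1 { n++ }     # the requested bit is set in FLAG
    END { print n + 0 }                # print 0 when nothing matched
  '
}

# count_mapq Q (hypothetical helper): count alignment records whose MAPQ
# (column 5) equals Q. Assumes the aligner marks unique mappers with a
# fixed MAPQ value (255 by default in STAR).
count_mapq() {
  awk -v q="$1" -F '\t' '
    /^@/ { next }                      # skip SAM header lines
    $5 == q { n++ }                    # MAPQ matches the requested value
    END { print n + 0 }
  '
}
```

With samtools installed, samtools view -h file.bam | count_flag 4 should give the same number as samtools view -c -f4 file.bam, and count_flag 1024 corresponds to the duplicates line of flagstat. These are counts of alignment records, not of distinct reads. Note that a duplicate count of 0 only says that no record carries the duplicate flag; as far as I know, a plain STAR alignment run does not set that flag at all, so 0 there does not mean duplicates were removed.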
Thanks, my dear. Just to make sure I understood correctly:
If I run the STAR command without --outSAMunmapped, without outFilterMultimapNmax, and without outFilterMismatchNmax, that is default STAR, and I get the statistics summary from the Log.final.out file. My questions are: [...] outFilterMultimapNmax to 1?
Could a STAR expert please put an end to my confusion by explaining these points in detail?
Thanks
Read the STAR paper and maybe start an email conversation with Alex Dobin, the developer of STAR.
Also, please put some more effort into formatting your posts.
@swbarnes2
Please do not paste screenshots of plain text content, it is counterproductive. You can copy paste the content directly here (using the code formatting option shown below), or use a GitHub Gist if the content volume exceeds allowed length here.