Sorry for another post ... as I could not fit this message as a "comment" in my last post of alignment statistics:
I got the following from Picard CollectAligment and samtools flagstat ... I found flagstat result more informative than picard but no software allows me to fill the info in table at the end that I want. Sorry for my dumb questions. Any comments? Is there stand-alone perl or any other script that I can use? Thanks very much.
CATEGORY FIRSTOFPAIR SECONDOFPAIR PAIR
TOTAL_READS 269056355 269056355 538112710
PF_READS 269056355 269056355 538112710
PCTPFREADS 1 1 1
PFNOISEREADS 256 288 544
PFREADSALIGNED 0 0 0
PCTPFREADS_ALIGNED 0 0 0
PFALIGNEDBASES 0 0 0
PFHQALIGNED_READS 0 0 0
PFHQALIGNED_BASES 0 0 0
PFHQALIGNEDQ20BASES 0 0 0
PFHQMEDIAN_MISMATCHES 0 0 0
PFMISMATCHRATE 0 0 0
PFHQERROR_RATE 0 0 0
PFINDELRATE 0 0 0
MEANREADLENGTH 100 100 100
READSALIGNEDIN_PAIRS 0 0 0
PCTREADSALIGNEDINPAIRS 0 0 0
BAD_CYCLES 0 0 0
STRAND_BALANCE 0 0 0
PCT_CHIMERAS 0 0 0
PCT_ADAPTER 0.001418 0.00058 0.000999
I tried: samtools flagstat as well and I got:
538112710 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
512844854 + 0 mapped (95.30%:-nan%)
538112710 + 0 paired in sequencing
269056355 + 0 read1
269056355 + 0 read2
501199328 + 0 properly paired (93.14%:-nan%)
509847920 + 0 with itself and mate mapped
2996934 + 0 singletons (0.56%:-nan%)
728423 + 0 with mate mapped to a different chr
568502 + 0 with mate mapped to a different chr (mapQ>=5)
Table that I want to fill is:
Total read = 538112710
Uniquely paired read#
Non-uniquely paired read#
Uniquely mapped read1#
Non-uniquely mapped read1#
Uniquely mapped read2#
Non-uniquely mapped read2#
Unmapped read1#
Unmapped read2#
Uniquely paired read%
Non-uniquely paired read%
Uniquely mapped read1%
Non-uniquely mapped read1%
Uniquely mapped read2%
Non-uniquely mapped read2%
Unmapped read1%
Unmapped read2%
What is your mapping program ?
have you tried Bioconductor or Biopython? If you are desperate consider placing the project up for bid on Guru or Elance.
Hi Jeremy, I looked into http://www.bioconductor.org/developers/new_packages/ and do not see a package mentioned that can read bam file and give me the metrics I am looking for. Am I sorry if I missed it. Any comments? My bam file are of the size of 60-70GB, would bioconductor be able to handle these files?
Hi Tony, I am using bwa for alignment and samtools for the rest of the pipeline.