Hi,
I work with metatranscriptomic paired-end data sequenced on an Illumina HiSeq. I performed de novo assembly with Trinity on the RNA-seq data (separately on the forward and reverse reads), followed by mapping with Bowtie2. Now I am trying to count the reads that mapped to each contig from the SAM output file generated by the mapping, which is hard to interpret directly with nearly 11,000,000 reads.
Just to avoid confusion, I extracted only the 2 columns of the SAM output file that I am interested in:
query contig
HWI-ST365:262:C0RY1ACXX:6:1114:13971:74078 comp59482_c0_seq1
HWI-ST365:262:C0RY1ACXX:6:2102:6548:15712 comp4933_c6_seq1
HWI-ST365:262:C0RY1ACXX:6:2102:6548:15712 comp5103_c0_seq1
HWI-ST365:262:C0RY1ACXX:6:2102:6548:15712 comp5696_c0_seq1
HWI-ST365:262:C0RY1ACXX:6:2102:6548:15712 comp5503_c0_seq1
HWI-ST365:262:C0RY1ACXX:6:2102:6548:15712 comp6262_c0_seq1
HWI-ST365:262:C0RY1ACXX:6:2102:6548:15712 comp5296_c0_seq1
HWI-ST365:262:C0RY1ACXX:6:2102:6548:15712 comp40032_c0_seq1
HWI-ST365:262:C0RY1ACXX:6:2102:6548:15712 comp4933_c6_seq4
HWI-ST365:262:C0RY1ACXX:6:2102:6548:15712 comp11733_c0_seq1
HWI-ST365:262:C0RY1ACXX:6:2102:6548:15712 comp5068_c0_seq1
HWI-ST365:262:C0RY1ACXX:6:2102:6548:15712 comp22661_c0_seq1
Thank you in advance
I think that if a read is mapped twice to the same contig, it will be counted two times.
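One way around the double counting is to count each (read, contig) pair only once. Below is a minimal Python sketch, assuming the two-column text shown in the question (read name, then contig, separated by whitespace, with an optional "query contig" header row). The function name count_reads_per_contig is made up for illustration. Note that in paired-end SAM output both mates carry the same read name, so this counts a pair as one read per contig.

```python
from collections import Counter


def count_reads_per_contig(lines):
    """Count distinct reads per contig from 'read_id contig' lines."""
    seen = set()        # (read_id, contig) pairs that were already counted
    counts = Counter()  # contig -> number of distinct reads
    for line in lines:
        fields = line.split()
        # Skip blank or malformed lines and the optional header row.
        if len(fields) != 2 or fields == ["query", "contig"]:
            continue
        pair = (fields[0], fields[1])
        if pair not in seen:
            seen.add(pair)
            counts[fields[1]] += 1
    return counts


# Rows taken from the question, with one row repeated on purpose.
example = [
    "query contig",
    "HWI-ST365:262:C0RY1ACXX:6:1114:13971:74078 comp59482_c0_seq1",
    "HWI-ST365:262:C0RY1ACXX:6:2102:6548:15712 comp4933_c6_seq1",
    "HWI-ST365:262:C0RY1ACXX:6:2102:6548:15712 comp4933_c6_seq1",
    "HWI-ST365:262:C0RY1ACXX:6:2102:6548:15712 comp5103_c0_seq1",
]

for contig, n in sorted(count_reads_per_contig(example).items()):
    print(contig, n)
```

For the real data you would pass an open file handle instead of the example list. The set of seen pairs is held in memory, which should be manageable for about 11,000,000 rows but is worth keeping in mind.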
Well, I was trying to go with the simplest solution. I thought bambus just wanted to get some idea of the coverage on different chromosomes. Of course, there are many factors that need to be considered. The best way would be to convert the SAM to BAM, use the flags to identify the reads that are non-uniquely mapped or duplicates, and then process only the unflagged reads.
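The flag-based filtering can also be sketched directly on the SAM text, without converting to BAM first. This is only a rough sketch under some assumptions: the bit values are the standard SAM flag bits (0x4 unmapped, 0x100 secondary, 0x400 duplicate, 0x800 supplementary), the duplicate bit is only set if a duplicate-marking tool has been run on the file, and using a MAPQ cutoff as a stand-in for "uniquely mapped" is my own simplification. The function name and the min_mapq parameter are made up for illustration.

```python
from collections import Counter

# Standard SAM flag bits.
UNMAPPED = 0x4
SECONDARY = 0x100
DUPLICATE = 0x400       # only present if duplicates were marked beforehand
SUPPLEMENTARY = 0x800
EXCLUDE = UNMAPPED | SECONDARY | DUPLICATE | SUPPLEMENTARY


def count_primary_alignments(sam_lines, min_mapq=0):
    """Count alignments per contig, skipping records with any excluded flag."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])   # column 2: FLAG
        contig = fields[2]      # column 3: RNAME
        mapq = int(fields[4])   # column 5: MAPQ
        if flag & EXCLUDE or contig == "*" or mapq < min_mapq:
            continue
        counts[contig] += 1
    return counts


# Tiny made-up SAM records, only to show the filtering.
demo = [
    "@HD\tVN:1.0",
    "r1\t0\tcomp1\t1\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t256\tcomp1\t9\t1\t4M\t*\t0\t0\tACGT\tIIII",    # secondary
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",           # unmapped
    "r4\t16\tcomp2\t5\t1\t4M\t*\t0\t0\tACGT\tIIII",     # low MAPQ
    "r5\t1024\tcomp1\t1\t42\t4M\t*\t0\t0\tACGT\tIIII",  # duplicate
]

print(dict(count_primary_alignments(demo)))
print(dict(count_primary_alignments(demo, min_mapq=10)))
```

With paired-end data each mate is a separate record, so this counts alignments rather than fragments. If fragment counts are wanted, the read names would need to be deduplicated per contig as well.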