I have 20.77 million 75x2 (paired-end) reads for an Acinetobacter_baumannii sample. I aligned the reads to the reference genome (Acinetobacter_baumannii, downloaded from NCBI, which consists of a chromosome and two plasmid sequences) using the end-to-end option of bowtie2, and 86.58% of the reads aligned concordantly >1 times. PCR duplicates make up 73% of the sample; when I aligned only the non-duplicate reads to the reference, 70.91% aligned concordantly >1 times. I also checked the genome for repeats and saw few of them in the dot plot, so what could be the reason for the multiple mapping?
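To see which pairs bowtie2 treated as having more than one alignment, one rough approach is to look for the XS:i tag, which bowtie2 adds to an aligned record when it found at least one other alignment for that read. The sketch below assumes SAM text from the bowtie2 run and only looks at primary, properly paired records; the function name `multimapped_pair_fraction` is hypothetical, and the result is an approximation that will not necessarily match bowtie2's own ">1 times" figure exactly. The optional `ties_only` switch counts a pair only when the alternative alignment scored as well as the reported one (XS:i equal to AS:i), which helps separate true repeats from much worse secondary hits.

```python
def multimapped_pair_fraction(sam_lines, ties_only=False):
    """Fraction of concordant pairs in which at least one mate carries XS:i.

    With ties_only=True, a pair is counted only when a mate's XS:i equals its
    AS:i, i.e. the alternative alignment scored as well as the reported one.
    """
    pairs, multi = set(), set()
    for line in sam_lines:
        if not line or line.startswith("@"):
            continue  # header line
        f = line.rstrip("\n").split("\t")
        qname, flag = f[0], int(f[1])
        # Keep primary (neither 0x100 nor 0x800), properly paired (0x2) records only.
        if (flag & 0x900) or not (flag & 0x2):
            continue
        pairs.add(qname)
        # Optional fields start at column 12 and look like "TAG:TYPE:VALUE".
        tags = {}
        for t in f[11:]:
            name, _type, value = t.split(":", 2)
            tags[name] = value
        if "XS" in tags and (not ties_only or tags["XS"] == tags.get("AS")):
            multi.add(qname)
    return len(multi) / len(pairs) if pairs else 0.0
```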
Moved to an answer since this is very likely the reason.
Thanks, it's just that English is not my native language, and it's difficult to be sure I understand a question. I have a question for you (I will remove this post after your answer): what can I do, within the community rules, to bring one of my posted questions back to attention? (I posted it on a Friday; I hope that's why I got no answer.)
If you have relevant new information for that question you can edit your question, which will also bump it to the top of the list and get attention again. Just don't abuse this feature.
I am under the impression that pseudogenes are rare in bacteria, since they are under high pressure to keep a small genome...
Having 86% of reads align multiple times seems far too high to be explained by pseudogenes, repeats, etc.; I've never seen that in bacteria. I think a different mechanism is at work here. Could you post your insert size distribution and coverage distribution? If you had very short inserts, so that most reads were only 14 bp long after adapter trimming, that would explain the issue: reads that short simply don't carry enough information to be mapped correctly. Alternatively, if your coverage distribution shows that most of your reads cover a small (possibly repetitive) fraction of the genome, which could theoretically happen with, for example, MDA-amplified single cells, then pseudogenes/repeats could be the correct explanation. However, I doubt that is the case if you are using a randomly fragmented library from an unamplified isolate.
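The two distributions asked for here could be summarized with something like the sketch below. It assumes SAM text from the bowtie2 run (for example, the output of `samtools view`) and per-base depth in the three-column format produced by `samtools depth -a` (reference, position, depth); both function names are hypothetical. Insert sizes near or below the read length would support the short-insert explanation, while a large share of all bases piling onto the top 10% of positions would support the idea that reads cover only a small part of the genome.

```python
def summarize_pairs(sam_lines):
    """Collect insert sizes and read lengths from primary alignments in SAM text."""
    inserts, lengths = [], []
    for line in sam_lines:
        if not line or line.startswith("@"):
            continue  # skip header lines
        f = line.rstrip("\n").split("\t")
        flag, tlen, seq = int(f[1]), int(f[8]), f[9]
        if flag & 0x900:  # skip secondary (0x100) and supplementary (0x800) records
            continue
        lengths.append(len(seq))
        # Count each properly paired (0x2) fragment once, via the mate with positive TLEN.
        if flag & 0x2 and tlen > 0:
            inserts.append(tlen)
    return inserts, lengths


def coverage_summary(depth_lines, top_fraction=0.1):
    """From `samtools depth -a` lines, return (fraction of positions covered,
    share of all aligned bases on the deepest `top_fraction` of positions)."""
    depths = sorted((int(l.split("\t")[2]) for l in depth_lines if l.strip()), reverse=True)
    if not depths:
        return 0.0, 0.0
    total = sum(depths)
    covered = sum(1 for d in depths if d > 0) / len(depths)
    k = max(1, int(len(depths) * top_fraction))  # number of "top" positions
    top_share = sum(depths[:k]) / total if total else 0.0
    return covered, top_share
```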
It is also helpful to post the percentage of reads that map.
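bowtie2 already prints these numbers in the alignment summary it writes to stderr at the end of a run. The sketch below pulls the concordant-alignment percentages and the overall alignment rate out of that text; it assumes the standard paired-end summary layout, and the name `parse_bowtie2_summary` and the dictionary keys are hypothetical.

```python
import re

# Lines of interest in bowtie2's paired-end summary look like
#   "N (x.xx%) aligned concordantly >1 times" and "x.xx% overall alignment rate".
CONCORDANT = re.compile(r"\(([\d.]+)%\) aligned concordantly (0 times|exactly 1 time|>1 times)")
OVERALL = re.compile(r"([\d.]+)% overall alignment rate")
KEYS = {"0 times": "concordant_0", "exactly 1 time": "concordant_1", ">1 times": "concordant_multi"}


def parse_bowtie2_summary(text):
    """Return concordant-alignment percentages and the overall alignment rate."""
    result = {}
    for pct, kind in CONCORDANT.findall(text):
        result[KEYS[kind]] = float(pct)
    overall = OVERALL.search(text)
    if overall:
        result["overall"] = float(overall.group(1))
    return result
```

Lines such as "N pairs aligned concordantly 0 times; of these:" carry no parenthesized percentage, so the first pattern skips them.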
I checked for pseudogenes (homologous genes) but did not find any. Could there be any other possible reason for this?
How did you do that? Did you check the second mapping site for a read that maps twice?
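One way to check the second mapping site is to have bowtie2 report more than one alignment per read (its `-k` option, e.g. `-k 2`) and then group the alignments by read name. The sketch below assumes SAM text from such a run; `alignment_sites` is a hypothetical helper that returns, for every mate aligned to more than one place, the list of (reference, 1-based position) pairs.

```python
from collections import defaultdict


def alignment_sites(sam_lines):
    """Map (read name, mate number) -> [(reference, position), ...] for reads
    that have more than one reported alignment."""
    sites = defaultdict(list)
    for line in sam_lines:
        if not line or line.startswith("@"):
            continue  # header line
        f = line.rstrip("\n").split("\t")
        qname, flag, rname, pos = f[0], int(f[1]), f[2], int(f[3])
        if flag & 0x4:  # unmapped record, no site to report
            continue
        # 0x40 = first mate, 0x80 = second mate, neither = unpaired read
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        sites[(qname, mate)].append((rname, pos))
    return {key: hits for key, hits in sites.items() if len(hits) > 1}
```

The reported coordinates can then be pulled from the reference (for example with `samtools faidx`) and compared directly. If many second sites land on one of the plasmids, that might point to sequence shared between the chromosome and that plasmid.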