Hello everybody,
I have mapped my samples onto two custom pseudo-genomes using STAR, which gives me two SAM/BAM files per sample. A small number of reads may be mapped with numerous or abnormal splice sites, and soft-clipped reads may map to different locations because of homologous sequence. I want to find the small number of reads that map to both genomes but at different coordinates. I tried the bamUtil diff function for this, but I cannot work out which parameters to give it, and the results are very confusing. Has anyone run into similar problems with bamUtil? Or does anyone know a better tool than bamUtil for this purpose?
EDIT: I ran bamUtil diff using ./bam diff --in1 1.bam --in2 2.bam --noCigar --baseQual --onlyDiffs --out output_file
The base quality field comes out empty for the reported pairs, which I take to mean that the reads mapped to different locations have identical base qualities. I want to find out whether there are any reads with different mapped locations and different base qualities too. Is there any way to do this, perhaps with grep/sed/awk or a shell script?
Susmita
Are the two pseudo-genomes generally the same, with some differences, or are they completely different?
The reference genome is the same and acts as a template. SNPs from two mouse strains are incorporated into it to make the two pseudo-genomes.
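If only SNPs were incorporated (no indels), both pseudo-genomes should share the template's coordinate system, so positions can be compared directly. Here is a rough awk sketch of that comparison, not a tested pipeline. It assumes both inputs are SAM text, that the first file fits in memory, and that only primary alignments matter. The function name diff_positions is made up for illustration and is not part of bamUtil. It prints each read that is mapped in both files at different positions, and flags whether the QUAL strings differ.

```shell
# Sketch only: report reads whose primary alignment differs in position
# between two SAM files (hypothetical helper, not part of bamUtil).
# Usage: diff_positions first.sam second.sam
diff_positions() {
    awk -F'\t' '
        /^@/ { next }                            # skip SAM header lines
        {
            flag = $2
            if (int(flag / 4) % 2) next          # 0x4: read is unmapped
            if (int(flag / 256) % 2) next        # 0x100: secondary alignment
            if (int(flag / 2048) % 2) next       # 0x800: supplementary alignment
            # 0x40 = first mate, 0x80 = second mate, 0 = single-end read
            mate = (int(flag / 64) % 2) ? 1 : ((int(flag / 128) % 2) ? 2 : 0)
            key = $1 "/" mate
            here = $3 ":" $4                     # reference name and 1-based position
        }
        # First file: remember position and quality string of every read
        NR == FNR { pos[key] = here; qual[key] = $11; next }
        # Second file: print reads seen in the first file at another position
        (key in pos) && (pos[key] != here) {
            q = (qual[key] == $11) ? "same_qual" : "diff_qual"
            print key "\t" pos[key] "\t" here "\t" q
        }
    ' "$1" "$2"
}
```

If samtools is installed, the SAM text can be produced from the BAM files with samtools view 1.bam > 1.sam (and likewise for 2.bam), followed by diff_positions 1.sam 2.sam. Note that base qualities come from the FASTQ and do not depend on the reference, so identical qualities for the same read are expected. The QUAL string is stored reversed when a read aligns to the reverse strand, so a "diff_qual" line may simply mean the read mapped to opposite strands in the two files.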