Not a surprise: as Bastien Hervé and finswimmer said, there are some common regions between X and Y.
If you run samtools idxstats on the 1000 Genomes BAMs for NA12878 (a female), you'll see that most reads map to X, but a few map to Y:
$ find 1000G -name "*.bam" | while read -r F; do echo "$F" && samtools idxstats "$F" | grep -E '^(X|Y)'; done
1000G/ftp/phase1/data/NA12878/exome_alignment/NA12878.mapped.illumina.mosaik.CEU.exome.20110411.bam
X 155270560 7340285 0
Y 59373566 27261 0
1000G/ftp/phase1/technical/other_exome_alignments/NA12878/exome_alignment/NA12878.mapped.ILLUMINA.BWA.CEU.exome.20110521.bam
X 155270560 4944705 0
Y 59373566 70046 0
1000G/ftp/phase3/data/NA12878/alignment/NA12878.mapped.ILLUMINA.bwa.CEU.low_coverage.20121211.bam
X 155270560 8783715 53818
Y 59373566 45922 5602
1000G/ftp/phase3/data/NA12878/exome_alignment/NA12878.mapped.ILLUMINA.bwa.CEU.exome.20121211.bam
X 155270560 10010067 126237
Y 59373566 50642 6249
1000G/ftp/phase3/data/NA12878/high_coverage_alignment/NA12878.mapped.ILLUMINA.bwa.CEU.high_coverage_pcr_free.20130906.bam
X 155270560 37264083 217222
Y 59373566 222821 1529
1000G/ftp/technical/working/20110915_CEUtrio_b37_decoy_alignment/CEUTrio.HiSeq.WEx.b37_decoy.NA12878.clean.dedup.recal.bam
X 155270560 10013253 78707
Y 59373566 27583 2190
1000G/ftp/technical/working/20110915_CEUtrio_b37_decoy_alignment/CEUTrio.HiSeq.WGS.b37_decoy.NA12878.clean.dedup.recal.bam
X 155270560 136857405 1852183
Y 59373566 703280 104728
1000G/ftp/technical/working/20120117_ceu_trio_b37_decoy/CEUTrio.HiSeq.WEx.b37_decoy.NA12878.clean.dedup.recal.20120117.bam
X 155270560 10013253 78707
Y 59373566 27583 2190
1000G/ftp/technical/working/20120117_ceu_trio_b37_decoy/CEUTrio.HiSeq.WGS.b37_decoy.NA12878.clean.dedup.recal.20120117.bam
X 155270560 136857405 1852184
Y 59373566 703280 104728
1000G/ftp/technical/working/20121016_exome_indel_seq_validation/NA12878.exome_indel_validation.HiSeq2000.20121016.bam
X 155270560 39360346 689280
Y 59373566 93256 7575
1000G/ftp/technical/working/20121016_exome_indel_seq_validation/NA12878.exome_indel_validation.MiSeq.20121016.bam
X 155270560 2209546 35513
Y 59373566 4800 369
1000G/ftp/technical/working/20121023_sga_dindel_evidence_bams/NA12878.chr20.ILLUMINA.sga_dindel_subset.CEU.evidence.20111114.bam
X 155270560 2241 0
Y 59373566 29 0
1000G/ftp/technical/working/20121126_NA12878_bam_downSampledTo5x/CEUTrio.HiSeq.WGS.b37_decoy.NA12878.clean.dedup.recal.downsampledTo5x.bam
X 155270560 8823168 104511
Y 59373566 46372 6866
1000G/ftp/technical/working/20131209_na12878_pacbio/si/NA12878.pacbio.bwa-mem.20131224.bam
X 155270560 1487159 0
Y 59373566 46425 0
1000G/ftp/technical/working/20131209_na12878_pacbio/si/NA12878.pacbio.bwa-sw.20140202.bam
X 155270560 1338206 0
Y 59373566 40394 0
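To put a listing like this into a single number per BAM, here is a minimal bash sketch that reads `samtools idxstats` output (columns: reference name, sequence length, mapped reads, unmapped reads) and prints the ratio of reads mapped on Y to reads mapped on X. The function name `y_to_x_ratio` is made up for illustration, and the contig names are assumed to be `X` and `Y` as in the output shown here.

```shell
#!/usr/bin/env bash

# Read `samtools idxstats` output on stdin and print mapped(Y) / mapped(X).
# Assumes contigs are named exactly "X" and "Y" (no "chr" prefix).
y_to_x_ratio() {
    awk '
        $1 == "X" { x = $3 }   # column 3 = number of mapped reads
        $1 == "Y" { y = $3 }
        END {
            if (x > 0) printf "%.4f\n", y / x
            else       print "NA"   # no X line, or no reads mapped on X
        }'
}

# Usage (hypothetical file name):
#   samtools idxstats sample.bam | y_to_x_ratio
```

For the first exome BAM listed (7340285 reads on X, 27261 on Y) this gives about 0.0037, so well under 1% of the sex-chromosome reads land on Y.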
Hello,
please have a look at "How to add images to a Biostars post" to include your pictures correctly.
Could it be that the reads mapped to chromosome Y are in the PAR region, because the reference genome you used hasn't masked this region? (A: Which human reference genome should I use?)
fin swimmer
I am using Mus_musculus.GRCm38.dna.primary_assembly from Ensembl: ftp://ftp.ensembl.org/pub/release-93/fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.primary_assembly.fa.gz
The well-known PAR sequences. finswimmer, you forgot to share the wiki about it ;)
Create a sub-BAM of your reads mapped to chr Y, then try to visualize it in IGV, for example. Check whether the reads map to a few distinct areas, then look in the literature to see whether these areas are duplicated somewhere else in the mouse genome.
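A rough bash sketch of that suggestion, assuming an indexed input BAM and a contig named `Y` (which is how the Ensembl GRCm38 FASTA names it). The function names `extract_contig` and `count_per_window` and the 1 Mb window size are illustrative choices, not part of samtools.

```shell
#!/usr/bin/env bash

# Write the alignments on one contig to their own BAM and index it for IGV.
# The input BAM must be coordinate-sorted and indexed.
extract_contig() {
    local in_bam=$1 contig=$2 out_bam=$3
    samtools view -b -o "$out_bam" "$in_bam" "$contig" && samtools index "$out_bam"
}

# Count alignments per fixed-size window from SAM records on stdin.
# Prints: contig <TAB> window start (0-based) <TAB> number of alignments.
count_per_window() {
    local window=${1:-1000000}
    awk -F'\t' -v w="$window" '
        /^@/      { next }                      # skip SAM header lines
        $3 != "*" { bin = int(($4 - 1) / w) * w # column 4 = 1-based position
                    n[$3 "\t" bin]++ }
        END       { for (k in n) print k "\t" n[k] }
    ' | sort -k1,1 -k2,2n
}

# Usage (hypothetical file names):
#   extract_contig sample.bam Y sample.chrY.bam
#   samtools view sample.chrY.bam | count_per_window 1000000
```

If nearly all of the reads fall into a handful of windows rather than being spread along the chromosome, those windows are the areas worth opening in IGV.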
Could you please elaborate on what you mean by "distinct areas"?
Typically, the PAR regions on chr Y. Another example is the AMELX gene on chr X: a copy of the gene also exists on chr Y (AMELY) (https://ghr.nlm.nih.gov/gene/AMELX).
If you are sure you have only female mice, you can delete chr Y from the reference.
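One way to drop chr Y from the FASTA, as a minimal bash sketch: it assumes the sequence name is the first word of the header line (as in Ensembl headers such as `>Y dna:chromosome ...`). The function name `drop_contig` and the output file name are hypothetical.

```shell
#!/usr/bin/env bash

# Copy a FASTA from stdin to stdout, skipping the record whose name
# (first word after ">") equals the given contig.
drop_contig() {
    local contig=$1
    awk -v name="$contig" '
        /^>/ { keep = (substr($1, 2) != name) }  # decide once per header line
        keep                                     # print header and sequence lines of kept records
    '
}

# Usage (output name is an assumption):
#   gzip -dc Mus_musculus.GRCm38.dna.primary_assembly.fa.gz \
#       | drop_contig Y > GRCm38.noY.fa
```

The aligner index has to be rebuilt from the new FASTA, and the reads realigned, before the change has any effect.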