Do these reads hold any value? Are there software tools that make use of them? Just curious, because I was having trouble using liftOver when they are present...
I don't see any danger in removing them for liftOver (indeed, it may be necessary to do so). They might be of interest in some rare cases, such as evaluating mapping software or characterizing multimapping reads, but I have never really used them myself.
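For what it's worth, here is a minimal sketch of how one might strip those lines from a BED file before running liftOver. It assumes UCSC-style contig names (chrUn_* and chr*_random); the function name drop_unplaced and the example records are made up for illustration, so adjust the pattern to whatever naming your assembly uses.

```python
import re

# Assumption: UCSC-style names, e.g. chrUn_gl000220 or chr1_gl000191_random
UNPLACED = re.compile(r"^chrUn|_random$")

def drop_unplaced(bed_lines):
    """Yield BED lines whose chromosome field is not an unplaced/random contig."""
    for line in bed_lines:
        # Pass blank lines and track/browser/comment headers through unchanged
        if not line.strip() or line.startswith(("track", "browser", "#")):
            yield line
            continue
        chrom = line.split()[0]  # first whitespace-separated field is the chromosome
        if not UNPLACED.search(chrom):
            yield line

# Hypothetical records, for illustration only
example = [
    "chr1\t100\t200\n",
    "chrUn_gl000220\t5\t50\n",
    "chr1_gl000191_random\t10\t20\n",
    "chrX\t300\t400\n",
]
kept = list(drop_unplaced(example))
print("".join(kept), end="")
```

On a real file you would pass the open file handle to drop_unplaced and write the surviving lines to a new BED file. Note that this pattern leaves chrM alone; add it to the regular expression if you want that gone too.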
It really depends on what your BED file is used for. For instance, we use BED files to define regions of interest, which drive both the design of the sequencing experiment and the downstream analysis, where we focus on particular regions of our BAM alignments and perform operations on those loci. After alignment we may end up with reads that aligned to chrUn* or chr*_random, but since the knowledge we are after is not contained in those contigs, we do not consider them when ultimately reporting variants or coverage. We still leave those reads in the BAM file in case a new aligner is able to place them somewhere else, or in case a particular experiment is interested in those contigs. Since that is not our daily usage, we never include them in our experiment designs' BED files.
I normally remove them after I am done with the alignment of reads. But I keep them for alignment.
I would also support keeping them for alignment. The reason for me is that if I don't, I might get extra reads mapping to the main chromosomes when they would otherwise have mapped to the removed sequences. However, for certain operations like peak calling in DNA-seq, I remove them, as I think they can mess up certain normalization procedures.
Keeping reads that map to chrM, chrUn, etc. certainly has a detrimental effect on MACS (ChIP-seq peak caller) statistics.