I have a BED file from an iCLIP experiment that I need to annotate, i.e. identify the genes in which the crosslinks are predicted to fall.
I used bedtools intersect to annotate the crosslinks (i.e. the BED file) against the human GTF file (the same one used for mapping/alignment), with parameters that output the overlaps:
intersectBed -wa -wb -s -header -loj -a crosslinks.bed -b Homo_sapiens.GRCh38.90.chr_patch_hapl_scaff.gtf > annotation.bed
I used the -loj option so that crosslinks that do not overlap any feature in the GTF file are also reported.
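With -wa -wb against a full GTF, a crosslink inside a gene is reported once per overlapping GTF line (gene, transcript, exon, UTR, ...). The fraction of unannotated crosslinks should therefore be computed over unique crosslinks, not over output lines. Below is a minimal Python sketch. It assumes a BED6 crosslinks file (six -a columns, since -s needs a strand column) and that bedtools writes the null -b record from -loj with "." in its first field. The function name count_unannotated is hypothetical.

```python
from typing import Iterable, Tuple


def count_unannotated(lines: Iterable[str], n_a_cols: int = 6) -> Tuple[int, int]:
    """Return (unannotated, total) counts of unique crosslinks in
    `intersectBed -wa -wb -loj` output.

    Assumptions: the -a file has `n_a_cols` columns (BED6 by default), and a
    crosslink with no overlap gets a null -b record whose first field is '.'.
    """
    seen = set()       # every distinct crosslink
    annotated = set()  # crosslinks with at least one real GTF overlap
    for line in lines:
        # Skip blank lines and header lines (e.g. those written by -header).
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        key = tuple(fields[:n_a_cols])  # a crosslink is identified by its -a columns
        seen.add(key)
        if fields[n_a_cols] != ".":     # first -b column; '.' marks the null record
            annotated.add(key)
    return len(seen - annotated), len(seen)
```

Usage: `with open("annotation.bed") as fh: n_un, n_all = count_unannotated(fh)`, then `n_un / n_all` is the unannotated fraction. A set keyed on the -a columns is used so that a crosslink overlapping many GTF lines is still counted once.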
I found that about 60% of my crosslinks do not overlap any gene/transcript, i.e. they fall outside the annotated regions. For example, one crosslink maps to chromosome 10:62,304,527-62,304,528, but the only nearby annotations in the GTF file are at 10:62,289,521-62,304,033 and 10:62,350,006-62,350,297.
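To see how far outside the annotation the example crosslink lies, the gap to each nearby feature can be computed. The sketch below assumes the crosslink coordinates are BED (0-based, half-open) and the annotation coordinates are GTF (1-based, closed). If the crosslink position was copied from a genome browser instead, the gaps shift by one base. The helper gap_bp is hypothetical.

```python
def gap_bp(bed_start: int, bed_end: int, gtf_start: int, gtf_end: int) -> int:
    """Number of bases separating a BED interval (0-based, half-open) from a
    GTF feature (1-based, closed) on the same chromosome; 0 if they overlap."""
    # Convert the GTF feature to 0-based, half-open coordinates.
    f_start, f_end = gtf_start - 1, gtf_end
    if bed_end <= f_start:   # crosslink lies before the feature
        return f_start - bed_end
    if f_end <= bed_start:   # crosslink lies after the feature
        return bed_start - f_end
    return 0                 # intervals overlap


# Example coordinates from the question (chromosome 10, strand ignored).
crosslink = (62_304_527, 62_304_528)
features = [(62_289_521, 62_304_033), (62_350_006, 62_350_297)]
for start, end in features:
    print(start, end, gap_bp(*crosslink, start, end))
```

Under these assumptions the crosslink sits 494 bp past the end of the first feature and 45,477 bp before the second. For all crosslinks at once, `bedtools closest -d` reports a distance to the nearest feature (both inputs must be sorted).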
Is this common, or does it point to a problem with the algorithm that maps crosslinks in iCLIP data?
Any advice is greatly appreciated; I could not find any information on this elsewhere on Biostars.