Question

Find overlapping sequences by overlap with pyranges


3.3 years ago

McClain • 0

I am trying to replicate the mergeByOverlaps function from R/Bioconductor in Python using the pyranges package. In R the code would be:

gr.snp <- with(gr.snp, GRanges(chr, IRanges(start, end),rsid=gr.snp$rsid))
snp.annotated <- data.frame(mergeByOverlaps(gr.snp, gencode, maxgap=2000, type="start"))

which returns:

nrow(snp.annotated)
[1] 34

colnames(snp.annotated)
 [1] "gr.snp.seqnames"                  "gr.snp.start"                    
 [3] "gr.snp.end"                       "gr.snp.width"                    
 [5] "gr.snp.strand"                    "gr.snp.rsid"                     
 [7] "rsid"                             "gencode.seqnames"                
 [9] "gencode.start"                    "gencode.end"                     
[11] "gencode.width"                    "gencode.strand"                  
[13] "gencode.source"                   "gencode.type"                    
[15] "gencode.score"                    "gencode.phase"                   
[17] "gencode.ID"                       "gencode.gene_id"                 
[19] "gencode.gene_type"                "gencode.gene_name"               
[21] "gencode.level"                    "gencode.hgnc_id"                 
[23] "gencode.havana_gene"              "gencode.Parent"                  
[25] "gencode.transcript_id"            "gencode.transcript_type"         
[27] "gencode.transcript_name"          "gencode.transcript_support_level"
[29] "gencode.tag"                      "gencode.havana_transcript"       
[31] "gencode.exon_number"              "gencode.exon_id"                 
[33] "gencode.ont"                      "gencode.protein_id"

Here gr.snp holds the SNPs that I want annotations for, and gencode is the annotation file.

The closest I've gotten in Python is with cluster or merge, but neither is exactly right:

>>> temp = gr_snp.cluster(genecode_genes, slack = 2000)
>>> len(temp)
469
>>> temp.columns
Index(['Chromosome', 'Start', 'End', 'rsid', 'Strand', 'Cluster'], dtype='object')

Merge does basically the same thing but preserves even less metadata. I need the metadata from both tables to be preserved. Does anyone know how to do this?
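
To make the target behaviour concrete, here is a minimal pure-Python sketch of what I understand the R call to compute: with type="start" and maxgap=2000, a SNP is paired with an annotation when their start positions differ by at most 2000 bp, and the columns of both sides are kept under a prefix. The function name merge_by_start, the prefixes, and the toy records are made up for illustration. Strand is ignored. This is not pyranges code. I believe pyranges' join with its slack argument is the nearest built-in, but it extends whole intervals rather than comparing starts.

```python
from collections import defaultdict


def merge_by_start(snps, genes, maxgap=2000):
    """Pair each SNP with every annotation on the same chromosome whose
    start lies within `maxgap` bp of the SNP start (hypothetical helper)."""
    # Group annotations by chromosome so each SNP is only compared
    # against candidates on its own chromosome.
    genes_by_chrom = defaultdict(list)
    for gene in genes:
        genes_by_chrom[gene["Chromosome"]].append(gene)

    merged = []
    for snp in snps:
        for gene in genes_by_chrom.get(snp["Chromosome"], []):
            if abs(snp["Start"] - gene["Start"]) <= maxgap:
                # Keep the metadata from both tables, prefixed by source.
                row = {f"snp.{key}": value for key, value in snp.items()}
                row.update({f"gencode.{key}": value for key, value in gene.items()})
                merged.append(row)
    return merged


# Toy records (invented for this example, not real data).
snps = [
    {"Chromosome": "chr1", "Start": 10_500, "End": 10_501, "rsid": "rs_demo1"},
    {"Chromosome": "chr1", "Start": 50_000, "End": 50_001, "rsid": "rs_demo2"},
]
genes = [
    {"Chromosome": "chr1", "Start": 11_000, "End": 14_000, "gene_name": "GENE_A"},
    {"Chromosome": "chr1", "Start": 60_000, "End": 65_000, "gene_name": "GENE_B"},
    {"Chromosome": "chr2", "Start": 10_000, "End": 12_000, "gene_name": "GENE_C"},
]

merged = merge_by_start(snps, genes)
for row in merged:
    print(row["snp.rsid"], row["gencode.gene_name"])
```

Unlike cluster or merge, each output row here is one SNP/annotation pair carrying the columns of both inputs, which is the shape I am after.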

mergebyoverlap python overlap r pyranges • 696 views
