Grange object for hi-c
1
0
Entering edit mode
4.7 years ago
Siegfried ▴ 10

I would like to ask how to establish a grange object with double seqnames columns as well as start and end columns for strore interaction data?

R genome • 1.0k views
ADD COMMENT
0
Entering edit mode

Please be more specific by giving example data and more context. What files do you have and how is the output supposed to look like?

ADD REPLY
0
Entering edit mode
4.7 years ago

Assuming you are using R then you could use GenomicAlignments.

Example:

## import your reads
gr <- readGAlignmentPairs('your_hic_reads.bam')

## look at them
head(gr)
GAlignmentPairs object with 99420 pairs, strandMode=1, and 0 metadata columns:
            seqnames strand   :      ranges  --            ranges
               <Rle>  <Rle>   :   <IRanges>  --         <IRanges>
        [1]    chr2      *   :   5035-5184  --       32376-32457
        [2]     <NA>      *   :   5089-5122  --   2700963-2701112

## here you see that seqnames can either be a single value or <NA>. If it's a single value then the two reads are on the same chromosome, if it's NA then they are on different chromosomes

## let's subset and look at the different chrom
granges(gr[2],on.discordant.seqnames='split')
  [1]    chr1       55089-5122       +
  [2]    chr2       2700963-2701112       -

At this point you have would need to split the reads manually. The GRanges object cannot support two seqnames side-by-side (i.e. in "columns"), you could however convert each pairs of reads into a GRanges [see below] object and then merge all reads into a giant GRangesList. For the seqnames() call to return the two different chromosomes, you need to convert the alignment object to GRanges with on.discordant.seqnames='split'. Then we build a final data.frame to your specifications [note we no longer have GRanges]. Here is an example but it's not IMO the best way to do it - so I won't include suggest it.

## pull example discordant read
b <- gr[2]

## duplicate the read and then split them the two reads (R1 and R2)
bb <- granges(c(b,b),on.discordant.seqnames='split')

## remember which read is which
c('read1','read2') -> bb$read

## merge into giant data.frame
b.df <- cbind(data.frame(subset(bb,read == 'read1')),data.frame(subset(bb,read == 'read2')))

## check it
head(b.df)
  seqnames start  end width strand  read seqnames   start     end width strand
1    chr1  5089 5122    34      + read1    chr2 2700963 2701112   150      -
2    chr1  5089 5122    34      + read1    chr2 2700963 2701112   150      -
   read
1 read2
2 read2
ADD COMMENT

Login before adding your answer.

Traffic: 2045 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6