Question

Summing Chromosome Sizes

selplat21 ▴ 20

File1_Col1 <- c("Chr1", "Chr2", "Chr3", "Chr4", "Chr5")
File1_Col2 <- c(10000, 8000, 5000, 2000, 500)
File1 <- data.frame(File1_Col1, File1_Col2)
File2_Col1 <- c("Chr1", "Chr1", "Chr1", "Chr2", "Chr2", "Chr2","Chr3", "Chr3", "Chr3"
                ,"Chr4", "Chr4", "Chr4","Chr5", "Chr5", "Chr5")
File2_Col2 <- c(1,5,7,2,3,5,3,4,5,1,3,6,2,4,5)
File2 <- data.frame(File2_Col1, File2_Col2)

I have two files: File 1 contains chromosomes and their sizes and File 2 contains a list of SNPs for each chromosome.

I need the SNP positions to run consecutively across chromosomes, so for example:

Chr3 SNP 3 should actually be 3 + (the combined size of the two preceding chromosomes) = 3 + 10000 + 8000 = 18003

Can someone help me write a loop in R that adds the summed sizes of the preceding chromosomes to each position in File2_Col2?

R sequencing • 936 views

ADD COMMENT • link updated 4.1 years ago by rpolicastro 13k • written 4.1 years ago by selplat21 ▴ 20

What have you tried? You should look at calculating cumsum for the first data frame followed by a merge; then you can create a derived field that is the sum of the cumulative size and the File2_Col2 field.
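A minimal base R sketch of that cumsum-then-merge idea, assuming the File1 and File2 data frames from the question and that File1 lists the chromosomes in genome order. The column names `offset` and `genome_pos` are made up for illustration. Because the question wants only the preceding chromosomes counted, each chromosome's own size is subtracted from the cumulative sum.

```r
# Data frames from the question
File1 <- data.frame(
  File1_Col1 = c("Chr1", "Chr2", "Chr3", "Chr4", "Chr5"),
  File1_Col2 = c(10000, 8000, 5000, 2000, 500)
)
File2 <- data.frame(
  File2_Col1 = rep(c("Chr1", "Chr2", "Chr3", "Chr4", "Chr5"), each = 3),
  File2_Col2 = c(1, 5, 7, 2, 3, 5, 3, 4, 5, 1, 3, 6, 2, 4, 5)
)

# Offset for each chromosome = total size of all chromosomes before it.
# cumsum() includes the current chromosome, so subtract its own size.
# ("offset" is a hypothetical column name.)
File1$offset <- cumsum(File1$File1_Col2) - File1$File1_Col2

# Merge the offsets onto the SNP table, matching on chromosome name
merged <- merge(File2, File1[, c("File1_Col1", "offset")],
                by.x = "File2_Col1", by.y = "File1_Col1")

# Derived field: position of each SNP counted from the start of Chr1
# ("genome_pos" is a hypothetical column name.)
merged$genome_pos <- merged$offset + merged$File2_Col2
```

Note that `merge()` may reorder rows, so sort the result by `genome_pos` if the order matters.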

ADD REPLY • link 4.1 years ago by Ram 44k

Thank you so much, that answers my question!

ADD REPLY • link 4.1 years ago by selplat21 ▴ 20

score 0 · Answer 1 · 2020-10-29

I'm not completely sure I understand, but here is a tidyverse answer of what I thought you meant, based on @RAmRS's comment.

library("tidyverse")

result <- File1 %>%
  # Offset = total size of the chromosomes before this one
  # (cumsum() includes the current chromosome, so subtract its own size)
  mutate(cumsum_chr=cumsum(File1_Col2)-File1_Col2) %>%
  right_join(File2, by=c("File1_Col1"="File2_Col1")) %>%
  mutate(newcol=cumsum_chr+File2_Col2) %>%
  select(!c(File1_Col2, cumsum_chr))

> result
   File1_Col1 File2_Col2 newcol
1        Chr1          1      1
2        Chr1          5      5
3        Chr1          7      7
4        Chr2          2  10002
5        Chr2          3  10003
6        Chr2          5  10005
7        Chr3          3  18003
8        Chr3          4  18004
9        Chr3          5  18005
10       Chr4          1  23001
11       Chr4          3  23003
12       Chr4          6  23006
13       Chr5          2  25002
14       Chr5          4  25004
15       Chr5          5  25005
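
Since the question asked for a loop, here is a rough base R sketch of one way to write it, assuming File1 lists the chromosomes in genome order. The names `running_offset` and `genome_pos` are made up for illustration. It keeps a running total of the sizes already passed, so that Chr3 SNP 3 comes out as 18003, matching the example in the question.

```r
# Data frames from the question
File1 <- data.frame(
  File1_Col1 = c("Chr1", "Chr2", "Chr3", "Chr4", "Chr5"),
  File1_Col2 = c(10000, 8000, 5000, 2000, 500)
)
File2 <- data.frame(
  File2_Col1 = rep(c("Chr1", "Chr2", "Chr3", "Chr4", "Chr5"), each = 3),
  File2_Col2 = c(1, 5, 7, 2, 3, 5, 3, 4, 5, 1, 3, 6, 2, 4, 5)
)

# New column for the shifted positions ("genome_pos" is a hypothetical name)
File2$genome_pos <- NA_real_

# Total size of the chromosomes already visited; zero before Chr1
running_offset <- 0

for (i in seq_len(nrow(File1))) {
  chr <- File1$File1_Col1[i]

  # Rows of File2 that belong to the current chromosome
  on_chr <- File2$File2_Col1 == chr

  # Shift those SNP positions by the size of all preceding chromosomes
  File2$genome_pos[on_chr] <- File2$File2_Col2[on_chr] + running_offset

  # Only now add this chromosome's size, so it affects later chromosomes only
  running_offset <- running_offset + File1$File1_Col2[i]
}
```

Any chromosome in File2 that is missing from File1 would be left as NA here, which makes such mismatches easy to spot.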