Extending and clipping bed files
3
2
Entering edit mode
8.1 years ago
AB ▴ 360

Hi Everyone,

I have the data from a ChIP-Seq experiment. I used EDD to call the domains. I am interested in knowing out what genes are present just a few bp outside of the domains. I extended each of the peaks in the bed file I got from edd by 250 bp on both ends using bedtools slop. But what I need now is two rows for each feature of length 500bp centered around the ends. For instance,

Input

chr1    1000  2000
chr1     2500 3500

Output

chr1   750    1250
chr1   1750   2250
chr1    2250  2750
chr1    3250  3750

Using bedtools slop to extend and subtracting the old bed file from new file with extended coordinates will give me the 250bp region exactly outside the two ends of the peak. But I want the ends of the peak to be in the centre. How can I do that ?

Thanks

ChIP-Seq bedtools • 3.0k views
ADD COMMENT
3
Entering edit mode
8.1 years ago
Brice Sarver ★ 3.8k

A bed file is just a table, so you can do this quickly in R or the language of your choice.

shiftPosition <- function(record, size){
first <- c(record[1], as.numeric(record[2]) - size, as.numeric(record[2]) + size)
second <- c(record[1], as.numeric(record[3]) - size, as.numeric(record[3]) + size)
result <- c(first, second)
return(result)
}

bed <- read.table("your_bed_file.bed", header=FALSE, stringsAsFactors=FALSE)

final <- data.frame(matrix(apply(bed, 1, shiftPosition, 250), ncol=3, byrow=TRUE))

write.table(final, file="your_new_bed.bed", quote=FALSE, row.names=FALSE, col.names=FALSE)

This converts

    V1      V2      V3
1 chr1 3680371 3681212
2 chr1 4344252 4350391
3 chr1 4351710 4353125
4 chr1 4491516 4493706
5 chr1 4773006 4777948

to

1  chr1 3680121 3680621
2  chr1 3680962 3681462
3  chr1 4344002 4344502
4  chr1 4350141 4350641
5  chr1 4351460 4351960
6  chr1 4352875 4353375
7  chr1 4491266 4491766
8  chr1 4493456 4493956
9  chr1 4772756 4773256
10 chr1 4777698 4778198
ADD COMMENT
2
Entering edit mode
8.1 years ago

I'm not sure I understand , but if your run the following awk;

 awk -F '\t' '{printf("%s\t%s\t%s\n%s\t%s\t%s\n",$1,$2,$2,$1,$3,$3);}' input.bed

and pipe the output into bedtools slop, you should get the desired output.

ADD COMMENT
0
Entering edit mode
2.7 years ago
chunshi • 0

if [ $# -eq 0 ] || [ "$1" == "-h" ] || [ "$1" == "-help" ] || [ "$1" == "--help" ] then clear cat <<< "$description" exit 0 fi

left_pad=$2 if [ "$3" == "" ] then right_pad=$2 else right_pad=$3 fi

awk -F'\t' -v left="$left_pad" -v right="$right_pad" '{printf $1; printf "\t%s", $2-left; printf "\t%s", $3+right; for(i=4;i<=NF;i++){printf "\t%s", $i} printf "\n"}' $1

ADD COMMENT

Login before adding your answer.

Traffic: 1857 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6