Question

Bed file filtering

8 months ago

kamila.temirkhanova • 0

Hello everyone!

I want to ask regarding bed file filtering.

I have the names of the genes I want to remove from my BED file. I am not interested in gene length or in start and end coordinates; I just want to remove every region associated with any gene in my gene_list.

So, for me, the bed subtract option doesn't work, since it operates on coordinates rather than gene names.

I also tried grep, but it didn't work. I guess there is a syntax error, or the command is simply wrong:

grep -v --binary-files=text -f gene_list.txt my.bed > filtered_output.bed

I have gene_list.txt (a file with a single column of 174 gene names that I want to remove from my BED file) and the BED file itself (containing chr, start, end, and gene name).

Thank you!

bed • 489 views

updated 8 months ago by marco.barr ▴ 150 • written 8 months ago by kamila.temirkhanova • 0

score 0 · Answer 1 · 2024-03-12

8 months ago

Pierre Lindenbaum 164k

join -t $'\t' -1 4 -2 1 -v 1 -o 1.1,1.2,1.3,1.4 <(sort -t $'\t' -k4,4 in.bed) <(sort -t $'\t' -k1,1 gene_list.txt)
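An awk-based alternative, as a minimal sketch: it assumes the gene name is in column 4 of a tab-separated BED file and that the gene list holds one name per line. Unlike join, it needs no sorting and keeps the original row order. The demo files (demo.bed, demo_genes.txt, demo_filtered.bed) and their contents are made up for illustration.

```shell
# Hypothetical demo inputs, for illustration only.
printf 'chr1\t100\t200\tBRCA1\nchr1\t300\t400\tTP53\nchr2\t50\t80\tBRCA1\nchr3\t10\t20\tMYC\n' > demo.bed
printf 'BRCA1\nMYC\n' > demo_genes.txt

# First file (NR==FNR): store each gene name to drop as a key in a lookup table.
# Second file: print only the BED rows whose 4th column is not in that table.
awk -F '\t' 'NR==FNR { drop[$1]; next } !($4 in drop)' demo_genes.txt demo.bed > demo_filtered.bed

cat demo_filtered.bed
```

Because the comparison is an exact match on the whole field, a gene name that is a substring of another name is not removed by accident, which can happen with a plain grep -f.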

Also use the file command to check that both files are plain text files, NOT Windows files with CRLF line endings.
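A rough sketch of that check and one possible fix, assuming the problem is Windows line endings: for such a file, the file command typically reports CRLF line terminators, and tr can strip the carriage returns. The file names genes_crlf.txt and genes_unix.txt are made up for the demo.

```shell
# Hypothetical gene list saved with Windows (CRLF) line endings.
printf 'BRCA1\r\nMYC\r\n' > genes_crlf.txt

# Inspect the file type, if the file utility is installed.
command -v file >/dev/null && file genes_crlf.txt

# Delete every carriage return to get Unix (LF) line endings.
tr -d '\r' < genes_crlf.txt > genes_unix.txt

# Count the carriage-return bytes left in each file.
tr -cd '\r' < genes_crlf.txt | wc -c
tr -cd '\r' < genes_unix.txt | wc -c
```

A trailing carriage return makes each gene name in the list differ from the name in the BED file, so exact matches silently fail.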

If you also work in R, you can try the anti_join function from the dplyr package. Here is an example that you will need to adapt to your context:

library(dplyr)

# Gene names to remove, one per line
gene_list <- read.table("gene_list.txt", header = FALSE, stringsAsFactors = FALSE)
colnames(gene_list) <- "gene_name"

# BED file with four columns: chr, start, end, gene name
bed_file <- read.table("your_bed_file.bed", header = FALSE, stringsAsFactors = FALSE)
colnames(bed_file) <- c("chr", "start", "end", "gene_name")

# Keep only the rows whose gene_name does not appear in gene_list
filtered_bed <- bed_file %>% anti_join(gene_list, by = "gene_name")
write.table(filtered_bed, file = "filtered_bed_file.bed", sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)

8 months ago by marco.barr ▴ 150