Bed file filtering
8 months ago

Hello everyone!

I have a question about BED file filtering.

I have the names of the genes I want to remove from my BED file. I am not interested in gene length or start and end positions; I just want to remove every region associated with each gene in my gene_list.

So bedtools subtract doesn't work for me, because it removes regions by coordinate overlap rather than by gene name.

I also tried grep, but it didn't work. Maybe there is a syntax error or something else wrong; the command may even be completely off:

grep -v --binary-files=text -f gene_list.txt my.bed > filtered_output.bed

I have gene_list.txt (a file with a single column of 174 gene names that I want to remove from my BED file) and the BED file itself (with chr, start, end, and gene name columns).
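The grep attempt above could be tightened along these lines. This is a minimal sketch, assuming GNU grep and Bash; the file names (demo.bed, demo_genes.txt) and gene names are hypothetical toy data. It strips Windows carriage returns from the pattern list (a stray \r at the end of each pattern means it never matches), deletes blank lines (an empty pattern matches every line, so grep -v would print nothing), and adds -F (literal strings) and -w (whole words) so that GENE1 does not also remove GENE10.

```shell
#!/usr/bin/env bash
# Hypothetical toy inputs: a 4-column BED file, and a gene list saved with
# Windows (CRLF) line endings and a trailing blank line.
printf 'chr1\t100\t200\tGENE1\nchr1\t300\t400\tGENE10\nchr2\t500\t600\tGENE2\n' > demo.bed
printf 'GENE1\r\nGENE2\r\n\r\n' > demo_genes.txt

# Strip carriage returns, then delete empty lines: an empty pattern would
# match every line, and grep -v would then print nothing at all.
tr -d '\r' < demo_genes.txt | sed '/^$/d' > demo_genes.clean.txt

# -v keeps non-matching lines, -F treats patterns as literal strings,
# -w requires a whole-word match, so GENE1 does not also remove GENE10.
grep -v -w -F -f demo_genes.clean.txt demo.bed > demo_grep.bed

cat demo_grep.bed
```

Only the GENE10 line should survive. Note that grep still matches a name anywhere on the line, not only in the gene-name column, so a comparison restricted to column 4 is stricter.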

Thank you!

8 months ago
join -t $'\t' -1 4 -2 1 -v 1 -o 1.1,1.2,1.3,1.4 <(sort -t $'\t' -k4,4 in.bed) <(sort -t $'\t' -k1,1 gene_list.txt)

Also, run the file command to check that both files are plain text files, NOT Windows files with CRLF line endings.
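A self-contained sketch of the join approach, assuming GNU coreutils and Bash (for the <(...) process substitution and $'\t'); demo.bed, demo_genes.txt and the gene names are hypothetical toy data. It uses -o 1.1,1.2,1.3,1.4 so the output keeps BED column order (by default join prints the join field, here the gene name, first), strips carriage returns with tr, and re-sorts the result by chromosome and start, since join leaves it ordered by gene name.

```shell
#!/usr/bin/env bash
# Hypothetical toy inputs; the gene list has Windows (CRLF) line endings.
printf 'chr2\t500\t600\tGENE2\nchr1\t100\t200\tGENE1\nchr1\t300\t400\tGENE10\nchr1\t50\t90\tGENE3\n' > demo.bed
printf 'GENE2\r\nGENE1\r\n' > demo_genes.txt

# If the file utility is installed, it reports "with CRLF line terminators"
# for Windows-style files.
if command -v file > /dev/null; then file demo.bed demo_genes.txt; fi

# Strip the carriage returns so that "GENE1\r" becomes "GENE1".
tr -d '\r' < demo_genes.txt > demo_genes.unix.txt

# sort and join must use the same collation, so fix it for all of them.
export LC_ALL=C

# -v 1 prints BED lines with no partner in the gene list; -o keeps the BED
# column order. The final sort restores chromosome/start order.
join -t $'\t' -1 4 -2 1 -v 1 -o 1.1,1.2,1.3,1.4 \
    <(sort -t $'\t' -k4,4 demo.bed) \
    <(sort -t $'\t' -k1,1 demo_genes.unix.txt) |
  sort -t $'\t' -k1,1 -k2,2n > demo_join.bed

cat demo_join.bed
```

With this toy input, GENE1 and GENE2 are removed and the GENE3 line (start 50) is printed before the GENE10 line (start 300).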


If you also work in R, you can try the anti_join() function from the dplyr package. Here is an example that you will need to adapt to your context:

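library(dplyr)

# Read the gene list (one name per line) and name its single column
gene_list <- read.table("gene_list.txt", header = FALSE, stringsAsFactors = FALSE)
colnames(gene_list) <- "gene_name"

# Read the 4-column BED file (chr, start, end, gene name)
bed_file <- read.table("your_bed_file.bed", header = FALSE, stringsAsFactors = FALSE)
colnames(bed_file) <- c("chr", "start", "end", "gene_name")

# Keep only the BED rows whose gene_name does not appear in gene_list
filtered_bed <- bed_file %>% anti_join(gene_list, by = "gene_name")
# Write tab-separated output without quotes, row names, or a header line
write.table(filtered_bed, file = "filtered_bed_file.bed", sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)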
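For readers who prefer to stay in the shell, the same anti-join can be sketched with awk (a different tool from the dplyr code above, not part of the original answer). The first file fills an associative array keyed by gene name, and each BED line is printed only if its 4th column is not a key, so the comparison is exact on that column and the original row order is kept. Assumes a POSIX awk and tab-separated input; the demo file names and genes are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical toy inputs (tab-separated BED, one gene name per line).
printf 'chr1\t100\t200\tGENE1\nchr1\t300\t400\tGENE10\nchr2\t500\t600\tGENE2\n' > demo.bed
printf 'GENE1\r\nGENE2\n' > demo_genes.txt

# NR == FNR is true only while reading the first file (the gene list):
# store each non-empty name, minus any trailing \r, as an array key.
# For the BED file, print lines whose 4th column is not in that array.
awk -F '\t' '
  NR == FNR { sub(/\r$/, "", $1); if ($1 != "") drop[$1]; next }
  !($4 in drop)
' demo_genes.txt demo.bed > demo_awk.bed

cat demo_awk.bed
```

One caveat of the NR == FNR idiom: if the gene list file is completely empty, the test also holds for the BED file, so nothing is printed.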
