Question

FIMO output that preserves padj values

0

Entering edit mode

6.5 years ago

rbronste ▴ 420

Hi I am generating a fasta file from a DESeq2 output in the following way:

bedtools getfasta -fi /son/genomes/mm10.fa -bed treatment_pw_padj.001_sorted.bed -fo treatment_pw_padj.001_sorted.fa

The above bed file has FDR information as a last column, I believe #12

I then run the following to search for instances of a specific motifs within these intervals:

fimo /son/MA0112.1.meme /son/treatment_pw_padj.001_sorted.fa

I get a list of located motif sites with p-values however I would like to preserve the DESeq2 FDR value located in the original bed file as well next to these newly filtered intervals, any ideas on how I could do that? Thank you.

ChIP-Seq motifs FIMO MEME DESeq2 • 1.8k views

ADD COMMENT • link 6.5 years ago by rbronste ▴ 420

2

Entering edit mode

Why don't you make a custom fasta header for every entry that contains the column information like:

>name|chr|start|end|padj|(...)
AGTAGTCA(...)

ADD REPLY • link 6.5 years ago by ATpoint 87k

0

Entering edit mode

I think your method is much better.

ADD REPLY • link 6.5 years ago by morovatunc ▴ 560

1

Entering edit mode

I like to keep things silmpe whenever I can ;-D

ADD REPLY • link 6.5 years ago by ATpoint 87k

0

Entering edit mode

What would be your preferred method of making the custom fasta header and is this something I can output from bedtools getfasta, so it maintains a column from the initial bed file Im making into a fasta? Thanks!

ADD REPLY • link 6.4 years ago by rbronste ▴ 420

0

Entering edit mode

I dont have your data at the moment therefore I cannot give explicit solution but I can show you the way

R has a function called merge.data.frame. We are gonna use this to merge your initial data with the latest fimo output. So In the end you will have BED + fimo merged together. (probably, you will have duplicated rows of that BED since fimo might find multiple motifs in the single binding region)

1) So use bedtools getfasta with -name option on it. Then run fimo.

2) In fimo's txt output, there should be sequence name column. Extract your that peak name as they will become identical as write-in your input bed.

3) then use merge.data.frame function(in R base) to merge those two and get a single data frame.

4) make proper filtration to remove unnecessary data. (well you can make this step after reading your files as well)

If you can send me a sample, I can code it as well.

ADD REPLY • link 6.5 years ago by morovatunc ▴ 560