Entering edit mode
6.2 years ago
rbronste
▴
420
Hi I am generating a fasta file from a DESeq2 output in the following way:
bedtools getfasta -fi /son/genomes/mm10.fa -bed treatment_pw_padj.001_sorted.bed -fo treatment_pw_padj.001_sorted.fa
The above bed file has FDR information as a last column, I believe #12
I then run the following to search for instances of a specific motifs within these intervals:
fimo /son/MA0112.1.meme /son/treatment_pw_padj.001_sorted.fa
I get a list of located motif sites with p-values however I would like to preserve the DESeq2 FDR value located in the original bed file as well next to these newly filtered intervals, any ideas on how I could do that? Thank you.
Why don't you make a custom fasta header for every entry that contains the column information like:
I think your method is much better.
I like to keep things silmpe whenever I can ;-D
What would be your preferred method of making the custom fasta header and is this something I can output from bedtools getfasta, so it maintains a column from the initial bed file Im making into a fasta? Thanks!
I dont have your data at the moment therefore I cannot give explicit solution but I can show you the way
R has a function called merge.data.frame. We are gonna use this to merge your initial data with the latest fimo output. So In the end you will have BED + fimo merged together. (probably, you will have duplicated rows of that BED since fimo might find multiple motifs in the single binding region)
1) So use bedtools getfasta with
-name
option on it. Then run fimo.2) In fimo's txt output, there should be sequence name column. Extract your that peak name as they will become identical as write-in your input bed.
3) then use merge.data.frame function(in R base) to merge those two and get a single data frame.
4) make proper filtration to remove unnecessary data. (well you can make this step after reading your files as well)
If you can send me a sample, I can code it as well.