3.6 years ago
greyman
I have a list of genes of interest, a list with the positions of coding sequences, and a genome file (FASTA from NCBI). While working with this non-model organism, I was told that I need to trim away the sequences that code for a gene and retain the upstream sequence to look for TF binding sites. Is there an R package that does this trimming? Or is this approach wrong? Some posts suggest using blastX. I know that it converts nucleotide sequence to protein sequence, but I cannot figure out how it works. Any help, or pointers to previous posts that I missed, is appreciated. Thank you.
You could use bedops to do set operations on proximal promoters (i.e. regions upstream of a TSS), but you'd need to create a coordinate system for defining gene positions in relation to coding sequence positions.
If you have a non-model organism, which organism's FASTA are you retrieving? What does your gene list look like, and what do your coding positions look like?
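As a rough illustration of what "regions upstream of a TSS" means in coordinates, here is a minimal Python sketch. It assumes BED-style positions (0-based start, exclusive end), treats the start of the coding sequence as a stand-in for the TSS (which ignores any 5' UTR), and uses an arbitrary 1000 bp window. The function name `upstream_region` and its parameters are made up for this example and are not part of bedops.

```python
def upstream_region(chrom, start, end, strand, length=1000, chrom_size=None):
    """Return a BED-style interval covering `length` bp upstream of a feature.

    Assumes 0-based, half-open coordinates (start inclusive, end exclusive).
    "Upstream" depends on strand: before `start` on '+', after `end` on '-'.
    """
    if strand == "+":
        # Upstream lies at lower coordinates; clip at the sequence start.
        up_start = max(0, start - length)
        up_end = start
    elif strand == "-":
        # Upstream lies at higher coordinates; clip at the sequence end if known.
        up_start = end
        up_end = end + length
        if chrom_size is not None:
            up_end = min(chrom_size, up_end)
    else:
        raise ValueError("strand must be '+' or '-'")
    return (chrom, up_start, up_end, strand)


if __name__ == "__main__":
    # Hypothetical coding-sequence coordinates, for illustration only.
    print(upstream_region("chr1", 1500, 2000, "+"))
    print(upstream_region("chr1", 1500, 2000, "-", chrom_size=2300))
```

Note that an upstream window defined this way can overlap a neighbouring gene in a compact genome, which is where set operations on the resulting intervals become useful.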
Once you have some regions of interest, it is usually easy to translate from there to fasta sequence, and to run that through MEME to discover novel motifs (or FIMO to predict putative sites, or TOMTOM to find similar motifs, once you have a table of motif weight matrices).
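Going from regions to sequence could look something like the sketch below. It is plain Python with no external libraries, assumes BED-style coordinates (0-based start, exclusive end), and assumes the genome is small enough to hold in memory. The helper names (`read_fasta`, `extract_region`) are invented for this example. Minus-strand regions are reverse-complemented so that every output sequence reads 5' to 3' toward the gene.

```python
def read_fasta(lines):
    """Parse FASTA lines into a dict of {record_id: sequence}.

    The record ID is the first whitespace-delimited token after '>'.
    """
    records = {}
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


# Translation table for complementing bases; other characters pass through.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_region(genome, chrom, start, end, strand):
    """Slice [start, end) from `chrom`, reverse-complementing on '-'."""
    seq = genome[chrom][start:end]
    return reverse_complement(seq) if strand == "-" else seq


if __name__ == "__main__":
    # Toy in-memory genome; a real run would pass an open file handle instead.
    genome = read_fasta([">chr1 toy sequence", "AACCGG", "TTAA"])
    print(">region_plus")
    print(extract_region(genome, "chr1", 0, 4, "+"))
    print(">region_minus")
    print(extract_region(genome, "chr1", 0, 4, "-"))
```

The printed records are in FASTA format, which is the kind of input MEME and FIMO accept for the sequences to scan.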