I want to get peaks of promoters of a list of genes using findMotifs.pl.
findMotifs.pl $GENE_LIST human $OUTPUT_DIRECTORY -p 4
But the list of genes I have was produced with the Ensembl gene annotation and GRCh38 genome files from Ensembl, so I wanted to use a promoter set from Ensembl as well. However, HOMER works with pre-built UCSC promoters (in .tsv) and genomes.
I tried to use a custom promoter list: I downloaded a list of promoters with BioMart and then ran directly:
findMotifs.pl $GENE_LIST $PROMOTERS_LIST $OUTPUT_DIRECTORY -p 4
And also tried to do:
loadPromoters.pl -name $PROMOTERS_LIST -org human -id ensembl -genome $GENOME
but it kept saying:
!!! Either -genome, -tss OR -fasta, -offset are required !!!
What is this '-tss' and is this important or should I be doing something completely different?
it gave this error:
Please paste the fully expanded command that you're using (i.e. with the values of the variable names) and state your current working directory. Thanks!
Does the directory /Applications/Anaconda/share/homer-4.9.1-6/.//data/promoters// exist?
You may have to install the promoters dataset, as outlined here: http://homer.ucsd.edu/homer/introduction/configure.html
It doesn't exist.
When installing HOMER with conda, it seems configureHomer.pl, which is needed to install the promoters dataset, was not installed.
I installed the promoters for mouse just now on my computer with:
perl /Programs/Homer/configureHomer.pl -install mouse-p
The other command then completed.
You will have to do the same for your system.
OK, did that. Many thanks. Meanwhile, I tried this command, as it allows loading the genome and generating the promoters file at the same time (with the option '-promoters'):
It ran well, but then when running:
It gave an error:
Also when running the command you suggested:
it also gave the same error.
Do you know why?
I also noticed that files like 'GRCh38.92.promoters.base' in the 'data' folder of the HOMER directory are empty, while the corresponding files of another genome, 'human.base', have content inside...
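A quick way to see which generated files are actually empty is to list the zero-byte files under the promoters directory. This is only a sketch using standard find; list_empty_files is a hypothetical helper name, not a HOMER tool, and the example path is a placeholder to adjust to your own install.

```shell
# List zero-byte regular files under a directory, sorted by path.
# list_empty_files is a hypothetical helper, not part of HOMER.
list_empty_files() {
    dir="$1"
    # -size 0 matches files that occupy zero blocks, i.e. empty files
    find "$dir" -type f -size 0 | sort
}

# Example call (placeholder path, adjust to your HOMER data directory):
# list_empty_files /path/to/homer/data/promoters
```

Any file printed here was created but never filled, which usually points at the step that was supposed to write it.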
Is it not supposed to be something like:
$GENE is a 'gene list'; 'human' is your genome, for which you would have installed a promoter set, which is then used in this command. For example, my earlier command, perl /Programs/Homer/configureHomer.pl -install mouse-p, installs promoters for mouse, which I now have on my computer under data/promoters.
I tried as you said, with the name of the genome instead of the promoter set name, and it didn't work either. Besides, from the manual it seems the correct argument is the 'promoter set' and not the 'genome name':
I cannot use the command with 'human' literally because 'human', although it is installed automatically, does not have promoters from Ensembl; it has promoters annotated from UCSC, and I need to work with the Ensembl annotation. That's why I went through all the work of generating a custom promoter set.
Okay, makes sense, have you checked that your custom files are in exactly the same format as the standard files for human? Even one incorrect piece of syntax could throw the errors. There is also some guidance here: http://homer.ucsd.edu/homer/introduction/update.html
Could you paste the header from GRCh38.92.promoters?
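One rough way to compare the layout of a custom file against a standard one is to check that both use the same number of tab-separated fields per line. The sketch below assumes the files are tab-delimited text; field_counts and same_layout are hypothetical helper names, and matching field counts alone do not prove that the formats are identical.

```shell
# Print the distinct tab-separated field counts found in a file, one per line.
# Both helpers are hypothetical and not part of HOMER.
field_counts() {
    awk -F'\t' '{ print NF }' "$1" | sort -n | uniq
}

# Succeed (exit status 0) only if both files show the same set of field counts.
same_layout() {
    [ "$(field_counts "$1")" = "$(field_counts "$2")" ]
}

# Example call (placeholder file names):
# same_layout human.base GRCh38.92.promoters.base && echo "field counts match"
```

If field_counts prints more than one number for a single file, some lines in that file have a different number of columns than others, which is worth inspecting first.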
GRCh38.92.promoters is a folder in homer/data/promoters directory with these files inside:
header of GRCh38.92.promoters.base for example is:
the only file empty is GRCh38.92.promoters.base.gene, while human.base.gene is full:
I am working with .gtf and .fa files from Ensembl, so they should have the right format. The only odd thing is that the 'promoter files' above have Ensembl transcript IDs, and I would expect them to have Ensembl gene IDs, as this is what the $GENE file should have according to the manual.
As this is a different question, I'll open another one.
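To see which ID type each file really contains, one option is to count ENSG and ENST prefixes in the first column and then count how many gene-list IDs also occur in the promoter file. This is a sketch under assumptions: the ID is taken to be the first tab-separated column of each file, and count_id_types and count_shared_ids are hypothetical helper names, not HOMER commands.

```shell
# Count IDs in column 1 that start with ENSG, ENST, or anything else.
# Assumes tab-delimited input with the ID in the first column; a plain
# one-ID-per-line gene list also works. Blank lines are ignored.
count_id_types() {
    awk -F'\t' '
        $1 ~ /^ENSG/ { g++; next }
        $1 ~ /^ENST/ { t++; next }
        NF           { o++ }
        END { printf "ENSG=%d ENST=%d other=%d\n", g, t, o }
    ' "$1"
}

# Count distinct IDs from file 2 (column 1) that also occur in file 1 (column 1).
count_shared_ids() {
    awk -F'\t' '
        NR == FNR { seen[$1] = 1; next }
        ($1 in seen) && !counted[$1]++ { n++ }
        END { print n + 0 }
    ' "$1" "$2"
}

# Example calls (placeholder file names):
# count_id_types genes.txt
# count_id_types GRCh38.92.promoters.base
# count_shared_ids genes.txt GRCh38.92.promoters.base
```

A shared count of 0 together with different prefixes in the two files would be consistent with a gene ID versus transcript ID mismatch.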
I have just successfully completed the series of commands using just chr5.
The genes in your gene list ($GENE) have to exactly match those in your GRCh38.92.promoters.base file. You appear to have a mixture of ENSG (Ensembl Gene) and ENST (Ensembl Transcript). When running
...you can select -gid or -tid to instruct HOMER to pull ENSG (gene_id) or ENST (transcript_id) from the input GTF.
The file PROMO content: