How to use a specific promoter set with HOMER's findMotifs.pl?
6.3 years ago
salamandra ▴ 550

I want to get peaks of promoters of a list of genes using findMotifs.pl.

findMotifs.pl $GENE_LIST human $OUTPUT_DIRECTORY -p 4

But my gene list was produced with the Ensembl gene annotation and GRCh38 genome files from Ensembl, so I want to use a promoter set from Ensembl as well. However, HOMER works with pre-built UCSC promoters (in .tsv) and genomes.

I tried to use a custom promoter list: I downloaded a list of promoters with BioMart and then ran directly:

 findMotifs.pl $GENE_LIST $PROMOTERS_LIST $OUTPUT_DIRECTORY -p 4

I also tried:

loadPromoters.pl -name $PROMOTERS_LIST -org human -id ensembl -genome $GENOME

but it kept saying:

!!! Either -genome, -tss OR -fasta, -offset are required !!!

What is '-tss'? Is it important, or should I be doing something completely different?

ChIP-Seq HOMER promoter set motifs • 5.3k views
6.3 years ago

It means that you have to supply one of the following:

  1. -genome and -tss together
  2. -fasta and -offset together.

For example:

bin/loadPromoters.pl -name H3K27ac -org human -id refseq -genome mm10 -tss MyPeaks.txt

...or:

bin/loadPromoters.pl -name ChucksPromoters -org human -id refseq -fasta ChuckSequences.fasta -offset 2000

-tss refers to a peak file.

Kevin


It gave this error:

!!! Could not open peak file: /Applications/Anaconda/share/homer-4.9.1-6/.//data/promoters//GATA3.pos !!!
    Extracting sequences from file: /Volumes/PereiraCytolab/Tania/genomes/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly/Homo_sapiens.GRCh38.dna.primary_assembly.fa
    Looking for peak sequences in a single file (/Volumes/PereiraCytolab/Tania/genomes/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly/Homo_sapiens.GRCh38.dna.primary_assembly.fa)
sh: /Applications/Anaconda/share/homer-4.9.1-6/.//data/promoters//GATA3.mask: No such file or directory
Could not open sequence file /Applications/Anaconda/share/homer-4.9.1-6/.//data/promoters//GATA3.mask
    Finding Redundant Promoters
    Max distance to merge: 500 bp
    Merging single peak file... 
Could not open peak file (/Applications/Anaconda/share/homer-4.9.1-6/.//data/promoters//GATA3.pos)
    nan% of 0 peaks were within 500 bp and merged
readline() on closed filehandle IN at /Applications/Anaconda/bin/annotateRelativePosition.pl line 36.
    Calculating CpG Content

!!! Couldn't open gc file for writing: /Applications/Anaconda/share/homer-4.9.1-6/.//data/promoters//GATA3.cgfreq !!!
    Calculating CpG Bins
sh: /Applications/Anaconda/share/homer-4.9.1-6/.//data/promoters//GATA3.cgbins: No such file or directory
sh: /Applications/Anaconda/share/homer-4.9.1-6/.//data/promoters//GATA3.gcbins: No such file or directory
    Creating base files (default background)
sh: /Applications/Anaconda/share/homer-4.9.1-6/.//data/promoters//GATA3.base: No such file or directory
sh: /Applications/Anaconda/share/homer-4.9.1-6/.//data/promoters//GATA3.base.gene: No such file or directory

Please paste the fully expanded command that you're using (i.e. with the values of the variable names) and state your current working directory. Thanks!


Does the directory /Applications/Anaconda/share/homer-4.9.1-6/.//data/promoters// exist?

You may have to install the promoters dataset, as outlined here: http://homer.ucsd.edu/homer/introduction/configure.html
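A quick way to check this from the shell is sketched below. It is only an illustration: the function name promoter_set_installed is made up here, and it assumes that a promoter set called NAME is stored as NAME.pos and NAME.base directly under data/promoters, which is what the paths in the error messages above suggest.

```shell
# Hypothetical helper (not part of HOMER): succeed only if a promoter set
# appears to be installed under a given HOMER directory.
#   $1 = HOMER base directory (assumed to contain data/promoters/)
#   $2 = promoter set name, e.g. human or GATA3
promoter_set_installed() {
    homer_dir=$1
    set_name=$2
    # Assumption: NAME.pos and NAME.base must both exist and be non-empty.
    for ext in pos base; do
        [ -s "$homer_dir/data/promoters/$set_name.$ext" ] || return 1
    done
    return 0
}
```

For example, `promoter_set_installed /Applications/Anaconda/share/homer-4.9.1-6 human && echo found` would print "found" only if both files are present and non-empty.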


It doesn't exist.

When installing HOMER with conda, it seems configureHomer.pl was not installed, and that script is needed to install the promoters dataset.


I installed the promoters for mouse just now on my computer with:

perl /Programs/Homer/configureHomer.pl -install mouse-p

The other command then completed.

You will have to do the same for your system.


OK, did that. Many thanks. Meanwhile, I tried this command, as it loads the genome and generates the promoters file at the same time (with the option '-promoters'):

loadGenome.pl -name GRCh38.92 -org human -fasta $GENOME -gtf $GTF -promoters GRCh38.92.promoters -version 92

It ran fine, but then when running:

findMotifs.pl $GENE GRCh38.92.promoters $OUT -p 4 -len 8,10,12

It gave an error:

!!! Something is wrong... are you sure you chose the right length for motif finding?
!!! i.e. also check your sequence file!!!
Use of uninitialized value in numeric gt (>) at /Applications/homer//bin/compareMotifs.pl line 1389.
    !!! Filtered out all motifs!!!
    Job finished

Also when running the command you suggested:

findMotifs.pl $GENE GRCh38.92.promoters $OUT -p 4

it also gave the same error.

Do you know why?

I also noticed that files like 'GRCh38.92.promoters.base' in the 'data' folder of the HOMER directory are empty, while the corresponding files for another genome, such as 'human.base', have content.


Is it not supposed to be something like:

findMotifs.pl $GENE human $OUT -p 4 -len 8,10,12

$GENE is a 'gene list'; 'human' is your genome, for which you would have installed a promoter set, which is then used in this command.

For example, my earlier command, perl /Programs/Homer/configureHomer.pl -install mouse-p, installs promoters for mouse, which I now have on my computer under data/promoters


I tried it as you said, with the name of the genome instead of the promoter set name, and it didn't work either. Besides, according to the manual, the correct argument is the 'promoter set' and not the 'genome name':

findMotifs.pl <inputfile.txt> <promoter set> <output directory> [options]

I cannot use the command with 'human' literally because 'human', although it is installed automatically, does not have promoters from Ensembl. It has promoters annotated from UCSC, and I need to work with the Ensembl annotation. That is the reason for all the work to generate a custom promoter set.


Okay, that makes sense. Have you checked that your custom files are in exactly the same format as the standard files for human? Even one incorrect piece of syntax could throw the errors. There is also some guidance here: http://homer.ucsd.edu/homer/introduction/update.html

Could you paste the header from GRCh38.92.promoters?


GRCh38.92.promoters is a folder in homer/data/promoters directory with these files inside:

GRCh38.92.promoters.base    human.base.gene
GRCh38.92.promoters.base.gene   human.cgbins
GRCh38.92.promoters.cgbins  human.cgfreq
GRCh38.92.promoters.cgfreq  human.cons
GRCh38.92.promoters.gcbins  human.gcbins
GRCh38.92.promoters.mask    human.mask
GRCh38.92.promoters.pos     human.pos
GRCh38.92.promoters.redun   human.redun
GRCh38.92.promoters.seq     human.seq
human.base

The head of GRCh38.92.promoters.base, for example, is:

ENST00000456328
ENST00000450305
ENST00000619216
ENST00000473358
ENST00000488147
ENST00000469289
ENST00000607096
ENST00000461467
ENST00000417324
ENST00000606857

The only empty file is GRCh38.92.promoters.base.gene, while human.base.gene is full:

head human.base.gene
100287102
103504738
102465910
102465909
102466751
653635
100302278
100422834
100422831
100422919

I am working with .gtf and .fa files from Ensembl, so they should have the right format. The only odd thing is that the promoter files above have Ensembl transcript IDs, and I would expect them to have Ensembl gene IDs, as this is what the $GENE file should contain according to the manual.
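To see how the transcript IDs in the promoter files relate to gene IDs, one option is to pull the pairs straight out of the GTF. The sketch below is not a HOMER tool; the function name gtf_tx2gene is invented for illustration, and it assumes a standard Ensembl GTF in which the ninth column carries attributes of the form gene_id "..." and transcript_id "...".

```shell
# Hypothetical helper: print "transcript_id<TAB>gene_id" for every
# "transcript" feature line of a GTF file given as $1.
gtf_tx2gene() {
    awk -F'\t' '$3 == "transcript" {
        g = ""; t = ""
        # Pull the quoted value that follows each attribute key in column 9.
        if (match($9, /gene_id "[^"]+"/))
            g = substr($9, RSTART + 9, RLENGTH - 10)
        if (match($9, /transcript_id "[^"]+"/))
            t = substr($9, RSTART + 15, RLENGTH - 16)
        if (g != "" && t != "") print t "\t" g
    }' "$1"
}
```

Running `gtf_tx2gene "$GTF" > tx2gene.tsv` would give a two-column table that can be used to look up which ENSG each ENST in the promoter set belongs to.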


As this is a different question, I'll open another question.


I have just successfully completed the series of commands using just chr5.

The genes in your gene list ($GENE) have to exactly match those in your GRCh38.92.promoters.base file. You appear to have a mixture of ENSG (Ensembl Gene) and ENST (Ensembl Transcript).

When running

loadGenome.pl -name GRCh38.92 -org human -fasta $GENOME -gtf $GTF

...you can select -gid or -tid to instruct HOMER to pull ENSG (gene_id) or ENST (transcript_id) from the input GTF
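A simple sanity check before running findMotifs.pl is to count how many IDs in the gene list actually occur in the .base file. This is a rough sketch under two assumptions: the helper name count_matching_ids is made up, and both files carry the ID in their first tab-separated column (the .base head shown earlier has one ID per line).

```shell
# Hypothetical helper: count IDs from a gene list that are also present in a
# promoter-set .base file.
#   $1 = gene list file, $2 = .base file
count_matching_ids() {
    awk -F'\t' 'NR == FNR { known[$1] = 1; next }   # first file: the .base IDs
                ($1 in known) { n++ }                # second file: the gene list
                END { print n + 0 }' "$2" "$1"
}
```

A result of 0 from `count_matching_ids "$GENE" GRCh38.92.promoters.base` would point to an ID mismatch, for example ENSG IDs in the gene list against ENST IDs in the promoter set.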

PROMO=/Volumes/PereiraCytolab/Tania/annotations/promoters/ensembl/Homo_sapiens.GRCh38p12/Homo_sapiens.GRCh38.p12.txt
GENOME=/Volumes/PereiraCytolab/Tania/genomes/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly/Homo_sapiens.GRCh38.dna.primary_assembly.fa

loadPromoters.pl -name GATA3 -org human -id ensembl -genome $GENOME -tss $PROMO

The content of the PROMO file:

Binding matrix  Chromosome/scaffold name    Start (bp)  End (bp)    Score   Feature Type    Strand
MA0597.1    14  23034888    23034896    7.391   THAP1   -1
MA0597.1    3   10026599    10026607    7.054   THAP1   1
MA0597.1    10  97879355    97879363    6.962   THAP1   -1
MA0597.1    3   51385016    51385024    7.382   THAP1   1
MA0597.1    16  20900537    20900545    6.962   THAP1   -1
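For comparison, a HOMER peak file expects the columns peak ID, chromosome, start, end and strand (+ or -). The sketch below shows one way to reshape a table laid out like the one above into that column order. It assumes the export is tab-separated with a single header row, and the function name biomart_to_homer_peaks is invented. Note that these rows describe motif features rather than transcription start sites, so this only illustrates the format conversion, not whether the coordinates are appropriate for -tss.

```shell
# Hypothetical helper: convert a tab-separated export with the columns
#   Binding matrix, Chromosome, Start, End, Score, Feature Type, Strand
# into HOMER peak-file columns: ID, chr, start, end, strand.
biomart_to_homer_peaks() {
    awk -F'\t' 'NR > 1 {                       # skip the header row
        strand = ($7 < 0) ? "-" : "+"          # -1 becomes "-", 1 becomes "+"
        # Build a unique ID from the feature type and the data row number.
        printf "%s_%d\t%s\t%s\t%s\t%s\n", $6, NR - 1, $2, $3, $4, strand
    }' "$1"
}
```

Usage would be along the lines of `biomart_to_homer_peaks "$PROMO" > promoters.peaks.txt`, with the output file name chosen freely.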