Question

Empty network file with ARACNe-AP

4.5 years ago

ww22runner ▴ 60

Hello everyone,

I have bulk RNA sequencing results from WT and KO mice for a particular gene, 28 samples in total (14 WT and 14 KO). I am trying to generate a network file with ARACNe-AP as input for VIPER to study transcription factors that may be involved, but I keep getting an empty network file. Here are the commands I used:

java -Xmx5G -jar dist/aracne.jar -e test/my_data/small_subset.txt -o outputFolder --tfs test/my_data/small_subset_tf.txt --pvalue 1E-8 --seed 1 \
    --calculateThreshold

for i in {1..100}
do
    java -Xmx5G -jar dist/aracne.jar -e test/my_data/small_subset.txt -o outputFolder --tfs test/my_data/small_subset_tf.txt --pvalue 1E-8 --seed "$i"
done

java -Xmx5G -jar dist/aracne.jar -o outputFolder --consolidate

My expression matrix looks like this, with mouse NCBI gene IDs in the gene column:

gene    Sample1 Sample2 Sample3 Sample4 Sample5 ... Sample28
 216795 67  56  84  23  139

My TF file contains NCBI gene IDs for mouse TFs and looks like this:

Any advice would be greatly appreciated, thank you!

ARACNe-AP RNA-Seq • 1.7k views

ADD COMMENT • link updated 4.5 years ago by bruce.moran ▴ 970 • written 4.5 years ago by ww22runner ▴ 60

I presume you have genes in the TF file that are also in the expression matrix, and that they are expressed?

What's in outputFolder after the --calculateThreshold step?

If you want to share the files, I can see if it runs here.

ADD REPLY • link 4.5 years ago by bruce.moran ▴ 970

Hi Bruce, thank you for your reply. You are right: none of the genes in my TF file are present in the expression matrix, and this was the problem. Being new to this tool, I tried reading papers such as (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5040167/pdf/nihms789775.pdf) but do not completely understand how it works. My gene expression matrix contained a set of differentially expressed genes from the experiment, and to generate the list of mouse TFs I looked for this information online (lists that others had used in their analyses, etc.). Is there a better way for me to generate a list of TFs if I am unsure which pathways may be involved in the KO mice? May I also ask how, in very basic terms, ARACNe-AP draws connections between genes and their regulatory counterparts (in particular, why the overlap of genes between the two inputs is important)? Thank you!

ADD REPLY • link 4.5 years ago by ww22runner ▴ 60

I'd start by reading the original ARACNe paper.

The basic premise is that genes act in networks, but defining the network is not as simple as "geneA correlates with geneB, geneC and geneD, and therefore regulates them".

ARACNe looks for direct pairwise interactions (e.g. between geneA+B, geneB+C, etc.). Judged on pairwise dependence alone, this produces many false positives: if geneA regulates geneB, and geneB regulates geneC, then geneA and geneC will also look related, and you will probably think geneA regulates geneC even though the link is indirect.
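For what it's worth, ARACNe deals with the indirect geneA/geneC case using the data processing inequality (DPI): in a fully connected triplet, the edge with the lowest MI is treated as indirect and removed. Below is a minimal sketch of that idea only, assuming a hypothetical whitespace-separated edge list (node1 node2 MI) with made-up MI values. `dpi_prune` is my own illustrative name, not part of the tool, and ARACNe-AP's real implementation is more involved.

```shell
# dpi_prune: hypothetical helper, not part of ARACNe-AP.
# Reads "nodeA nodeB MI" lines on stdin and prints the edges that survive.
dpi_prune() {
  awk '
    {
      a[NR] = $1; b[NR] = $2; w[NR] = $3 + 0
      mi[$1, $2] = w[NR]; mi[$2, $1] = w[NR]   # store both directions
      node[$1]; node[$2]
    }
    END {
      for (i = 1; i <= NR; i++) {
        drop = 0
        for (k in node) {
          if (k == a[i] || k == b[i]) continue
          # Edge i is the weakest side of the triangle (a, b, k): treat it as indirect.
          if (((a[i], k) in mi) && ((b[i], k) in mi) &&
              w[i] < mi[a[i], k] && w[i] < mi[b[i], k]) drop = 1
        }
        if (!drop) print a[i], b[i], w[i]
      }
    }'
}

# Made-up MI values: geneA-geneC is the weakest edge of the triangle.
printf 'geneA geneB 0.9\ngeneB geneC 0.8\ngeneA geneC 0.3\n' | dpi_prune
# prints:
# geneA geneB 0.9
# geneB geneC 0.8
```

The geneA-geneC edge is dropped because both other sides of the triangle carry more information, which matches the geneA -> geneB -> geneC chain described above.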

To define interactions, the expression values of each TF/regulator in your list are compared against those of every other gene in the matrix. The measure of statistical dependence (used as a proxy for the likelihood of regulation through direct interaction) is the mutual information (MI), which is 0 for complete independence (no regulation).
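To make "MI is 0 for complete independence" concrete, here is a small sketch that computes MI in bits from counts. It assumes the expression values have already been discretized into labels such as low/high, which is a simplification on my part: ARACNe-AP estimates MI on the expression values with adaptive partitioning, not with this naive estimator. `mutual_info` is a hypothetical helper name.

```shell
# mutual_info: hypothetical helper, not part of ARACNe-AP.
# Reads two columns of discrete labels (one sample per line) and prints MI in bits.
mutual_info() {
  awk '
    { n++; x[$1]++; y[$2]++; xy[$1, $2]++ }   # marginal and joint counts
    END {
      for (k in xy) {
        split(k, p, SUBSEP)
        pxy = xy[k] / n
        mi += pxy * log(pxy / ((x[p[1]] / n) * (y[p[2]] / n))) / log(2)
      }
      printf "%.3f\n", mi
    }'
}

# TF and target always change together: fully dependent.
printf 'low low\nhigh high\nlow low\nhigh high\n' | mutual_info    # prints 1.000
# Every combination occurs equally often: independent.
printf 'low low\nlow high\nhigh low\nhigh high\n' | mutual_info    # prints 0.000
```

With only a handful of samples a naive estimate like this is noisy, which is presumably part of why the tool computes an MI significance threshold first.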

This is why you need overlap between the TF list and the expression matrix. If the expression matrix has no values for a TF, its MI with the other genes cannot be calculated, so it is treated as 0 and that TF gets no edges.
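A quick way to check the overlap before running anything is to compare the TF file against the first column of the matrix. This is a rough sketch, assuming a tab-separated matrix with one header row and a TF file with one ID per line; `tf_overlap` is a hypothetical helper name, and the IDs 11111 and 99999 in the demo are made up.

```shell
# tf_overlap: hypothetical helper, not part of ARACNe-AP.
# Usage: tf_overlap TF_FILE EXPRESSION_MATRIX
# Prints every ID from column 1 of the matrix that is also listed in the TF file.
tf_overlap() {
  awk -F'\t' '
    FILENAME == ARGV[1] { if ($1 != "") tf[$1]; next }   # first file: collect TF IDs
    FNR > 1 && ($1 in tf) { print $1 }                   # matrix: skip the header row
  ' "$1" "$2"
}

# Tiny demo with throwaway files.
tmp=$(mktemp -d)
printf 'gene\tSample1\tSample2\n216795\t67\t56\n11111\t5\t9\n' > "$tmp/expr.txt"
printf '216795\n99999\n' > "$tmp/tfs.txt"
tf_overlap "$tmp/tfs.txt" "$tmp/expr.txt"    # prints 216795
rm -r "$tmp"
```

On the real files this would be something like `tf_overlap test/my_data/small_subset_tf.txt test/my_data/small_subset.txt | wc -l`. IDs are compared as exact strings, so stray whitespace or Windows line endings in either file would also show up as missing overlap.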

In terms of lists of TFs/targets, you can use whatever set you like. I use BioMart and screen using the Gene Ontology (GO) term GO:0003700, 'transcription factor activity, sequence-specific DNA binding'; you can use other terms or take lists from databases. I also include the DE genes from my experiment: if there is a regulatory gene among them, that is of particular interest (a gene doesn't necessarily have to be a TF to regulate other genes).
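Combining the lists can be done in one pass. The sketch below merges any number of ID files (for example a GO-derived TF list and a DE gene list, both hypothetical one-ID-per-line files), drops duplicates, and keeps only IDs that actually occur in the matrix. `build_tf_list` and the demo file names are my own, and every ID apart from 216795 is made up.

```shell
# build_tf_list: hypothetical helper, not part of ARACNe-AP.
# Usage: build_tf_list EXPRESSION_MATRIX ID_FILE...
# Prints each ID from the ID files once, if it is present in column 1 of the matrix.
build_tf_list() {
  matrix=$1
  shift
  awk -F'\t' '
    FILENAME == ARGV[1] { if (FNR > 1) present[$1]; next }   # matrix: skip the header row
    ($1 in present) && !seen[$1]++ { print $1 }              # ID files: keep each ID once
  ' "$matrix" "$@"
}

# Tiny demo with throwaway files.
tmp=$(mktemp -d)
printf 'gene\tSample1\n216795\t67\n22222\t12\n33333\t40\n' > "$tmp/expr.txt"
printf '216795\n99999\n' > "$tmp/go_tfs.txt"      # e.g. IDs annotated with GO:0003700
printf '33333\n216795\n' > "$tmp/de_genes.txt"    # e.g. DE genes from the experiment
build_tf_list "$tmp/expr.txt" "$tmp/go_tfs.txt" "$tmp/de_genes.txt"
# prints:
# 216795
# 33333
rm -r "$tmp"
```

Redirecting the output to a file gives something that can be passed to --tfs, with the guarantee that every entry has expression values behind it.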

Hope that helps.

ADD REPLY • link 4.5 years ago by bruce.moran ▴ 970

Hi Bruce, that was extremely helpful, thank you so much!

ADD REPLY • link 4.5 years ago by ww22runner ▴ 60

I think the TF file has to have the same header as the first column of the expression matrix, so:

echo "gene" > test/my_data/small_subset_new_tf.txt
cat test/my_data/small_subset_tf.txt >> test/my_data/small_subset_new_tf.txt

ADD REPLY • link 4.5 years ago by bruce.moran ▴ 970

Thank you for your reply, Bruce, but unfortunately it still gives me an empty output, and I see something like this:

DPI time elapsed: 0 sec
Edges removed by DPI:   0
Final Network size:     0
Total time elapsed: 0 sec
Bootstrapping input matrix 1 with 11782 genes and 28 samples
MI threshold file is present
Calculate network from: test/my_data/small_subset.txt
TFs processed: 0
Time elapsed for calculating MI: 0 sec

ADD REPLY • link 4.5 years ago by ww22runner ▴ 60