ChIP-Seq for viral genomes
6.0 years ago

I am trying to analyze ChIP-Seq data from a viral transcription factor. I am interested in identifying peaks specifically in the viral genome (~150 kb). I am having a hard time calling peaks with programs like MACS2 or HOMER using default settings and changing only the genome size.

When I look at the alignment files in a genome browser such as IGV, I am able to see distinct peaks in the IP and not in the input control or a control where the transcription factor is not tagged. I am not sure if I need to play around with some settings or if there are specific ChIP analysis programs that cater to small genomes such as viral genomes.

While I don't get any peaks with MACS2, I do get peaks when I run HOMER and either turn off all filtering or set a low fold-change threshold between the control and the IP. Is it acceptable to manually curate ChIP-Seq data? That is doable given the small genome size. Any help would be much appreciated!


What is the nature of that sample? An organism where the virus is integrated into the genome?


The viral genome exists as episomes in the cell. It is not integrated into the host genome.


Not really answering your question, but I'd be concerned about cross-contamination from the host (human?) genome. I would definitely do the short experiment of adding your virus genome to the human genome reference, aligning all reads to the combined reference, and redoing the MACS analysis.

I bet there are a few false-positive peaks just because of the relative difference in genome sizes. ChIP-seq always leads to peaks in my experience, and I am really not overly convinced by the method.
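A minimal sketch of how such a combined reference could be put together before alignment. The function name, the file paths and the `virus_` prefix are hypothetical; the prefix is only there so that viral contigs are easy to pick out of the alignments afterwards.

```python
def build_combined_reference(host_fasta, viral_fasta, out_fasta, viral_prefix="virus_"):
    """Write host records followed by viral records into one FASTA file.

    Viral sequence names get a prefix (an assumption of this sketch) so they
    can be told apart from host contigs later.
    Returns the list of prefixed viral sequence names.
    """
    viral_names = []
    with open(out_fasta, "w") as out:
        # Host records are copied through unchanged.
        with open(host_fasta) as host:
            for line in host:
                out.write(line if line.endswith("\n") else line + "\n")
        # Viral headers are renamed; sequence lines are copied as they are.
        with open(viral_fasta) as viral:
            for line in viral:
                line = line.rstrip("\n")
                if line.startswith(">"):
                    name = viral_prefix + line[1:].split()[0]
                    viral_names.append(name)
                    line = ">" + name
                out.write(line + "\n")
    return viral_names
```

The combined file would then be indexed and used with whatever aligner you already use; the returned names are the contigs to keep when you subset the alignments.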


I align to both the viral and host genomes to avoid any misalignments. I then use only the reads aligned to the virus for peak analysis. Calling peaks on the host+viral alignments leads to a bunch of false positives from the host, which look nothing like peaks in the genome browser, and almost nothing from the virus, which is where I actually see good peaks. From my understanding of peak-calling algorithms, almost all of them are optimized by default for large genomes with widely spaced genes, unlike a viral genome, which is both small and dense.

I think the viral peaks are real because they are very distinct in the genome browser and could be validated by ChIP-qPCR as well. They are also consistent with the transcript levels of the target genes.
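For reference, the "keep only the viral reads" step can be sketched on plain SAM text as below. The contig name and the MAPQ cutoff of 30 are assumptions for illustration; the cutoff relies on the aligner giving low mapping quality to reads that fit the host and the virus equally well, which is worth checking for your aligner.

```python
def viral_alignments(sam_lines, viral_contig, min_mapq=30):
    """Yield SAM header lines plus mapped reads on the viral contig.

    sam_lines    : iterable of SAM text lines (e.g. an open file)
    viral_contig : reference name of the viral genome (hypothetical here)
    min_mapq     : assumed cutoff to drop ambiguously placed reads
    """
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines are passed through untouched.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])   # column 2: bitwise FLAG
        rname = fields[2]       # column 3: reference name
        mapq = int(fields[4])   # column 5: mapping quality
        if flag & 4:
            # 0x4 marks an unmapped read.
            continue
        if rname == viral_contig and mapq >= min_mapq:
            yield line
```

In practice the same filter is usually done on BAM files with a dedicated tool; the point here is only which fields decide whether a read is kept.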


I think your analysis is correct, and the tools are optimised for large genomes. I have never had to do ChIP-seq analysis on a small genome, thankfully.
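As a starting point for experimenting, here is a hedged sketch that builds a MACS2 command line with a few of the defaults changed. Which changes actually help is an assumption to test on your data, not a recommendation from the MACS2 authors: `--nomodel` with a fixed `--extsize` skips fragment-size model building, `--nolambda` replaces the local background estimate with a fixed one, and `--keep-dup all` keeps reads that share a position, which may happen legitimately at high coverage on a 150 kb genome. The function name, the file names and the 200 bp fragment size are hypothetical.

```python
def macs2_small_genome_args(ip_bam, control_bam, genome_size, name, fragment_size=200):
    """Return an argument list for `macs2 callpeak` (a sketch, not a validated recipe)."""
    return [
        "macs2", "callpeak",
        "-t", ip_bam,                     # IP alignments
        "-c", control_bam,                # input or untagged control
        "-f", "BAM",
        "-g", str(int(genome_size)),      # effective genome size in bp
        "-n", name,                       # prefix for output files
        "--nomodel",                      # skip fragment-size model building
        "--extsize", str(fragment_size),  # assumed fragment length
        "--nolambda",                     # fixed background instead of local lambda
        "--keep-dup", "all",              # do not discard reads sharing a position
    ]

# Example with hypothetical file names; run it with
# subprocess.run(args, check=True) once MACS2 is installed.
args = macs2_small_genome_args("ip.viral.bam", "input.viral.bam", 150_000, "viral_tf")
```

Comparing the calls from a few such variants against what you see in IGV is probably more informative than trusting any single setting.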

I think careful manual curation is appropriate in this case.
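To make manual curation a bit more systematic, one option is to tabulate IP-over-control enrichment in fixed windows along the genome and inspect the top windows in the browser. This is a minimal sketch, assuming 0-based read start positions, a 500 bp window, a pseudocount of 1 and simple library-size scaling; all of those are illustrative choices, and total-read scaling can be skewed when peaks cover a large share of a small genome.

```python
def windowed_enrichment(ip_positions, control_positions, genome_length,
                        window=500, pseudocount=1.0):
    """Return (start, end, fold) per window, fold = IP / depth-scaled control."""
    n_windows = (genome_length + window - 1) // window
    ip_counts = [0] * n_windows
    ctrl_counts = [0] * n_windows
    for pos in ip_positions:
        ip_counts[pos // window] += 1
    for pos in control_positions:
        ctrl_counts[pos // window] += 1
    # Scale the control so both libraries have the same total read count.
    scale = len(ip_positions) / len(control_positions) if control_positions else 1.0
    rows = []
    for i in range(n_windows):
        start = i * window
        end = min(start + window, genome_length)
        fold = (ip_counts[i] + pseudocount) / (ctrl_counts[i] * scale + pseudocount)
        rows.append((start, end, fold))
    return rows
```

Sorting the rows by fold and checking each candidate against both the input and the untagged control keeps the curation reproducible.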

One thing that has really helped me in ChIP-seq analysis is using the multiBamSummary tool and then the plotFingerprint tool in deepTools, available on the Freiburg Galaxy server among others. It's really good at assessing the strength of the ChIP enrichment.

Devon Ryan has written some good PowerPoint presentations on using deepTools for this purpose.
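The idea behind a fingerprint plot can be illustrated in a few lines: rank the genome bins by read count and plot the cumulative fraction of reads. This is a simplified sketch of the concept, not deepTools' implementation, and the function name is hypothetical.

```python
def fingerprint_curve(bin_counts):
    """Return (fraction of bins, cumulative fraction of reads) points.

    Bins are ranked from lowest to highest read count. An input-like sample
    stays close to the diagonal; a strong IP stays low and rises sharply at
    the end, because a few bins hold most of the reads.
    """
    ordered = sorted(bin_counts)
    total = sum(ordered)
    if total == 0:
        raise ValueError("no reads in any bin")
    n = len(ordered)
    curve = []
    running = 0
    for rank, count in enumerate(ordered, start=1):
        running += count
        curve.append((rank / n, running / total))
    return curve
```

With evenly spread reads such as `[5, 5]` the curve follows the diagonal, while `[0, 0, 0, 10]` stays at zero until the last bin.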


Thanks! I will check them out. I was also able to get some suggestions from the makers of HOMER to adjust the parameters for small genomes.
