Hello! Thanks for taking the time to read. I'm very new to RNA-Seq, and have been banging my head against a wall for a few weeks now trying to analyze my data.
We performed RIP-Seq in our lab, with IPs using three different antibodies (Antibody-A, Antibody-B, and an IgG negative control). Each IP had three biological replicates. The IP'd samples were processed for paired-end RNA-Seq.
Our goal is to determine the transcripts that were pulled down with both Antibody-A and Antibody-B, but NOT the IgG.
Currently, I have assembled the paired-end reads (using a Trinity-based program, Agalma). I am now stuck at post-assembly and data analysis. I have tried a few different approaches, but I have not had much luck with any of them.
The main issue is that I am working in Xenopus laevis (oocytes), for which only an incomplete draft genome exists. The closest well-sequenced cousin is Xenopus tropicalis.
Initially, I ran the post-assembly step in Agalma, in which transcripts were annotated against a subset of the Swiss-Prot database (selected by the GI numbers associated with Xenopus laevis). This step also used RSEM to give FPKM values. The problem I encountered was that a control transcript had much higher FPKM values in our negative IgG controls than in our experimental IPs (in all replicates). This is unexpected, as we KNOW that this transcript should be abundant in the Antibody-A and Antibody-B IPs.
I thought the issue might be comparing FPKM across samples, and that a different approach would be better. I did some reading, and edgeR seemed like the program I wanted to use for differential expression across samples.
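To make the concern about comparing FPKM across samples concrete, here is a minimal Python sketch. It uses the textbook FPKM definition (RSEM works with effective lengths, so its values would differ in detail), and every number in it is hypothetical rather than taken from my data. It only shows that a small library, such as an IgG pull-down, can give a higher FPKM from fewer fragments.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)

# Hypothetical numbers for one 2,000 bp control transcript (not real data).
# The IP library is assumed to be 20x larger than the IgG library.
ip_fpkm = fpkm(fragments=1000, length_bp=2000, total_mapped_fragments=10_000_000)
igg_fpkm = fpkm(fragments=200, length_bp=2000, total_mapped_fragments=500_000)

print(f"IP:  {ip_fpkm:.1f} FPKM from 1000 fragments")
print(f"IgG: {igg_fpkm:.1f} FPKM from 200 fragments")
```

Because FPKM is scaled by each library's own total, the value says how large a share of that library a transcript takes up, not how much RNA was pulled down in absolute terms.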
Following the edgeR manual, I first set out to build a table of read counts using the featureCounts function of the Subread package. This program takes a BAM file and assigns mapped reads to genomic features in a GTF file, giving read counts for each gene. Since I could not find a gene annotation file (GTF) for Xenopus laevis, I used one for the related species, Xenopus tropicalis. Unfortunately, this resulted in zero counts for every gene and no assigned reads, so I couldn't move on to the edgeR analysis.
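One possible explanation for zero assigned reads is that the reference names in the BAM header do not match the sequence names in the first column of the GTF. A minimal Python sketch of that check follows. It assumes the header has been dumped to text (for example with `samtools view -H`), and the contig and scaffold names below are hypothetical placeholders, not names from my files.

```python
def sam_header_references(header_text):
    """Collect reference names from the SN: field of @SQ header lines."""
    names = set()
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names

def gtf_seqnames(gtf_text):
    """Collect sequence names from the first column of a GTF file."""
    names = set()
    for line in gtf_text.splitlines():
        if line and not line.startswith("#"):
            names.add(line.split("\t")[0])
    return names

# Hypothetical header of a BAM mapped against assembled contigs.
example_header = "@HD\tVN:1.0\n@SQ\tSN:contig_0001\tLN:1500\n@SQ\tSN:contig_0002\tLN:820\n"
# Hypothetical GTF line using genome scaffold names instead.
example_gtf = 'scaffold_1\tsource\texon\t100\t500\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n'

shared = sam_header_references(example_header) & gtf_seqnames(example_gtf)
print(f"{len(shared)} reference names shared between BAM and GTF")
```

If the two sets share no names, no read can overlap any feature, which would be consistent with every gene getting a count of zero.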
I think that the problem lies with referencing my samples to laevis sequences in one step and tropicalis sequences in another. I think it would be better if I had a GTF file for Xenopus laevis, but I'm not sure if this is possible.
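One workaround that might avoid the need for a Xenopus laevis GTF, assuming the BAM files come from mapping reads back to the assembled transcripts, is to treat each assembled contig as its own feature. featureCounts also accepts annotation in its tab-delimited SAF format (columns GeneID, Chr, Start, End, Strand, selected with `-F SAF`). The Python sketch below writes such a table from contig lengths; the contig names and lengths are hypothetical.

```python
def contigs_to_saf(contig_lengths):
    """Build SAF text in which each contig is one full-length feature."""
    lines = ["GeneID\tChr\tStart\tEnd\tStrand"]
    for name, length in contig_lengths.items():
        # Coordinates are 1-based and inclusive; "+" is a placeholder strand.
        lines.append(f"{name}\t{name}\t1\t{length}\t+")
    return "\n".join(lines) + "\n"

# Hypothetical contig lengths, e.g. taken from the assembly FASTA.
example_lengths = {"contig_0001": 1500, "contig_0002": 820}
saf_text = contigs_to_saf(example_lengths)
print(saf_text, end="")
```

This counts reads per assembled contig rather than per gene, so contigs that belong to the same gene would still need to be grouped before or after counting.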
One thing I have that might be useful is assembled reads from whole laevis oocytes. In addition to RIP-Seq, I did straight RNA-Seq on the laevis oocytes. I'm not sure if this data can somehow be used as a comparison or baseline for anything in the RIP-Seq experiment.
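As an illustration of what using the whole-oocyte data as a baseline could look like, here is a minimal Python sketch that scales counts to counts per million and computes a log2 ratio of IP over input for each transcript. This is an assumption about how the comparison might be set up, not an established pipeline, and the counts and transcript names are hypothetical.

```python
import math

def cpm(counts):
    """Scale raw counts to counts per million within one sample."""
    total = sum(counts.values())
    return {name: value * 1e6 / total for name, value in counts.items()}

def log2_enrichment(ip_counts, input_counts, pseudocount=1.0):
    """log2(IP CPM / input CPM) for transcripts present in both samples."""
    ip_cpm = cpm(ip_counts)
    input_cpm = cpm(input_counts)
    return {
        name: math.log2((ip_cpm[name] + pseudocount) / (input_cpm[name] + pseudocount))
        for name in ip_cpm
        if name in input_cpm
    }

# Hypothetical counts for two transcripts (not real data).
ip_counts = {"transcript_1": 900, "transcript_2": 100}
input_counts = {"transcript_1": 500, "transcript_2": 500}

enrichment = log2_enrichment(ip_counts, input_counts)
for name, value in enrichment.items():
    print(f"{name}: log2(IP/input) = {value:.2f}")
```

A positive value would suggest a transcript is over-represented in the IP relative to the oocyte transcriptome. A single ratio like this ignores replicate variability, so it is only a first look before any proper statistical test.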
Does anyone have any insight into what I am doing wrong, or whether there's a better way to approach my question? Or is there a way to use my whole-oocyte RNA-Seq data to help with the RIP-Seq data?
I'm sorry that this was such a long read, and I apologize if anything is confusing. If you read through it all, thank you so much. Any and all insight is very much appreciated. I feel like I'm just hitting a giant wall here!
Thanks,
-Sam