I don't know how much experience you have with NGS data analysis, so I will only briefly describe the steps of the pipeline. Feel free to ask for details if you need them.
You can detect mutations (SNVs and indels) using the following steps:
1. Map the reads against the reference genomes.
Since your viral and human reads are in the same file, you will need to separate them. To do this, concatenate the corresponding reference genomes (human + viral) and map your reads against the concatenated reference. Because the viral and human genomes are very different, each read will align to the genome it came from, which separates the human and viral reads. Suppose you have done this step and obtained the BAM files.
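As a rough sketch of the concatenation in step 1, the snippet below merges two FASTA files into one reference and prefixes the viral sequence names so they are easy to pick out later. The file names, the `viral_` prefix and the function name are assumptions made for illustration; the merged file would then be indexed and used with whichever splice-aware aligner you prefer.

```python
def concat_references(human_fasta, viral_fasta, out_fasta, viral_prefix="viral_"):
    """Write one FASTA holding the human sequences followed by the viral ones.

    Viral header lines get `viral_prefix` (a hypothetical naming choice) so
    the viral contigs are easy to recognise in the BAM files later.
    Returns the list of prefixed viral contig names.
    """
    viral_contigs = []
    with open(out_fasta, "w") as out:
        # Copy the human reference unchanged.
        with open(human_fasta) as fh:
            for line in fh:
                out.write(line if line.endswith("\n") else line + "\n")
        # Copy the viral reference, renaming each header line.
        with open(viral_fasta) as fh:
            for line in fh:
                line = line.rstrip("\n")
                if line.startswith(">"):
                    # Keep only the sequence ID (text up to the first space).
                    name = viral_prefix + line[1:].split()[0]
                    viral_contigs.append(name)
                    out.write(">" + name + "\n")
                elif line:
                    out.write(line + "\n")
    return viral_contigs
```

Calling `concat_references("human.fa", "ndv.fa", "combined.fa")` (hypothetical file names) would return the renamed viral contigs, which you can reuse when checking or subsetting the alignments.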
From here you can do one of two things:
2a. As mentioned in the comments, if the viral genome is very small, you can use IGV to visualize the BAM files and manually inspect the viral alignments for mutations. If there turn out to be a lot of them, or the viral genome is large and manual checks are not feasible, proceed with the next step.
2b. In this step you call variants from your BAM files using GATK. Its documentation includes best practices for RNA-seq data.
The easiest way is to perform variant calling on the entire BAM files, which will also give you information about human variants. If you really need to analyze only the viral data, you will need to subset your BAM files.
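One possible way to do the subsetting is with samtools, sketched here as a small helper that only builds the command lines. It assumes the BAM file is coordinate-sorted, that samtools is installed, and that the viral contig is called `viral_NDV`; the contig and file names are made up for the example.

```python
import shlex


def viral_subset_commands(bam_path, viral_contigs, out_bam):
    """Build samtools commands that extract the reads on the viral contigs.

    `samtools view` with region arguments needs an indexed BAM, so the
    index command comes first. Nothing is executed here; pass each list
    to subprocess.run(cmd, check=True) once samtools is available.
    """
    if not viral_contigs:
        raise ValueError("need at least one viral contig name")
    index_cmd = ["samtools", "index", bam_path]
    # -b writes BAM output, -o names the output file; the trailing
    # arguments are the regions (here whole contigs) to keep.
    view_cmd = ["samtools", "view", "-b", "-o", out_bam, bam_path, *viral_contigs]
    return [index_cmd, view_cmd]


# Hypothetical file and contig names, printed as shell commands.
for cmd in viral_subset_commands("sample1.bam", ["viral_NDV"], "sample1.viral.bam"):
    print(shlex.join(cmd))
```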
After running GATK, you will get a VCF file containing the variants.
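If you called variants on the whole BAM file, the viral variants can be pulled out of the VCF afterwards. The sketch below reads uncompressed VCF text and splits the records on one contig into SNVs and indels; the function name and the contig name you pass in are assumptions, and the classification is deliberately simple.

```python
def classify_variants(vcf_text, contig):
    """Return (snvs, indels) found on `contig` in plain-text VCF content.

    Each variant is reported as (position, ref, alt). An allele is an SNV
    when REF and ALT are both one base long; everything else is counted
    as an indel, which is a simplification (multi-base substitutions
    would be lumped in with the indels).
    """
    snvs, indels = [], []
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alts = line.split("\t")[:5]
        if chrom != contig:
            continue
        for alt in alts.split(","):  # multi-allelic sites list several ALTs
            if alt in (".", "*") or alt.startswith("<"):
                continue  # no alternate allele, or a symbolic allele
            record = (int(pos), ref, alt)
            if len(ref) == 1 and len(alt) == 1:
                snvs.append(record)
            else:
                indels.append(record)
    return snvs, indels
```

For a compressed `.vcf.gz` file you would first read the text with `gzip.open(path, "rt")`.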
NOTE: You mentioned that there are 30 million reads per sample. At this sequencing depth you can "recover" highly and moderately expressed human genes, but lowly expressed genes will probably not get many reads. I don't know to what extent the viral genome is expressed, so after step 1 just check how many reads mapped to the viral genes. If the numbers are very low, it would not make much sense to continue the analysis.
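A quick way to do this check is to run `samtools idxstats` on the indexed BAM file and add up the mapped reads per contig. The sketch below parses that tab-separated output (contig name, length, mapped reads, unmapped reads); the function name and the contig names you pass in are assumptions for illustration.

```python
def viral_read_fraction(idxstats_text, viral_contigs):
    """Summarise `samtools idxstats` output for the viral contigs.

    Returns (viral_mapped, total_mapped, fraction), where fraction is the
    share of all mapped reads that landed on the viral contigs.
    """
    viral = total = 0
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue
        # Columns: contig name, contig length, mapped reads, unmapped reads.
        name, _length, mapped, _unmapped = line.split("\t")
        total += int(mapped)
        if name in viral_contigs:
            viral += int(mapped)
    fraction = viral / total if total else 0.0
    return viral, total, fraction
```

You could feed it the text captured from `samtools idxstats sample1.bam` (hypothetical file name) and compare the viral counts across your replicates and controls.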
Hope this helps
This is similar to a dual RNA-seq experimental setup. Let me first clarify a couple of "prerequisites":
Hi Sir. Thank you for your questions and for offering help in this situation.
Yes, I am optimistic that the RNA-seq data contain the viral genome because, after developing the PI cells, I ran RT-PCR to confirm the presence of the NP, HN and F genes of the virus before submitting the RNA for sequencing.
Yes, I ran each sample in triplicate, including the control cells (non-PI cells).
The sequencing depth is 30M reads.
Please see my answer below.
My understanding is that TCGA was able to use RNA-seq to detect HPV infection in their papers (most likely in the head and neck cancer paper), so I guess it is possible to detect a viral genome from RNA-seq. If the NDV genome is very conserved, you might as well first align your reads to the NCBI viral reference and see if you hit anything. If the NDV genome isn't too long, you can just scroll through the alignment in IGV and see if there are any obvious variants.
Thank you, Sir, for adding to this discussion. Can you please share with me the paper where they used RNA-seq data to detect HPV infections? Thank you once again.
Hi Dr. Hovhannisyan. Thank you for this detailed and comprehensive stepwise pipeline for the problem I described above. I really appreciate your time, contribution and commitment.
I am actually very new to NGS data analysis, though I have attended quite a number of workshops and seminars on NGS, particularly RNA-seq, and I am currently enrolled in the BioStar online classes to learn more. However, I understand all the steps you mentioned here. I will try to do as you recommended and will get back to you if I get stuck along the way.
Indeed, this will be very helpful to many. I copied your guide into MS Word and saved it on my system as reference material. Glad to have you here on this platform.
Best regards