Hi,
I have a set of duplicate-marked .bam files from STAR. I will likely be working on variants in cancer models and in the cancer itself. This paper, https://www.nature.com/articles/s41467-018-05190-9, does what I am aiming to do, and its methods say:
For quantifying the expression of SNVs, RNA-seq reads were mapped to the GRCh37_g1k genome assembly using STAR [43]. Following GATK’s [38] best practice on ‘Calling variants in RNAseq’, readgroups were added and duplicates marked using Picard [44]. GATK’s SplitNCigarReads for trimming reads and assigning mapping qualities was applied. The sets of SNVs identified using Strelka on WGS data for all tumor or organoid were merged using vcftools [45]. Reads overlapping any SNV were counted using the ReadCountWalker in gatktools.
Does this mean that they are calling variants from RNA-seq, or are they relating variably expressed genes from RNA-seq to genes carrying variants from their WGS?
They have a list of SNVs from WGS data and then used RNA-Seq data, after running through GATK best practices, to quantify the reads overlapping SNVs.
Sorry, do you mean they used the RNA-seq data to obtain a second list of SNVs, so that they would have two lists of SNVs, one from WGS and one from RNA-seq?
geek_y is correct (I have edited my answer). The literal interpretation is that they did the following:
They did not, in fact, call the variants from the RNA-seq data. If it needs clarifying, I know the corresponding author and can ask.
Sorry Kevin, so overall you mean I don't need to call SNVs from RNA-seq? I already have access to duplicate-marked RNA-seq .bam files from STAR.
F : The sequence of operations for the analysis in the paper is clearly stated in @Kevin's comment (and originally referred to by @geek_y).
SNVs were already obtained from WGS (and not RNA-seq) data (I assume in the paper you linked). Do you have WGS data in addition? Are you trying to follow the analysis strategy from that paper?
Thank you, I have matched RNA-seq and WGS data for each patient, both in .bam format.
Since my question is the same, I want to do the same analysis on my own first-hand data.
Well, you can do what you want with your data - we live in a free society. Geek_y and I are just helping you to understand what, exactly, the authors did. We do this outside of the understanding of what your own project is about.
Sorry Kevin, please don't be angry. I understand that you are not dictating what I have to do at all.
This is because of my bad English
With my last comment I just wanted to check that I had understood your overall reading of the paragraph I pasted.
I believe people post their confusion in public forums, and people with knowledge in those fields are free to answer or ignore them. I remember that when I joined Biostars about 4 years ago, I used to ask very shallow questions, as I still do. At first @Pierre also used to challenge my questions and comments, but after a while he left me alone for good, even for a single comment. I meant that we can ignore people instead of giving harsh feedback. However, thank you for your time.
Do not worry. Looking at the quoted paragraph from the manuscript, it is not 100% clear what they did. This is the fault of the journal editors and the authors. There is a small amount of doubt surrounding the final line:
"Reads overlapping any SNV were counted using the ReadCountWalker in gatktools"
They do not explicitly state that these reads are RNA-seq reads; however, one can infer that this is what they did.
Just to recap:
1. They aligned RNA-seq reads to the 1000 Genomes version of the GRCh37 / hg19 reference genome using STAR. I have a link for this genome build in #Step 3, HERE. In doing this, they followed the steps that they outline in the methods, namely: adding readgroups, marking duplicates using Picard, and trimming reads / assigning mapping qualities with SplitNCigarReads from GATK.
2. They counted RNA-seq reads over each position at which a variant was identified from WGS. For this, they used ReadCountWalker.
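To make the second step concrete, here is a minimal sketch in plain Python of what "counting RNA-seq reads over WGS SNVs" amounts to. It is not ReadCountWalker and not the authors' code. It assumes a toy representation in which each read is an ungapped `(chrom, start, sequence)` tuple with a 1-based start, and each SNV is a `(chrom, pos, ref, alt)` tuple; the function name `count_alleles_at_snvs` and all data values are made up for illustration. Real RNA-seq reads are spliced, so on actual .bam files you would need a tool that interprets CIGAR strings.

```python
from collections import Counter

# Hypothetical SNV sets called from WGS (chrom, 1-based pos, ref, alt).
tumor_snvs = {("1", 105, "A", "G")}
organoid_snvs = {("1", 105, "A", "G"), ("1", 112, "C", "T")}

# Merge the per-sample SNV sets into one list (the paper used vcftools for this).
merged = sorted(tumor_snvs | organoid_snvs)

# Toy RNA-seq reads: (chrom, 1-based start, sequence), assumed ungapped.
reads = [
    ("1", 100, "ACGTAGCCAT"),
    ("1", 101, "CGTAACCATTGCA"),
    ("1", 108, "ATTGTAGG"),
    ("2", 105, "AAAAAA"),
]


def count_alleles_at_snvs(reads, snvs):
    """Count the bases observed in reads at each SNV position."""
    counts = {snv: Counter() for snv in snvs}
    for chrom, start, seq in reads:
        end = start + len(seq) - 1  # last reference position covered by the read
        for snv in counts:
            snv_chrom, pos, _ref, _alt = snv
            if snv_chrom == chrom and start <= pos <= end:
                # Offset of the SNV within the read (valid only for ungapped reads).
                counts[snv][seq[pos - start]] += 1
    return counts


counts = count_alleles_at_snvs(reads, merged)
for (chrom, pos, ref, alt), base_counts in counts.items():
    print(chrom, pos, f"ref({ref})={base_counts[ref]}", f"alt({alt})={base_counts[alt]}")
```

The point is that no variants are called from the RNA-seq data here: the SNV list comes entirely from WGS, and the RNA-seq reads are only used to ask how many of them support the reference or the alternate allele at those fixed positions.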
You are doing very well, so, do not worry.
Thank you so much for bearing with me
Well, which manuscript is it? Thanks for adding the link to the manuscript. Be aware of the limitations of calling variants from RNA-seq: A: Inferring genotype based on RNA sequnces
Thank you, I got confused here because if they have called SNVs from RNA-seq why they are mentioning
Please edit your post to use a more useful/informative title. It doesn't help future users identify, at a glance, relevant content.