9.1 years ago
Ron
Hi,
I have mouse reads from WGS, and I want to see where in the genome they come from.
For example, how many map to 3' UTRs, transposons, etc. I have processed the FASTQ files to BAMs using STAR, so I was thinking of looking at them in IGV as well.
Any suggestions? Also, what kind of analysis can be done with these reads?
Thanks,
Ron
This question is similar to saying:
Hey everyone I have heard about this thing called mathematics. So what kind of studies can I do with math?
Thanks a lot Istvan for drawing a nice parallel.
This is a very broad question Ron, so to provide a more helpful answer you might have to try and focus in on a particular sort of analysis that you want to do.
Definitely start by looking at the data in IGV. This will give you an idea of what the data feels like, how it's distributed, where it's unusual, etc. My old professor used to tell us "When you lack a good scientific question, don't go looking for one. Lay back and let the data run through your hands like grains of sand on the beach. Only then can you come up with a good biological question."
Of course, if you just want to know how many reads map to the UTRs, for example, I can tell you how to do that. But if you are brand new to WGS analysis, then the first thing you should know is that questions which sound clear in conversation are often underspecified in bioinformatics. For example, do you want the raw read counts that fall into UTRs, or do you want to compare this value to other values, which requires some kind of normalisation? That could be the % of reads that fall into UTRs (normalised for total reads) or, one step further, the % of reads that fall into 3' UTRs relative to the % of the genome that is made up of 3' UTRs. Doing the same for 5' UTRs allows a comparison that is not biased by the fact that 5' UTRs and 3' UTRs differ in total length.
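To make the normalisation concrete, here is a minimal Python sketch of the arithmetic. The function name `utr_enrichment` and all the numbers are made up for illustration; they are not real mouse genome statistics.

```python
def utr_enrichment(reads_in_utr, total_reads, utr_bases, genome_bases):
    """Return (fraction of reads in UTRs, fraction of genome that is UTR, ratio of the two)."""
    read_fraction = reads_in_utr / total_reads      # normalised for sequencing depth
    genome_fraction = utr_bases / genome_bases      # how much of the genome the feature covers
    return read_fraction, genome_fraction, read_fraction / genome_fraction


# Hypothetical counts and sizes, for illustration only.
read_frac, genome_frac, enrichment = utr_enrichment(
    reads_in_utr=2_000_000,
    total_reads=400_000_000,
    utr_bases=27_000_000,
    genome_bases=2_700_000_000,
)
print(f"{read_frac:.2%} of reads, {genome_frac:.2%} of genome, ratio {enrichment:.2f}")
```

A ratio near 1 would mean the feature gets about as many reads as its size predicts, which is roughly what you would expect from evenly covered WGS data; a ratio well away from 1 is worth a closer look (mappability, coverage bias, annotation problems).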
And that's not even the how. That's still the what. The how is, for instance, how much of a read needs to overlap a UTR before it counts. Maybe you don't care about reads at all and do everything at the base-pair level. Maybe you have to filter out your low-quality reads first. Maybe you have to correct for sequencing bias and mappability.
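To show how much the overlap rule alone can change the answer, here is a small self-contained Python sketch. The coordinates are toy values, the helper names are hypothetical, and the brute-force loop is only meant to illustrate the idea, not to scale to a real BAM file.

```python
def overlap_bases(read_start, read_end, feat_start, feat_end):
    """Number of bases shared by two half-open intervals [start, end)."""
    return max(0, min(read_end, feat_end) - max(read_start, feat_start))


def count_reads_in_features(reads, features, min_fraction=0.5):
    """Count reads with at least min_fraction of their length inside a single feature."""
    count = 0
    for r_start, r_end in reads:
        length = r_end - r_start
        if any(
            overlap_bases(r_start, r_end, f_start, f_end) / length >= min_fraction
            for f_start, f_end in features
        ):
            count += 1
    return count


# Toy intervals on one chromosome, for illustration only.
utrs = [(100, 200), (500, 650)]
reads = [(90, 140), (150, 250), (190, 240), (600, 650), (700, 750)]

any_overlap = count_reads_in_features(reads, utrs, min_fraction=0.01)
half_inside = count_reads_in_features(reads, utrs, min_fraction=0.5)
fully_inside = count_reads_in_features(reads, utrs, min_fraction=1.0)
print(any_overlap, half_inside, fully_inside)
```

The same five reads give three different counts depending on the threshold, so the rule needs to be decided (and reported) up front. For real data you would normally use an existing tool rather than a loop like this; `bedtools intersect`, for example, has a minimum overlap fraction option (`-f`).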
Probably a good starting point is calling SNPs, in which case you should check out the GATK pipeline :) All the best!
Yes, I am brand new to WGS; I have done RNA-seq analysis and WES in the past. At this point I only want to see where the reads are falling, i.e. on which parts of genes. So I am looking at IGV and, as you said, the read counts (how many map to UTRs, etc.). This is just a continuation of the analysis that I have been doing, which can be seen in this post.
For RNA-seq I have done a similar analysis, checking the expression of mouse genes and what kinds of cells they are expressed in (T cells, endothelial cells, non-endothelial cells, fibroblasts, etc.). This is WGS from mouse, so at this point I am exploring what studies have been done in this regard.