I have overexpressed a gene from a vector construct, transduced cells with the virus, transplanted them into a mouse, and then performed bulk RNA-seq on those cells (both transduced and untransduced) isolated from the mouse. However, I have doubts as to whether the observed gene expression values are truly representative of the transgene (perhaps some of the signal comes from reads aligned to the endogenous version of the gene).
Is it possible for me to align reads to some flanking region (only about 20-50 base pairs on either side, between the gene and the promoter(s)) to obtain a more accurate expression value, one that is present in transduced cells but entirely absent in untransduced cells? There is no FLAG tag in the vector construct. Or would such a region be spliced out? If it is possible, what would be the best method (should I include the promoter regions as well)?
Also, if I were to re-align the reads from my BAM/SAM files against some custom reference, does anyone have any code or a tutorial that they could refer me to?
Thank you!
Was that region captured in your data?
Sorry, I am a bit new to this, but how would I determine this? By manually looking at the alignments (for example, in IGV, at the regions around my gene of interest)?
Since you are doing RNA-seq, those flanking bases will show up in your data only if they are part of the RNA.
Is there anything in the overexpression construct (something that makes it into the final RNA transcript) that makes it distinct from the endogenous transcripts?
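A minimal sketch of one way to check this without a genome browser: search the raw reads for a short sequence that spans the promoter/gene junction, on either strand. The junction sequence and the toy reads below are made up for illustration; you would substitute the real junction from your vector map and read from your own FASTQ file.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def count_junction_reads(fastq_lines, junction):
    """Count reads containing the junction sequence on either strand.

    fastq_lines: any iterable of FASTQ lines (4 lines per record).
    Returns (reads_with_junction, total_reads).
    """
    junction = junction.upper()
    junction_rc = reverse_complement(junction)
    hits = 0
    total = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 != 1:  # the sequence is the 2nd line of each 4-line record
            continue
        read = line.strip().upper()
        total += 1
        if junction in read or junction_rc in read:
            hits += 1
    return hits, total


# Hypothetical junction sequence, NOT taken from any real vector.
JUNCTION = "GCCACCATGGTG"

# Toy reads standing in for a real FASTQ file.
toy_fastq = [
    "@r1", "TTTGCCACCATGGTGAAA", "+", "I" * 18,  # forward-strand match
    "@r2", "TTTCACCATGGTGGCAAA", "+", "I" * 18,  # reverse-complement match
    "@r3", "ACGTACGTACGTACGTAC", "+", "I" * 18,  # no match
]

hits, total = count_junction_reads(toy_fastq, JUNCTION)
print(f"{hits} of {total} reads contain the junction")
```

For a real gzipped file, the same function can be fed `gzip.open(path, "rt")` in place of the toy list. If the count is zero in the transduced samples, the flanking bases are most likely not part of the transcript.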
I am not too sure, but my vector construct is essentially promoter-gene-promoter-GFP, along with some other features such as dGAG, RRE, WPRE, RU5, AmpR, TAT2e, etc. Would any of these allow me to do so? I believe my gene contains only the protein-coding region, with no 5' or 3' UTR, but other than that it is essentially the same as the endogenous version.
These viral and bacterial elements are vector-specific but are not transcribed. What matters is the gene itself: check whether it is identical to the endogenous transcript or whether something makes it unique. If nothing does, then there is no way to distinguish the two.
Thanks for the explanation, and sorry for the naive questions, but what steps would you recommend I take to determine whether there are actually any differences between my gene and the endogenous version (do I just compare the sequences directly with BLAST)? If so, how would I then go about obtaining a read count from that information?
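Besides BLAST, a rough sketch of a direct comparison is to list every k-mer of the construct sequence that never occurs in the endogenous transcript. All sequences below are invented toy examples and the function names are hypothetical; with real data you would load the construct and the endogenous mRNA from FASTA files and use a k close to your read length (for example 25).

```python
def kmers(seq, k):
    """Return the set of all k-mers in a sequence (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def construct_specific_kmers(construct, endogenous, k=25):
    """List (position, k-mer) pairs found in the construct but not in
    the endogenous transcript. An empty list means the two cannot be
    told apart at this k."""
    construct = construct.upper()
    endogenous_kmers = kmers(endogenous, k)
    specific = []
    for i in range(len(construct) - k + 1):
        kmer = construct[i:i + k]
        if kmer not in endogenous_kmers:
            specific.append((i, kmer))
    return specific


# Toy sequences, made up for illustration only.
CDS = "ATGGCTAGCAAGGGCGAGGAGCTGTTCTAA"
endogenous_mrna = "GGGAGCTC" + CDS + "TAATAAAGCC"        # CDS plus UTRs
construct_rna = "CCGGTCGCCACC" + CDS + "GGATCCGAATTC"    # CDS plus vector flanks

# A CDS-only transgene shares every k-mer with the endogenous mRNA.
print(construct_specific_kmers(CDS, endogenous_mrna, k=8))

# Vector-derived flanks, if transcribed, give construct-specific k-mers.
for pos, kmer in construct_specific_kmers(construct_rna, endogenous_mrna, k=8):
    print(pos, kmer)
```

If the list is empty, reads from the transgene and from the endogenous gene are indistinguishable within that sequence. If it is not empty, one common approach is to add the construct as an extra sequence to the reference, re-align, and count only the reads that cover those construct-specific positions.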