Hi all,
I have a short question and would appreciate any input on whether my intuition is correct or whether there is a better approach.
I am currently analysing RNA-seq data from Drosophila. The strain we profile also expresses GFP in stem cells via the GAL4 system.
When generating count tables (I plan on using salmon followed by tximport), GFP and GAL4 are not considered, because they do not occur in the reference cDNA that I downloaded from Ensembl (cDNA sequence). However, this information would be very helpful for us, because it correlates with the number of stem cells.
So my question is: is it correct to simply "manually" add the GFP and GAL4 sequences at the end of the cDNA reference transcriptome and then use this modified transcriptome as the input for salmon? That way, reads should map to the GFP and GAL4 sequences and therefore be quantified along with the endogenous transcripts.
Any feedback is much appreciated!
Cheers!
Yes. Add whatever sequences you suspect should be there. Then check that you get counts where you expect them and not where you do not (i.e. include a negative control).
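The appending step can be sketched in Python as below. This is only a minimal sketch under some assumptions: the reference FASTA has already been decompressed, the GFP and GAL4 sequences sit in a separate FASTA file, and the function names and file names are made up for illustration. It concatenates the two files and refuses to continue if a transgene ID clashes with an existing transcript ID.

```python
from pathlib import Path


def fasta_ids(path):
    """Return the record IDs (first token after '>') found in a FASTA file."""
    ids = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                parts = line[1:].split()
                ids.append(parts[0] if parts else "")
    return ids


def append_transgenes(reference_fa, transgene_fa, output_fa):
    """Write reference_fa followed by transgene_fa into output_fa.

    Raises ValueError if a transgene ID already exists in the reference,
    since duplicate IDs would make the resulting counts ambiguous.
    Returns the list of IDs in the combined file.
    """
    clash = set(fasta_ids(reference_fa)) & set(fasta_ids(transgene_fa))
    if clash:
        raise ValueError(f"duplicate sequence IDs: {sorted(clash)}")
    with open(output_fa, "w") as out:
        for path in (reference_fa, transgene_fa):
            text = Path(path).read_text()
            # Make sure each file ends with a newline so that the next
            # header does not get glued onto the previous sequence line.
            out.write(text if text.endswith("\n") else text + "\n")
    return fasta_ids(output_fa)


# Hypothetical usage (file names are placeholders):
# append_transgenes("dmel_cdna.fa", "gfp_gal4.fa", "dmel_cdna_plus_transgenes.fa")
```

The combined file would then be what you pass to `salmon index -t`. If you summarise to gene level with tximport, the `tx2gene` table presumably also needs a row for each added sequence (for example mapping GFP to a "gene" called GFP), otherwise those entries are likely to be dropped during summarisation.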