Hello,
I am trying to analyze my bulk RNA-seq dataset from hippocampal tissue extracted from our WT/KO mice. The knockout consists of a 10 kb deletion in a single exon of our gene of interest. I want to look at differential expression of all genes in the dataset. To account for the genetic modification, I'm wondering whether I should add entries with the sequence of the modified exon/transcript to the GTF annotation file (currently using gencode.vM31) or to the genome FASTA (GRCm39). If so, how should I go about doing that? I'm worried that the large change in length of the resulting transcript may skew the results for our gene of interest if it's simply assigned to the same gene/transcript ID.
I'm also curious how the chromosome coordinates would work with that. Should the new exon/transcript be its own custom chromosome included in the genome FASTA? If so, should the GTF file contain only the single exon and the resulting transcript coordinates on that custom chromosome, or should it also include all the remaining exons of the gene?
If I include all the unchanged exons, how would the packages be able to differentiate between the resulting transcripts, given that their sequences are identical except for the one modified exon?
Please let me know if I should include any other helpful details of the experiment!