I have been given some RNA-seq reads by a collaborator and have been asked to assess whether there is differential expression between treatments in 6 genes of interest to my collaborator. Unfortunately, this is a non-model organism, so the best transcriptome available to me contains only 2 of the 6 genes of interest. The other four are either missing entirely from the transcriptome or present only as small fragments.
Are there any issues with manually adding my six full-length sequences of interest to the transcriptome FASTA before running Salmon? If I do this, should I manually remove the fragments that may occur in the transcriptome already?
Is there a better strategy I should take? I imagine I could assemble my own transcriptome from the reads, but I would have no way of knowing whether my assembly would capture these full-length genes of interest any better than the existing transcriptomes do.
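Yes, definitely remove the fragments that are already present in the current reference, especially if there is sequence overlap. Salmon splits reads that map equally well to several reference sequences among them, so if a fragment and the full-length sequence of the same gene are both in the index, the reads for that gene get divided between the two entries and the estimate for the full-length sequence is diluted.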
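If it helps, here is a rough sketch of how the reference could be edited in Python before indexing. It assumes you have already worked out which reference IDs are fragments of your genes (for example by aligning the full-length sequences against the transcriptome); the function names and the IDs in the example are made up for illustration and are not part of Salmon or any library. It also drops any reference entry whose ID matches one of the full-length sequences, so the two genes that are already present do not end up in the file twice under the same name.

```python
from typing import Iterable, Iterator, List, Set, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from the lines of a FASTA file."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def seq_id(header: str) -> str:
    """Return the sequence ID: the header text up to the first whitespace."""
    parts = header.split()
    return parts[0] if parts else ""


def augment_reference(
    reference: Iterable[Record],
    full_length: Iterable[Record],
    fragment_ids: Set[str],
) -> List[Record]:
    """Drop fragment entries (and ID clashes), then append full-length sequences."""
    full_length = list(full_length)
    # Also drop reference entries that share an ID with an added sequence.
    drop = set(fragment_ids) | {seq_id(h) for h, _ in full_length}
    kept = [(h, s) for h, s in reference if seq_id(h) not in drop]
    return kept + full_length


def format_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Render records as FASTA text with sequences wrapped at `width`."""
    out: List[str] = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Toy data standing in for the real files; IDs are hypothetical.
    reference = list(parse_fasta([
        ">tx1 some annotation", "ACGTACGT",
        ">geneA_fragment", "GGCC",
        ">tx2", "TTTTAAAA",
    ]))
    full_length = [("geneA", "GGCCGGCCGGCC")]
    augmented = augment_reference(reference, full_length, {"geneA_fragment"})
    print(format_fasta(augmented), end="")
```

With real data you would read the two FASTA files with `open(...)`, write the result to a new file, and build the index from that file, e.g. `salmon index -t augmented.fa -i augmented_index`.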