Question

Manually adding genes of interest to transcriptome for RNASeq DE?

0

2.2 years ago

CephBirk ▴ 20

I have been given some RNASeq reads by a collaborator and have been asked to assess whether there is differential expression between treatments in 6 genes of interest to my collaborator. Unfortunately, this is a non-model organism, so the best transcriptome available for me to work with contains only 2 of the 6 genes of interest. The other four are either missing entirely from the transcriptome or present only as small fragments.

Are there any issues with manually adding my six full-length sequences of interest to the transcriptome FASTA before running Salmon? If I do this, should I manually remove the fragments that may occur in the transcriptome already?

Is there a better strategy I should take? I imagine I could assemble my own transcriptome from the reads, but I suppose I wouldn't have any way of knowing whether my assembly would do a better job of capturing these full-length genes of interest than the existing transcriptomes.

rnaseq de • 681 views

updated 2.2 years ago by benformatics 4.0k • written 2.2 years ago by CephBirk ▴ 20

1

Yes, definitely remove the fragments that are already present in the current reference, especially if there is sequence overlap.
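As a rough illustration of what that could look like, here is a minimal Python sketch that drops the fragment entries from the reference and appends the full-length sequences. The function names (`read_fasta`, `patch_transcriptome`, `format_fasta`) and the idea of passing a list of fragment IDs are my own assumptions for the example, not part of Salmon or any existing tool.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def patch_transcriptome(reference, additions, fragment_ids):
    """Drop the listed fragments, then append the full-length sequences.

    `fragment_ids` is a hypothetical list of reference IDs you have already
    decided are partial copies of your genes of interest.
    """
    drop = set(fragment_ids)
    seen = set()
    patched = []
    for header, seq in reference:
        seq_id = header.split()[0]  # ID is the first word of the header
        if seq_id in drop:
            continue
        seen.add(seq_id)
        patched.append((header, seq))
    for header, seq in additions:
        seq_id = header.split()[0]
        if seq_id in seen:
            # Avoid duplicate names in the patched reference.
            raise ValueError(f"duplicate sequence ID: {seq_id}")
        seen.add(seq_id)
        patched.append((header, seq))
    return patched


def format_fasta(records, width=60):
    """Return FASTA text with sequences wrapped at `width` characters."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

After writing the patched FASTA to a file, the Salmon index would need to be rebuilt from it (for example `salmon index -t patched.fa -i patched_index`, where both names are placeholders).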

2.2 years ago by benformatics 4.0k

score 3 · Accepted Answer · 2022-09-22

In principle, if the sequences were indeed missing in the annotation, then adding them back would benefit the overall process. There would be no harm in that.

But you should ensure that these transcripts are valid: present in the sequencing data and not among the known transcripts you already have.

One could use BLAST to identify the sequences in the existing transcriptome that are most similar to those you plan to add. Then you could align the reads to the missing sequences alone and view the resulting alignment in IGV to check for potential coverage problems.
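For the BLAST step, a small sketch of how the hits might be screened is below. It assumes you have already produced tabular output (`-outfmt 6`, default 12 columns), for instance with something like `blastn -query genes_of_interest.fa -subject transcriptome.fa -outfmt 6`, where the file names are placeholders. The function name and the identity and length cutoffs are arbitrary assumptions for illustration and should be tuned to your data.

```python
from collections import defaultdict

# Default column order of BLAST tabular output (-outfmt 6).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def candidate_fragments(blast6_lines, min_identity=95.0, min_length=100):
    """Map each query gene to reference IDs that look like copies or fragments.

    A hit is kept when percent identity and alignment length both pass the
    (hypothetical) thresholds.
    """
    hits = defaultdict(set)
    for line in blast6_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        row = dict(zip(BLAST6_COLUMNS, line.split("\t")))
        if (float(row["pident"]) >= min_identity
                and int(row["length"]) >= min_length):
            hits[row["qseqid"]].add(row["sseqid"])
    return {gene: sorted(ids) for gene, ids in hits.items()}
```

The reference IDs returned here are only candidates; it is worth looking at each hit before deciding to remove it from the transcriptome.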

You would need full-length coverage as evidence that the transcripts were truly missing.
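Alongside looking at the alignments by eye, the coverage check could be roughly quantified. The sketch below assumes per-base depth in the three-column format written by `samtools depth -a` (reference name, position, depth) and a dictionary of sequence lengths that you supply. The function name and the `min_depth` cutoff are assumptions for the example.

```python
def coverage_breadth(depth_lines, lengths, min_depth=1):
    """Return the fraction of each sequence covered at >= min_depth.

    `depth_lines`: tab-separated lines of (name, position, depth).
    `lengths`: dict mapping sequence name to its length in bases (> 0).
    """
    covered = {name: 0 for name in lengths}
    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        name, depth = fields[0], int(fields[2])
        if name in covered and depth >= min_depth:
            covered[name] += 1
    return {name: covered[name] / lengths[name] for name in lengths}
```

A value close to 1.0 for a gene would be consistent with full-length coverage; a much lower value suggests the added sequence is only partly supported by the reads.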