RNASeq alignment and quantification for a transgenic mouse
1
2
Entering edit mode
6.2 years ago
viktorfeketa ▴ 30

I have RNASeq data coming from a transgenic mouse (where a single gene's coding sequence is replaced by the gene sequence from another organism). I need to quantify the expression of this transgene (get the number of aligned reads). It seems to me that the most comprehensive and accurate approach would be the following:

1) generate a custom reference genome (.fasta file), where the respective sequence is replaced with the new transgenic sequence 2) modify the entries for this gene in the gene annotation (.gtf) 3) use the modified reference genome and gene annotation to do the alignment and gene quantification.

Does that sound like the correct approach, or are there some issues that I don't see?

Also, I have problems with implementing my plan. For #1, I couldn't find a good tool to replace the sequence with another one in the .fasta file - could you recommend something? I am not proficient with python, so would prefer a ready-made tool.

Another concern that I have is that because the original and new sequences in the fasta file have different lengths, the whole annotation (or at least one chromosome) will be misaligned in relation to the modified reference genome. How can this be resolved?

Could anyone suggest other general approaches? Would it be more straightforward to use just a single gene sequence as a reference, and align the whole dataset to it? If yes, then what tool would you recommend?

rna-seq alignment transgenic mouse custom genome • 4.4k views
ADD COMMENT
0
Entering edit mode

I need to quantify the expression of this transgene (get the number of aligned reads).

Are you not interested in what happens to the rest of the mouse transcriptome (probably not, but good to confirm). The gene you replaced was a single copy (with no other genes that were similar in sequence) elsewhere in the genome? Was the replacement confirmed to be a single copy (or is that something you need to check)?

ADD REPLY
0
Entering edit mode

1) we are definitely interested in the whole transcriptome, but for this it seems to me that using the usual RNAseq pipeline with the original wild-type genome is sufficient - only the targeted gene will be affected, reads for all the other genes should be aligned/quantified normally - isn't that right? 2) the replacement was not confirmed as a single copy - this would be useful to check; how does that affect the approach?

ADD REPLY
1
Entering edit mode
6.2 years ago

What might be simpler, and work just about as well, is to leave the orignal genome intact, and just append the transgene as a new chromosome to the genome. Then remake the index, and add the new entry to the gff.

ADD COMMENT
0
Entering edit mode

The concern that I have with this approach is that because the sequences are conserved to a large degree (the gene is the ortholog from a different organism), couldn't that result in some reads aligning (with imperfect alignment) to the original gene, and all reads being split between the original gene and the transgene? Wouldn't that result in unreliable quantification?

ADD REPLY
0
Entering edit mode

where a single gene's coding sequence is replaced by the gene sequence from another organism

If that was the case (as noted in original question and there are no other copies of the gene elsewhere) why should there be any worry about following?

couldn't that result in some reads aligning (with imperfect alignment) to the original gene, and all reads being split between the original gene and the transgene?

ADD REPLY
0
Entering edit mode

In this case, the read counts for the transgene will include only a fraction of the total reads that actually came from the transgene (the rest of the reads aligning to the original gene) - thus, underestimated quantification. But now that I think about it, I could just add the read counts to both the transgene and the original gene, knowing that in reality the original gene sequence is not present in the genome of transgenic animal, and all these reads should theoretically come from the transgene. Would that be accurate do you think (adding counts)?

ADD REPLY
0
Entering edit mode

knowing that in reality the original gene sequence is not present in the genome of transgenic animal, and all these reads should theoretically come from the transgene.

If the replacement happened cleanly (was the replacement done in egg/sperm?, not sure if your data is from single cells or multiple) then your counts should only be from the transgene.

If replacement happened in some of the copies/cells then as @swbarnes2 suggested above you will need to create a hybrid genome. It would be tricky to assign counts especially if the replaced gene is largely identical to the original.

ADD REPLY
0
Entering edit mode

I have a similar issue to that presented above with the exception that no gene was replaced. Instead a human gene with no-mouse ortholog together with GFP under the same promoter.

How do I append and new chromosome to the genome to quantify expression?

Thanks!

ADD REPLY
0
Entering edit mode

Add the sequence of the gene at the end of the mouse genome as a new entry. Reindex the modified genome and then align to the new indexes.

Is the location of the insertion known?

ADD REPLY
0
Entering edit mode

Thanks, do you know of any detailed tutorial to do that?

I am working with RNAseq from GEO, there is no info about the location of the insertion.

ADD REPLY
0
Entering edit mode

Hi, genomax. Do you mean I can use something like cat mm10.fa newgene.fa > mm10_edited.fa then index the modified genome? But I am not sure if newgene.fa only contains the sequence of gene I inserted into mouse genome, or the entire exon?

Thanks

ADD REPLY
1
Entering edit mode

You can use only the portion that you expect to have been inserted or full gene. You will need to inspect the boundaries of alignments. Depending on sequence similarity you will still have issues with multi-mapping (unless you have SNP's you can count on etc).

ADD REPLY
0
Entering edit mode

Thank you:) So how can I inspect the boundaries?

Sorry for bothering you with these simple questions, I am very new here. And if you have any books or papers recommended,I'd appreciate it.

Thanks again!

ADD REPLY
0
Entering edit mode

You can do that using IGV (Broad Institute) to inspect the alignments. You will need to turn on an option to show soft-clipped bases (which are not shown by default) so you can see boundary of sequence that was inserted. You will need to check for chimeric alignments (where one read is aligned to genome and other to the gene that got inserted). All this would depend on sequence similarities (genome and the gene you are inserting).

ADD REPLY
0
Entering edit mode

Talking about IGV, I have another question. After HISAT2+samtools, I opened my sorted.bam in IGV. But there were only chromosome 11-15, my target was in chr16. I've got a little bit confused cause I can find genes from other chromosomes in gtf after Stringtie.

Thanks agaiin!

ADD REPLY

Login before adding your answer.

Traffic: 1128 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6