Question

Aligning reads from mouse samples that express 1 human gene

2.4 years ago

bompipi95 ▴ 170

Hi bioinformaticians!

I have a set of mouse samples genetically engineered to express a single human gene. I have performed alignment against the mouse genome with STAR, and am trying to find a way to recover those reads that were mapped to this single human gene.

My current thinking is to identify the unmapped reads from each sample and then realign them against the human reference genome, subsetted to include only the chromosomal region corresponding to this human gene of interest. I will also filter the GTF annotation file to include only the entry corresponding to this gene.
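The two preparatory steps in this plan can be sketched in plain Python, assuming single-end reads in SAM text form and a GTF as lines of text. STAR can also write unmapped reads straight to FASTQ with `--outReadsUnmapped Fastx`, and `samtools faidx` can extract the human region (naming it like `chr7:1001-2000`). One detail that is easy to miss is that the GTF coordinates must then be rebased so they are relative to the extracted region. The function names, gene name and region below are hypothetical.

```python
from __future__ import annotations

from typing import Iterable, Iterator

SAM_FLAG_UNMAPPED = 0x4  # FLAG bit meaning "segment unmapped" (SAM spec)


def unmapped_fastq_records(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield a FASTQ record for every SAM line with the 0x4 (unmapped) bit set.

    Assumes single-end reads and a QUAL field that is present (not '*').
    """
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if flag & SAM_FLAG_UNMAPPED:
            yield f"@{qname}\n{seq}\n+\n{qual}\n"


def gtf_attributes(attr_field: str) -> dict[str, str]:
    """Parse a GTF attribute column such as 'gene_id "X"; gene_name "Y";'."""
    attrs = {}
    for item in attr_field.split(";"):
        key, _, value = item.strip().partition(" ")
        if key:  # skips the empty item after the trailing ';'
            attrs[key] = value.strip('"')
    return attrs


def subset_gtf(gtf_lines: Iterable[str], gene_name: str,
               region_start: int, region_name: str) -> Iterator[str]:
    """Keep the records of one gene and rebase them onto an extracted region.

    region_start is the 1-based start of the extracted region on the original
    chromosome; region_name is the sequence name used in the subset FASTA.
    """
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or gtf_attributes(fields[8]).get("gene_name") != gene_name:
            continue
        fields[0] = region_name
        fields[3] = str(int(fields[3]) - region_start + 1)
        fields[4] = str(int(fields[4]) - region_start + 1)
        yield "\t".join(fields) + "\n"
```

For paired-end data the mates would need to be written to two files, which tools like `samtools fastq` already handle.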

Any thoughts on the approach above and alternative suggestions are welcome!

Tag: realignment

updated 8 months ago by swbarnes2 14k • written 2.4 years ago by bompipi95 ▴ 170

Hi! Your approach sounds reasonable, but if you want some alternatives, it may be worth taking a look at these two previous related posts: "Extract uniquely mapped reads from one species" and "Tool to separate human and mouse rna seq reads".

2.4 years ago by iraun 6.2k

Thank you for linking these helpful posts!

2.4 years ago by bompipi95 ▴ 170

score 1 · Answer 1 · 2022-11-16

The problem with your approach is that, given the sequence similarity between human and mouse, you will likely get reads of human origin that map to the mouse genome and vice versa. As an alternative to what has already been proposed, if you know which human sequence was inserted and where it sits in the mouse genome, you could edit your reference genome accordingly. This way you'd have a reference genome that matches your samples' genome.
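A minimal sketch of "editing the reference", assuming the construct is a simple insertion (no mouse sequence deleted or replaced), that the insertion site is known as a 1-based position on one contig, and that the contig fits in memory as a string. Annotations downstream of the site have to be shifted by the insert length; features that span the site get stretched and should be checked by hand. A GTF record for the inserted gene itself would still need to be added. The function names are hypothetical.

```python
def insert_transgene(contig_seq: str, insert_after: int, transgene: str) -> str:
    """Return contig_seq with transgene inserted after 1-based position insert_after."""
    if not 0 <= insert_after <= len(contig_seq):
        raise ValueError("insertion point lies outside the contig")
    return contig_seq[:insert_after] + transgene + contig_seq[insert_after:]


def shift_gtf_line(line: str, contig: str, insert_after: int, length: int) -> str:
    """Shift GTF coordinates on `contig` that lie downstream of the insertion site."""
    if line.startswith("#"):
        return line
    fields = line.rstrip("\n").split("\t")
    if fields[0] != contig:
        return line  # other chromosomes are unaffected
    start, end = int(fields[3]), int(fields[4])
    if start > insert_after:
        start += length
    if end > insert_after:
        end += length  # a feature spanning the site is stretched; review these manually
    fields[3], fields[4] = str(start), str(end)
    return "\t".join(fields) + "\n"
```

For example, inserting `GG` after position 4 of `AAAACCCC` gives `AAAAGGCCCC`, and an exon at 6-7 on that contig moves to 8-9.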

score 1 · Answer 2 · 2022-11-16

2.4 years ago

swbarnes2 14k

You might not want to hear it, but the proper way to do this is to build a new reference consisting of the mouse genome plus the human gene (with an updated GTF), then realign and recount.
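One way to build such a reference is sketched below, assuming the human gene is added as its own contig (hypothetically named `hGENE`) rather than inserted into a mouse chromosome, and that its exon coordinates on that contig are known. For a cDNA construct that would be a single exon spanning the whole contig. The transcript ID scheme and function names are hypothetical; `transcript_id` is included because STAR's GTF-based splice-junction step groups `exon` records by that tag by default.

```python
from __future__ import annotations


def fasta_record(name: str, seq: str, width: int = 60) -> str:
    """Format one FASTA record with sequence lines wrapped at `width` characters."""
    lines = [f">{name}"] + [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


def transgene_gtf(contig: str, exons: list[tuple[int, int]], gene_id: str,
                  gene_name: str, strand: str = "+", source: str = "transgene") -> str:
    """Build gene, transcript and exon GTF records for a single-transcript transgene.

    `exons` are 1-based, inclusive (start, end) coordinates on `contig`.
    """
    transcript_id = f"{gene_id}.t1"  # hypothetical transcript ID
    gene_attrs = f'gene_id "{gene_id}"; gene_name "{gene_name}";'
    tx_attrs = f'{gene_attrs} transcript_id "{transcript_id}";'
    start = min(s for s, _ in exons)
    end = max(e for _, e in exons)
    rows = [("gene", start, end, gene_attrs), ("transcript", start, end, tx_attrs)]
    rows += [("exon", s, e, tx_attrs) for s, e in sorted(exons)]
    return "".join(
        f"{contig}\t{source}\t{feature}\t{s}\t{e}\t.\t{strand}\t.\t{attrs}\n"
        for feature, s, e, attrs in rows
    )
```

The FASTA record would then be indexed together with the mouse genome (for example `--genomeFastaFiles mouse.fa hGENE.fa` in `STAR --runMode genomeGenerate`), and the GTF lines appended to the mouse GTF passed as `--sjdbGTFfile`; the file names here are placeholders.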

Also, you might want to make sure that your gene-counting method is smart about handling reads that align equally well to very similar sequences. featureCounts and htseq-count (the latter is what STAR's built-in gene counting mimics) are not very smart about this; RSEM, or pseudoaligners such as kallisto or Salmon, are. If you want to stick with STAR, its transcriptome output (`--quantMode TranscriptomeSAM`) is suitable for use with RSEM.

2.4 years ago by swbarnes2 14k

How do you make a proper mouse + human reference? I added the human sequences to my mouse FASTA and GTF. However, when I align reads to this custom reference, samples that do not carry the human sequence still show alignments to the human gene, I think because of sequence similarity.

Also, can you elaborate on how gene counting can be "smart"? I was thinking that alignment is the major issue: once the reads are aligned, the only task of gene counting is to count them.

8 months ago by Cny • 0

It's normal to have a small amount of noise, i.e. a few reads aligning to the wrong genome.

Some software just throws up its hands and discards ambiguous reads, or effectively tosses a coin. "Smarter" tools handle ambiguous reads more intelligently. If 100 reads align unambiguously to the mouse version of a gene and 1 aligns unambiguously to the human version, then a read that aligns equally well to both will either have roughly a 99% chance of being counted as mouse, or will be split, adding about 0.01 to the human count and 0.99 to the mouse count. The real algorithms are a little more sophisticated than that, but that's the gist.
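To make the explanation above concrete, here is a toy sketch contrasting "discard ambiguous reads" with an expectation-maximization (EM) allocation, which is broadly the approach (EM or a variant of it) that RSEM, Salmon and kallisto build on. It deliberately ignores the transcript-length, fragment-length and bias models those tools include. Reads are grouped by the set of targets they align to equally well; the counts are the illustrative numbers from the reply, and the function names are hypothetical.

```python
from typing import Dict, Tuple

# Key: the targets a read aligns to equally well; value: how many reads share that set.
ReadClasses = Dict[Tuple[str, ...], int]


def unique_only_counts(classes: ReadClasses) -> Dict[str, int]:
    """Count only reads compatible with a single target; ambiguous reads are dropped."""
    counts: Dict[str, int] = {}
    for targets, n in classes.items():
        if len(targets) == 1:
            counts[targets[0]] = counts.get(targets[0], 0) + n
    return counts


def em_counts(classes: ReadClasses, n_iter: int = 200) -> Dict[str, float]:
    """Split ambiguous reads in proportion to iteratively re-estimated abundances."""
    targets = sorted({t for cls in classes for t in cls})
    if not targets:
        return {}
    abundance = {t: 1.0 / len(targets) for t in targets}  # uniform starting guess
    counts: Dict[str, float] = {}
    for _ in range(n_iter):
        # E-step: assign each class's reads by current relative abundance.
        counts = {t: 0.0 for t in targets}
        for cls, n in classes.items():
            total = sum(abundance[t] for t in cls)
            for t in cls:
                share = abundance[t] / total if total > 0 else 1.0 / len(cls)
                counts[t] += n * share
        # M-step: re-estimate abundances from the expected counts.
        n_reads = sum(counts.values())
        abundance = {t: c / n_reads for t, c in counts.items()}
    return counts


classes: ReadClasses = {("mouse",): 100, ("human",): 1, ("mouse", "human"): 1}
```

With these numbers, `unique_only_counts` returns 100 for mouse and 1 for human (the ambiguous read is lost), while `em_counts` gives mouse about 100.99 and human about 1.01, i.e. the ambiguous read is split roughly 0.99 / 0.01.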

8 months ago by swbarnes2 14k