Question

Variant calling in scRNA-seq data for KRAS mutations

Asked by theodore.killian, 4.2 years ago

I have sequenced scRNA-seq data and would like to do variant calling for one gene (KRAS) so that I can annotate clusters in the downstream analysis. I have looked at past posts and there doesn't seem to be much consensus on which tools to use. However, I understand that the workflow would be something like:

1) Align reads with STAR to generate a BAM file, and subsequently generate a pileup file.
2) Run the FreeBayes variant caller to find SNVs.

Most tools and workflows for variant calling focus on finding SNPs across the entire genome, whereas I only want to look at one specific gene, KRAS.
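One way to keep the analysis to KRAS is to look up the gene's coordinates in the annotation used for alignment and restrict the tools to that region, so only reads overlapping the gene are considered. The sketch below is a hypothetical helper (`gene_region`, not part of any tool) that scans GTF lines and returns a `chrom:start-end` string. It assumes the GTF has `gene` features carrying a `gene_name` attribute, as GENCODE and Ensembl GTFs do; the toy record and its coordinates are placeholders, not real KRAS coordinates.

```python
import re

GENE_NAME = re.compile(r'gene_name "([^"]+)"')

def gene_region(gtf_lines, gene_name):
    """Return a 'chrom:start-end' region for a gene found in GTF lines.

    GTF start/end are 1-based and inclusive, which matches the region
    syntax accepted by `samtools view`.
    """
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only whole-gene records are used
        match = GENE_NAME.search(fields[8])
        if match and match.group(1) == gene_name:
            chrom, start, end = fields[0], int(fields[3]), int(fields[4])
            return f"{chrom}:{start}-{end}"
    raise KeyError(f"gene {gene_name!r} not found in GTF")

# Toy GTF; the coordinates are placeholders, not the real KRAS locus.
toy_gtf = [
    "#!genome-build toy\n",
    'chr12\ttoy\tgene\t1000\t5000\t.\t-\t.\tgene_id "G1"; gene_name "KRAS";\n',
]
region = gene_region(toy_gtf, "KRAS")
print(region)  # chr12:1000-5000
```

With a coordinate-sorted, indexed BAM, a region string like this can be used to subset reads (e.g. `samtools view -b cells.bam <region>`) before variant calling. Tools differ in their region conventions, so check each tool's documentation before reusing the string.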

Another question: what read depth is appropriate for variant calling in a single gene (as opposed to the entire transcriptome)? Would there be a difference?
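For a single gene, what matters is the depth at the mutated position in each cell, not the total depth across the transcriptome. A simple way to reason about it is a binomial model: if a cell is heterozygous and both alleles are expressed equally, each read (or UMI) at the site carries the ALT allele with probability 0.5, so the chance of seeing at least one ALT read at depth n is 1 - 0.5^n. The sketch below assumes exactly that model and ignores sequencing errors and allele-specific expression; `prob_detect` is a hypothetical helper.

```python
from math import comb

def prob_detect(depth, vaf=0.5, min_alt=1):
    """P(at least `min_alt` ALT reads among `depth` reads) under a binomial model.

    vaf: expected fraction of reads carrying the ALT allele
         (0.5 for a heterozygous site with balanced allele expression).
    """
    return sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt, depth + 1)
    )

print(prob_detect(1))  # 0.5
print(prob_detect(2))  # 0.75
for depth in (5, 10, 20):
    print(depth, round(prob_detect(depth), 3), round(prob_detect(depth, min_alt=2), 3))
```

Under these assumptions, one read gives a 50% chance of seeing the mutation and two reads give 75%. Requiring two ALT reads (`min_alt=2`) to guard against errors lowers the detection chance at any given depth.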

Tags: rna-seq, snp


What single-cell sequencing method did you use? Most single-cell libraries are biased toward one end of the transcript or the other, so you might only have reads covering a fraction of the transcript.
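Before choosing a caller, it helps to check whether reads reach the position of interest at all. The sketch below computes per-base depth over a window from aligned read spans using a difference array. `depth_profile` is a hypothetical helper, and the toy reads are invented to mimic reads piling up toward one end of a 20-bp window; on a real BAM, `samtools depth` reports per-position depth directly.

```python
def depth_profile(intervals, window_start, window_end):
    """Per-base read depth over [window_start, window_end).

    intervals: aligned read spans as 0-based, half-open (start, end) pairs.
    Each span is treated as one contiguous block (splicing is ignored).
    Uses a difference array: +1 where a read starts, -1 where it ends.
    """
    length = window_end - window_start
    diff = [0] * (length + 1)
    for start, end in intervals:
        s = max(start, window_start) - window_start
        e = min(end, window_end) - window_start
        if s < e:  # skip reads that do not overlap the window
            diff[s] += 1
            diff[e] -= 1
    depth, running = [], 0
    for delta in diff[:length]:
        running += delta
        depth.append(running)
    return depth

# Toy reads clustered at the right-hand end of a 20-bp window.
reads = [(12, 20), (14, 20), (15, 22), (0, 3)]
print(depth_profile(reads, 0, 20))
```

If the mutated base falls in a stretch of zeros, no variant caller will recover it, whatever its settings.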

Reply by swbarnes2, 4.2 years ago

We use 10x Genomics. I was thinking of using FreeBayes on the aligned reads, although I know that the SNP calls depend strongly on read depth.
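A 10x BAM pools all cells, so per-cell depth has to be recovered from the barcode tags. In Cell Ranger output, the corrected cell barcode is stored in the `CB` tag and the corrected UMI in the `UB` tag; reads sharing both are copies of the same molecule and should count once. The sketch below assumes the (barcode, UMI, base) triples for one position have already been extracted from the BAM (for example with pysam, not shown), collapses them to one base per UMI by majority vote, and counts bases per cell. `umi_counts_per_cell` and the toy barcodes are hypothetical.

```python
from collections import Counter, defaultdict

def umi_counts_per_cell(observations):
    """Collapse reads to UMIs, then count observed bases per cell.

    observations: iterable of (cell_barcode, umi, base) for reads covering
    one genomic position. Reads sharing (cell, UMI) are treated as copies of
    one molecule; each UMI contributes a single base chosen by majority vote
    (ties go to the base seen first).
    """
    per_umi = defaultdict(Counter)
    for cell, umi, base in observations:
        if cell is None or umi is None:
            continue  # reads without a corrected barcode or UMI are skipped
        per_umi[(cell, umi)][base] += 1

    per_cell = defaultdict(Counter)
    for (cell, _umi), bases in per_umi.items():
        consensus, _n = bases.most_common(1)[0]
        per_cell[cell][consensus] += 1
    return per_cell

# Toy observations; barcodes and UMIs are placeholders.
obs = [
    ("AAACCTGA-1", "UMI1", "C"),
    ("AAACCTGA-1", "UMI1", "C"),  # PCR duplicate of UMI1
    ("AAACCTGA-1", "UMI2", "T"),
    ("AAACGGGT-1", "UMI7", "C"),
    ("AAACGGGT-1", None, "T"),    # no UMI tag: ignored
]
counts = umi_counts_per_cell(obs)
print({cell: dict(bases) for cell, bases in counts.items()})
```

Counting UMIs rather than reads keeps PCR duplicates from inflating the apparent per-cell depth.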

Reply by theodore.killian, 4.2 years ago

As the comment above indicates, your ability to detect mutations will depend strongly on whether you actually sequenced the part of the KRAS gene that is typically mutated. Generally, there's nothing much wrong with your pipeline, but I wouldn't worry about the variant caller before actually looking at the BAM file. Even if the mutation happens to lie in a region where you captured reads, the per-cell depth will most likely be low (I would guess below ten reads), so calling a mutation will most likely come down to manual annotation (if you are looking for known mutations).
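For a known hotspot, manual annotation can be written down as an explicit per-cell rule. The sketch below labels each cell from its REF/ALT UMI counts at one position; the thresholds (`min_alt`, `min_ref`) and label names are hypothetical choices, not established cutoffs. At these depths a cell without ALT reads is better described as "no ALT observed" than as wild-type: if both alleles are expressed equally, a heterozygous cell shows only REF at two UMIs 25% of the time (0.5^2). Bases are genomic bases as they appear in the BAM; because KRAS lies on the reverse strand, they are the complement of the coding-strand change. The toy bases below are placeholders.

```python
def annotate_cells(per_cell_counts, ref_base, alt_base, min_alt=1, min_ref=2):
    """Label cells at one known site from per-cell base counts.

    per_cell_counts: mapping cell barcode -> mapping base -> UMI count.
    Thresholds are illustrative defaults, not validated cutoffs.
    """
    labels = {}
    for cell, bases in per_cell_counts.items():
        alt = bases.get(alt_base, 0)
        ref = bases.get(ref_base, 0)
        if alt >= min_alt:
            labels[cell] = "mutant"    # at least min_alt ALT UMIs seen
        elif ref >= min_ref:
            labels[cell] = "ref_only"  # enough REF UMIs, but no ALT seen
        else:
            labels[cell] = "no_call"   # too few UMIs to say anything
    return labels

# Toy counts; barcodes and bases are placeholders.
counts = {
    "AAACCTGA-1": {"C": 3, "T": 1},
    "AAACGGGT-1": {"C": 2},
    "AAAGATGC-1": {"C": 1},
    "AAAGCAAT-1": {},
}
print(annotate_cells(counts, ref_base="C", alt_base="T"))
```

With `min_alt=1`, a single sequencing error can label a cell as mutant, so raising `min_alt` trades sensitivity for specificity.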

Reply by Friederike, 4.2 years ago

If this is cancer, most KRAS mutations (~80-90% of KRAS-mutant tumors) occur at one of just two amino acid residues in the protein (G12X or G13X, where X is any amino acid). So a manual component is entirely feasible.
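Because the hotspots are two adjacent codons, the window to inspect is only six coding bases. The sketch below converts between coding (c.) positions and codon numbers, using the standard convention that c.1 is the A of the ATG start codon; it shows that G12 and G13 correspond to c.34-36 and c.37-39. The helper names are hypothetical, and mapping these c. positions to genomic coordinates still requires the transcript annotation (not shown).

```python
def codon_of(c_pos):
    """Map a 1-based coding position (c.N) to (codon number, position in codon)."""
    if c_pos < 1:
        raise ValueError("coding positions start at c.1")
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1

def codon_span(codon):
    """Return the first and last c. positions (inclusive) of a codon."""
    if codon < 1:
        raise ValueError("codons are numbered from 1")
    first = 3 * (codon - 1) + 1
    return first, first + 2

print(codon_span(12))  # (34, 36)
print(codon_span(13))  # (37, 39)
print(codon_of(35))    # (12, 2): second base of codon 12
```

Both codons sit near the start of the coding sequence, which is why coverage of this window depends so much on which end of the transcript the library favors.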

Reply by Collin, 4.2 years ago

But that means 3'-biased sequencing will never cover those sites.

Reply by swbarnes2, 4.2 years ago

Yes, you are right. But there are also oncogenes with many mutations in the middle of a long protein, which might not get any coverage with either strategy. This also illustrates why study design is important, as 5'-biased sequencing is likely to cover these sites.

Reply by Collin, 4.2 years ago