PCR duplicates in FFPE RNASeq
2.5 years ago
Gama313 ▴ 130

Dear all,

I am working with 100 RNA-seq samples generated with a stranded protocol on a NovaSeq run.

I need to perform variant calling on these samples, but I am facing some problems.

I do not have access to DNA, so exome or targeted sequencing is not possible.

For variant calling, a first step of marking duplicates is usually suggested, which I performed with Picard MarkDuplicates (considering both lanes and the distance for optical duplicates).
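To make the duplicate-marking step concrete: MarkDuplicates records its decision in the SAM FLAG field (bit 0x400, "PCR or optical duplicate"), so a read-level duplication fraction can be recovered from the marked alignments. Below is a minimal sketch in plain Python that works on SAM text lines. The function name and the toy records are hypothetical, and this simple per-read fraction is an assumption for illustration; it is not the same calculation as the duplication percentage in Picard's metrics file.

```python
# SAM FLAG bits, as defined in the SAM specification
UNMAPPED_FLAG = 0x4
SECONDARY_FLAG = 0x100
DUPLICATE_FLAG = 0x400       # set by duplicate-marking tools
SUPPLEMENTARY_FLAG = 0x800


def duplication_fraction(sam_lines):
    """Return (duplicates, total, fraction) over primary mapped records.

    Hypothetical helper: counts reads one by one, ignoring headers,
    unmapped reads, and secondary/supplementary alignments.
    """
    total = 0
    duplicates = 0
    skip_mask = UNMAPPED_FLAG | SECONDARY_FLAG | SUPPLEMENTARY_FLAG
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & skip_mask:
            continue
        total += 1
        if flag & DUPLICATE_FLAG:
            duplicates += 1
    fraction = duplicates / total if total else 0.0
    return duplicates, total, fraction


def toy_record(name, flag):
    """Build a minimal, made-up SAM record with the given FLAG."""
    return "\t".join([name, str(flag), "chr1", "100", "60", "50M",
                      "*", "0", "0", "*", "*"])


# Toy input: two unmarked reads, two marked duplicates,
# one secondary alignment and one unmapped read (both ignored).
example = ["@HD\tVN:1.6"] + [
    toy_record("r%d" % i, flag)
    for i, flag in enumerate([0, 1024, 16, 1040, 256, 4])
]

print(duplication_fraction(example))
```

In practice the lines would come from something like `samtools view` restricted to the region of one gene, which gives the per-gene figure discussed further down.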

That said, I think duplicate detection could be affected by sample degradation. In particular, I suspect that FFPE degradation could limit the RNA regions that get amplified, 'falsely' resulting in a higher PCR duplicate rate. Is this assumption correct?

Moreover, I am wondering whether the duplication rate of a particular gene could be used as a metric to give more/less confidence to specific variants.

Regards

PCR-duplicates RNA-Seq FFPE Variant-Calling • 858 views

That said, I think duplicate detection could be affected by sample degradation. In particular, I suspect that FFPE degradation could limit the RNA regions that get amplified, 'falsely' resulting in a higher PCR duplicate rate. Is this assumption correct?

That is certainly logical. Not much you can do about that.

I am wondering whether the duplication rate of a particular gene could be used as a metric to give more/less confidence to specific variants.

I doubt that. There is no specific reason why a particular gene can be used as a control.

You probably have no choice in the matter, but consider the limitations noted in Kevin's answer here: Inferring genotype based on RNA sequences (RNA-seq variant calling)


I forgot to add that I called variants from reads that were NOT duplicate-marked, since I think I can calculate metrics (Mann-Whitney rank sum test) to discriminate systematic errors (e.g. drops in Phred quality, etc.). In this sense, I would use the duplication level of the specific gene as additional information for hard-filtering variants.

For example: a variant seen 30 times on a gene with a duplication level of 30% would be more reliable than a variant seen 30 times on a gene with a duplication level of 70%. Does that make sense?

Thanks again for your willingness to help.
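The arithmetic behind that example can be sketched as follows. This is only an illustration under a strong assumption: that the gene-level duplication percentage applies evenly to the reads supporting the variant, and that every flagged duplicate is a true PCR copy. In RNA-seq, highly expressed genes also produce identical fragments by chance, so the number is a rough discount rather than a validated confidence score. The function name is hypothetical.

```python
def effective_support(alt_reads, duplication_percent):
    """Discount variant-supporting reads by a gene-level duplication rate.

    Hypothetical helper: assumes duplicates are spread evenly over the
    supporting reads, which real data may not satisfy.
    """
    if not 0 <= duplication_percent <= 100:
        raise ValueError("duplication_percent must be between 0 and 100")
    return alt_reads * (100 - duplication_percent) / 100


# The two cases from the post: 30 supporting reads in each gene
low_dup = effective_support(30, 30)   # gene with 30% duplication
high_dup = effective_support(30, 70)  # gene with 70% duplication

print(low_dup, high_dup)  # 21.0 9.0
```

Under this assumption the same 30 observations correspond to roughly 21 independent fragments in the first gene and 9 in the second, which is the intuition for treating the second call with more caution.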


It may come down to what you want to do with the results and their ultimate application. As you are well aware, FFPE samples are compromised because of the nature of the input material, so at a minimum any conclusions may need to be independently confirmed. I am not a statistician, so I can't comment on your approach. Hopefully someone else will.
