Question

Are there risks in using a GRCh38 GTF with an hg19 FASTA for alignment?

0


4.2 years ago

ddzhangzz ▴ 90

We used STAR to align RNA-seq reads against the hg19 genome, but I noticed that the programmer used an hg19 FASTA file together with a GENCODE v30 GTF (built for GRCh38) for annotation and counting. Are there any risks in doing this?

RNASeq • 1.2k views

updated 4.2 years ago by swbarnes2 15k • written 4.2 years ago by ddzhangzz ▴ 90

0


It's not a "risk": it will likely screw up the experiment if you use completely different builds for two different steps.

4.2 years ago by 4galaxy77 2.9k

1


The two builds use different coordinate systems, so I would upgrade "likely" to "almost certainly" will screw up the results.

4.2 years ago by ATpoint 89k

score 1 · Answer 1 · 2021-07-19

If your GTF and reference genome don't come from the same build, the gene coordinates in the GTF won't line up with the sequence the reads were aligned to, so your counts of reads per gene will be off. If the chromosome names don't match between the GTF and the genome, no genes at all will be counted.
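The chromosome-name problem is easy to check before running anything. Below is a minimal sketch in Python, assuming an uncompressed FASTA and GTF; the function names and the toy data are made up for illustration. Note that it only catches the second failure mode: an hg19 FASTA and a GENCODE GRCh38 GTF both use `chr1`-style names, so they would pass this check while the coordinates still disagree.

```python
import io

def fasta_seq_names(handle):
    """Collect sequence names from FASTA headers (text after '>' up to the first whitespace)."""
    names = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names

def gtf_seq_names(handle):
    """Collect the first column (seqname) of every non-comment GTF line."""
    names = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names

def compare_names(fasta_names, gtf_names):
    """Report which sequence names are shared and which appear in only one file."""
    return {
        "shared": sorted(fasta_names & gtf_names),
        "gtf_only": sorted(gtf_names - fasta_names),
        "fasta_only": sorted(fasta_names - gtf_names),
    }

# Toy data: a UCSC-style FASTA ("chr1") against an Ensembl-style GTF ("1").
toy_fasta = io.StringIO(">chr1 some description\nACGT\n>chr2\nACGT\n")
toy_gtf = io.StringIO('#!genome-build toy\n1\tsrc\tgene\t1\t4\t.\t+\t.\tgene_id "g1";\n')

report = compare_names(fasta_seq_names(toy_fasta), gtf_seq_names(toy_gtf))
print(report)
```

With real files you would pass `open("genome.fa")` and `open("annotation.gtf")` (hypothetical paths) instead of the toy handles. An empty `shared` list means no gene can be counted.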

For instance, for some reason 10x Genomics builds their references from Ensembl genomes and GENCODE GTFs. But they have to do a few lines of finagling to make them work together.

https://support.10xgenomics.com/single-cell-gene-expression/software/release-notes/build#mm10_#{files.refdata_mm10.version}
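To give a sense of what that kind of finagling involves, here is a hypothetical sketch that renames Ensembl-style FASTA headers (`1`, `X`, `MT`) to the `chr`-prefixed style used in GENCODE GTFs. This is not the 10x script, and the function names and mapping rule are assumptions for illustration. Renaming only makes sense when both files describe the same assembly; it does nothing to fix an hg19 versus GRCh38 mismatch.

```python
import re

# Primary chromosomes in Ensembl naming: digits, X or Y (MT is handled separately).
PRIMARY = re.compile(r"^(\d+|X|Y)$")

def ensembl_to_ucsc_name(name):
    """Map an Ensembl-style primary chromosome name to a chr-prefixed one.

    Scaffolds and anything else unrecognised are returned unchanged.
    """
    if name == "MT":
        return "chrM"
    if PRIMARY.match(name):
        return "chr" + name
    return name

def rename_fasta_header(line):
    """Rewrite a FASTA header line; sequence lines pass through untouched."""
    if not line.startswith(">"):
        return line
    name, sep, rest = line[1:].rstrip("\n").partition(" ")
    return ">" + ensembl_to_ucsc_name(name) + sep + rest + "\n"

print(rename_fasta_header(">1 dna:chromosome\n"), end="")
print(rename_fasta_header(">MT\n"), end="")
```

Scaffold names are deliberately left alone here, because they follow different conventions between sources and need an explicit lookup table rather than a prefix rule.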

It's far easier and far safer to do things right from the start. Get your genome and GTF from the same place, for the same build.