We used STAR to align RNA sequences against the hg19 genome, but I noticed that the programmer used an hg19 FASTA file together with a GENCODE v30 (for GRCh38) GTF for annotation and counts. Are there any risks with this alignment?
If your gtf and reference file don't match correctly, your assessment of how many reads align to genes will be off. If the chromosome names don't match between gtf and genome, no genes at all will be counted.
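A quick way to check the chromosome-name point is to compare the sequence names in the FASTA headers with the first column of the GTF. The snippet below is only a rough sketch: the function names are made up for illustration, and the file paths are whatever you pass on the command line. It assumes uncompressed, plain-text files.

```python
import sys


def fasta_contigs(lines):
    """Collect sequence names from FASTA header lines (text after '>' up to the first whitespace)."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gtf_contigs(lines):
    """Collect sequence names from column 1 of a GTF, skipping comment and blank lines."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def compare_contigs(fasta_lines, gtf_lines):
    """Return the names shared by both files and the names unique to each."""
    fa = fasta_contigs(fasta_lines)
    gtf = gtf_contigs(gtf_lines)
    return {"shared": fa & gtf, "gtf_only": gtf - fa, "fasta_only": fa - gtf}


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check_contigs.py genome.fa annotation.gtf
    with open(sys.argv[1]) as fasta, open(sys.argv[2]) as gtf:
        result = compare_contigs(fasta, gtf)
    for key in ("shared", "gtf_only", "fasta_only"):
        print(key, len(result[key]), sorted(result[key])[:10])
```

If `shared` comes back empty, no genes will be counted. Note that matching names do not prove the builds match: an hg19 FASTA and a GRCh38 GENCODE GTF can both use `chr1`, `chr2`, and so on, so this check passes while the coordinates are still wrong.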
For instance, for some reason 10x Genomics builds their references from Ensembl genomes and GENCODE GTFs. But they have to do a few lines of finagling to make them work together.
It's far easier and far safer to do things right from the start. Get your genome and gtf from the same place.
It's not a "risk": it will likely screw up the experiment if you use totally different builds for two different steps.
It is a different coordinate system, so I would upgrade "likely" to "almost certainly" screw up results.