Using hisat2, I mapped RNA-seq reads against the genome. I now want to identify the missense, nonsense, and silent mutations. I also have a ".gff" file that contains CDS, gene, and other features. I can do this manually in the IGV browser: load the genome, the alignment (.bam), and the annotation (.gff) into IGV and examine each site. But how can I automate this when I have thousands of mutations? As a result, I would like to get a table like this:
gene | position | type |
---|---|---|
A1 | 1020 | missense |
A1 | 1040 | silent |
B2 | 2000 | silent |
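Dedicated annotators such as SnpEff, Ensembl VEP, and `bcftools csq` exist for this. The core logic behind the table is small, though, and a sketch may help when planning. It maps a genomic position onto the spliced CDS, rebuilds the affected codon, and compares the translated amino acids. It makes several simplifying assumptions. It handles single-base substitutions only. It uses 1-based inclusive coordinates, as in GFF and VCF. It assumes each CDS is complete and starts in phase 0, so the GFF phase column is ignored. It uses the standard genetic code. It does not check the VCF REF allele against the genome. The function names and toy sequences are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Standard genetic code (NCBI translation table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))
COMPLEMENT = str.maketrans("ACGT", "TGCA")

Segment = Tuple[int, int]  # (start, end), 1-based inclusive, as in GFF


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def cds_offset(segments: List[Segment], strand: str, pos: int) -> Optional[int]:
    """0-based offset of genomic `pos` within the spliced CDS, or None if outside it."""
    offset = 0
    # Walk segments in transcript (5'->3') order: ascending on '+', descending on '-'.
    for start, end in sorted(segments, reverse=(strand == "-")):
        if start <= pos <= end:
            return offset + (pos - start if strand == "+" else end - pos)
        offset += end - start + 1
    return None


def classify_snv(chrom_seq: str, segments: List[Segment], strand: str,
                 pos: int, alt: str) -> Optional[str]:
    """Classify a single-base substitution; returns None for non-coding positions."""
    off = cds_offset(segments, strand, pos)
    if off is None:
        return None
    # Spliced CDS in genomic order, reverse-complemented for minus-strand genes.
    cds = "".join(chrom_seq[s - 1:e] for s, e in sorted(segments)).upper()
    if strand == "-":
        cds = revcomp(cds)
    alt = alt.upper()
    alt_coding = revcomp(alt) if strand == "-" else alt  # ALT as read on the coding strand
    codon_start, idx = off - off % 3, off % 3
    ref_codon = cds[codon_start:codon_start + 3]
    if len(ref_codon) < 3:
        return "incomplete_codon"  # CDS length is not a multiple of 3
    alt_codon = ref_codon[:idx] + alt_coding + ref_codon[idx + 1:]
    ref_aa, alt_aa = CODON_TABLE.get(ref_codon), CODON_TABLE.get(alt_codon)
    if ref_aa is None or alt_aa is None:
        return "unknown"  # ambiguous base such as N
    if ref_aa == alt_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop_lost"
    return "missense"


def annotate(genome: Dict[str, str], genes: Dict[str, Tuple[str, str, List[Segment]]],
             variants: List[Tuple[str, int, str]]) -> List[Tuple[str, int, str]]:
    """Return (gene, position, type) rows for every variant that falls inside a CDS."""
    rows = []
    for chrom, pos, alt in variants:
        for gene, (g_chrom, strand, segments) in genes.items():
            if g_chrom != chrom:
                continue
            effect = classify_snv(genome[chrom], segments, strand, pos, alt)
            if effect is not None:
                rows.append((gene, pos, effect))
    return rows


if __name__ == "__main__":
    # Toy data (hypothetical): A1 is a two-exon '+' gene, B2 a one-exon '-' gene.
    genome = {"chr1": "CCATGAAAGGGGTGGTAACC", "chr2": "GGTTACCACATGG"}
    genes = {"A1": ("chr1", "+", [(3, 8), (13, 18)]), "B2": ("chr2", "-", [(3, 11)])}
    variants = [("chr1", 6, "G"), ("chr1", 8, "G"), ("chr1", 10, "T"),
                ("chr1", 14, "A"), ("chr2", 6, "T"), ("chr2", 11, "C")]
    print("gene | position | type |")
    print("---|---|---|")
    for gene, pos, effect in annotate(genome, genes, variants):
        print(f"{gene} | {pos} | {effect} |")
```

In the toy data, the intronic variant at chr1:10 produces no row. A variant that overlaps several transcripts produces one row per transcript. The sketch also reports `stop_lost` separately instead of forcing it into one of the three requested categories.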
Have you looked at the literature at all? The table you show is the output of a fairly extensive pipeline built from several different tools. One example is described in "Variant analysis pipeline for accurate detection of genomic variants from transcriptome sequencing data" (Adetunji et al., 2019); see also the references listed therein and articles related to it. There are many ways to solve this problem, and none of them is simple. You should probably look over those references or some tutorials so you can sketch out a plan.
I did not think it would be that hard. I have already performed variant calling and have a .vcf file. I thought there was a way to combine all of these files to get the result I want: annotation (.gff), variants (.vcf), alignment (.bam), and genome (.fasta).
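Once variants are called, the consequence of a substitution depends only on its position, the reference sequence, and the gene model. The .bam file is therefore not needed at this step. Below is a minimal standard-library sketch for reading the other three files into plain Python structures. It makes some assumptions. The files must be uncompressed. CDS rows in the GFF3 are grouped by their `Parent` attribute, which is usually a transcript ID rather than a gene name. The FASTA sequence ID is taken as the header text up to the first space. Only single-base substitutions are kept from the VCF. The function names are hypothetical, and real GFF files vary enough that you may need to adjust the parser.

```python
from typing import Dict, Iterable, List, Tuple

CdsMap = Dict[str, Tuple[str, str, List[Tuple[int, int]]]]


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Read FASTA records into {sequence_id: sequence}."""
    seqs: Dict[str, str] = {}
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            # ID is the header up to the first whitespace.
            name, chunks = (line[1:].split() or [""])[0], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def parse_gff_cds(lines: Iterable[str]) -> CdsMap:
    """Group GFF3 CDS rows by Parent: {parent: (seqid, strand, [(start, end), ...])}."""
    cds: CdsMap = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "CDS":
            continue
        # Column 9 holds key=value pairs separated by ';'.
        attrs = dict(kv.split("=", 1) for kv in fields[8].split(";") if "=" in kv)
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                entry = cds.setdefault(parent, (fields[0], fields[6], []))
                entry[2].append((int(fields[3]), int(fields[4])))
    return cds


def parse_vcf_snvs(lines: Iterable[str]) -> List[Tuple[str, int, str, str]]:
    """Return (chrom, pos, ref, alt) for single-base substitutions; multi-allelic ALTs are split."""
    snvs = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _vid, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):
            if len(ref) == 1 and len(alt) == 1 and alt.upper() in ("A", "C", "G", "T"):
                snvs.append((chrom, int(pos), ref.upper(), alt.upper()))
    return snvs
```

Each parser accepts any iterable of lines, so an open file handle works directly, for example `with open("annotation.gff") as fh: cds = parse_gff_cds(fh)` (the file name is a placeholder). Keeping REF in the VCF output lets you check it against the FASTA base at that position before classifying, which catches coordinate or assembly mismatches early.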