I created combined MHV-A59 + mm10 FASTA and GTF files using the Linux cat command.
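Plain cat is fine for this. One pitfall of simple concatenation is that if the first file does not end with a newline, its last line fuses with the first line of the next file. Below is a minimal Python sketch of a newline-safe concatenation; the file names in the usage comments are hypothetical placeholders.

```python
import os
import shutil


def concat_files(inputs, output):
    """Concatenate files in order, like `cat a b > out`, but append a
    newline after any input that does not already end with one, so the
    last record of one file cannot fuse with the first record of the next."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                # Stream the copy so multi-gigabyte genomes are not loaded into memory.
                shutil.copyfileobj(src, out)
                size = src.seek(0, os.SEEK_END)
                if size:
                    src.seek(size - 1)
                    if src.read(1) != b"\n":
                        out.write(b"\n")


# Hypothetical file names; substitute your own:
# concat_files(["mm10.fa", "MHV-A59.fa"], "combined.fa")
# concat_files(["mm10.gtf", "MHV-A59.gtf"], "combined.gtf")
```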
The last two mm10 entries and the first two A59 entries of the combined GTF look like this:
I then built a STAR reference from the combined FASTA and GTF files and aligned my samples to it. The BAM output files show that the reference contains the MHV-A59 genome, which STAR recognizes as an extra chromosome. Here are the last five entries of the chromosome list in Log.out. NC_001846.1 is the A59 genome and is 31357 bp long:
58 NT_166452.1 20208 2737307648
59 NT_187064.1 114452 2737569792
60 NW_023337853.1 31129 2737831936
61 NC_005089.1 16299 2738094080
62 NC_001846.1 31357 2738356224
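The chromosome list above can also be checked with a short script, for example to confirm that the viral contig is present with the expected length. This sketch assumes the four whitespace-separated columns shown (index, name, length in bp, start offset in the concatenated genome). That reading of the columns is inferred from the excerpt rather than taken from the STAR documentation.

```python
def parse_chrom_lines(lines):
    """Parse STAR Log.out chromosome lines of the form
    '<index> <name> <length> <start>' into {name: (length, start)}.
    Lines without exactly four fields and numeric index/length/start
    are skipped."""
    chroms = {}
    for line in lines:
        fields = line.split()  # handles tabs or spaces
        if len(fields) != 4:
            continue
        index, name, length, start = fields
        if not (index.isdigit() and length.isdigit() and start.isdigit()):
            continue
        chroms[name] = (int(length), int(start))
    return chroms


# The five lines quoted above, pasted from Log.out.
excerpt = """\
58 NT_166452.1 20208 2737307648
59 NT_187064.1 114452 2737569792
60 NW_023337853.1 31129 2737831936
61 NC_005089.1 16299 2738094080
62 NC_001846.1 31357 2738356224
"""

chroms = parse_chrom_lines(excerpt.splitlines())
length, start = chroms["NC_001846.1"]
print(length, start)  # 31357 2738356224
```

In this excerpt, consecutive start offsets differ by exactly 262,144 (2^18). This is consistent with STAR padding each chromosome start to a bin boundary (its --genomeChrBinNbits parameter, default 18). The sketch does not rely on that.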
When I generate the counts table, all of the mouse gene names are in it, but none of the A59 gene names (such as "N" and "ORF1ab") appear.
Thanks for any suggestions you can provide!
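One way to narrow this kind of problem down is to tabulate the feature types (GTF column 3) for each sequence name in the combined annotation. A contig only contributes rows to a featureCounts table if it has features of the type being counted. Here is a minimal standard-library sketch; the file name in the usage comment is a placeholder.

```python
from collections import Counter, defaultdict


def feature_types_by_seqname(gtf_lines):
    """Return {seqname: Counter(feature_type -> count)} for GTF lines,
    skipping blank lines and '#' comment/header lines."""
    table = defaultdict(Counter)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed 9-column GTF record
        table[fields[0]][fields[2]] += 1
    return table


# Usage (placeholder file name):
# with open("combined.gtf") as fh:
#     table = feature_types_by_seqname(fh)
# print(table["NC_001846.1"])  # are there any 'exon' entries for the virus?
```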
Hello, following up on this: does anyone know why the viral genes are not showing up in the count matrix? I can't seem to solve this one.
Try removing the additional header lines from the combined GTF file. The NC_005089.1 lines should be immediately followed by the NC_001846.1 lines.

I figured it out! The GTF.featureType argument of featureCounts defaults to 'exon'. That feature type is present in the mouse GTF, but the viral GTF file has no 'exon' features. The quickest fix was to change the viral 'CDS' feature type to 'exon' (only six edits at the end of the file) and re-run featureCounts.
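Editing six lines by hand is quickest here, but for a larger viral annotation the same change can be scripted. This minimal sketch assumes the viral records are exactly those with seqname NC_001846.1 in column 1, and that only their 'CDS' feature type should become 'exon', as in the fix above. The output file name in the usage comment is hypothetical.

```python
VIRAL_SEQNAME = "NC_001846.1"  # MHV-A59 contig name from the STAR log


def relabel_viral_cds(gtf_lines, seqname=VIRAL_SEQNAME, old="CDS", new="exon"):
    """Yield GTF lines, changing the feature type (column 3) from `old`
    to `new` only on records whose seqname (column 1) is `seqname`.
    All other lines, including '#' comments, pass through unchanged."""
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if (not line.startswith("#") and len(fields) >= 9
                and fields[0] == seqname and fields[2] == old):
            fields[2] = new
            line = "\t".join(fields) + "\n"
        yield line


# Usage (hypothetical output name):
# with open("combined.gtf") as src, open("combined.fixed.gtf", "w") as dst:
#     dst.writelines(relabel_viral_cds(src))
```

The mouse records are left as they are, so featureCounts can then be re-run with its default GTF.featureType of 'exon'.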