Adding a gene to a GTF file
0
0
Entering edit mode
3.8 years ago

Hi, I did an experiment where I over expressed a human gene in mouse and now I am doing RNA-seq analysis. The gene has a mouse ortholog, which should be distinct enough such that I can get separate counts for each the human and mouse version of the gene. My issue is I have no idea how to successfully add the human gene into the mouse GTF file and run FeatureCounts on it. Here is what I've done so far:

1) Got the FASTA sequence for the human gene and added it to the bottom of my .fa file used for alignment. I made the header for that section say ">HUMANRH" (without the quotations).

2) Added this to the GTF file I'm using: HUMANRH unknown exon 1 17000 . + . gene_id "HUMANRH"; transcript_id "HUMANRH"; gene_name "HUMANRH"

3) Aligned all the fastqs as I usually would with STAR (just assume the variables have already been defined; although, now that I'm writing this question, do I now have to also remake the genome directory? Or is it just the FASTA and the GTF that need to be modified with the human gene?):

STAR --runThreadN 8 --genomeDir $genomeDir --sjdbGTFfile $GTF --sjdbOverhang 149 --bamRemoveDuplicatesType UniqueIdentical --readFilesIn $FASTQ1 $FASTQ2 --twopassMode Basic --outSAMtype BAM SortedByCoordinate Unsorted --quantMode TranscriptomeSAM GeneCounts --readFilesCommand zcat

4) Run feature counts with the updated GTF file as I usually do:

featureCounts -T 8 -t gene -s 0 -g gene_name -F GTF -a $GTF -o $OUTF/mouse.mainannot.HUMANRH.unstranded.gene.txt $BAM1 $BAM2 $BAM3 $BAM4 $BAM5 $BAM6

Now, for some reason, this new feature counts matrix has NEITHER the mouse or the human ortholog (and it originally had the mouse ortholog when I ran all this initially with just the mouse genome, with no addition of human genes at any stage). I'm confused about what's going on here, I did several Google searches to figure out the steps listed above, but it doesn't seem to be working. Any advice would be greatly appreciated!!!

RNA-Seq • 2.2k views
ADD COMMENT
0
Entering edit mode

Did you look at the alignments with IGV? Can you confirm that you have reads aligning to both locations?

ADD REPLY
0
Entering edit mode

Are the counts zero for both or are they simply not present at all?

ADD REPLY
0
Entering edit mode

Not present at all. Am I doing the process correctly at least?

ADD REPLY
0
Entering edit mode

Yes, the idea of how you do it is correct. Did you do that manually via a text editor or programatically? If the latter can you show the code?

ADD REPLY
0
Entering edit mode

I used vim, just went to the end of the file and pasted it in. I'm going to rerun it again though because I just realized there is something weird about the GTF file, it seems to only have a subset of chromosomes and I have no idea why. It still doesn't explain why the added-in human gene doesn't get counted with FeatureCounts, but maybe this will at least solve the issue of why the mouse ortholog isn't showing up. I'll update you after STAR is finished running, thank you!!!

ADD REPLY
0
Entering edit mode

Sorry for the long delay, tried several things. So I fixed the issue of not seeing the mouse gene in the count matrix, that was just something weird about the original GTF file. However the human gene WILL NOT show up in the counts matrix no matter what I do. I tried adding the code for the RH gene I described above, no luck, and then I went to the human GTF and just grabbed what it had for the "gene" entry verbatim, with the exception that the "HUMANRH" was the FASTA sequence I manually put into the .fa file (otherwise it would have said "chr2" there):

HUMANRNASEH1 HAVANA gene 1 16904 . - . gene_id "ENSG00000171865.5"; transcript_id "ENSG00000171865.5"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "RNASEH1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "RNASEH1"; level 2; havana_gene "OTTHUMG00000090279.2";

Didn't have luck doing that either, so then I went back and aligned my mouse fastq files to the human genome/GTF file, and yes, the human RNase H1 gene is significantly higher in the treatment samples as I would expect, with very little else actually aligned.

Why is it so difficult to combine mouse and human? Do I need to remake the genome directory? I'm so confused!

ADD REPLY
0
Entering edit mode

Do I need to remake the genome directory?

So the question is whether you have to build a new genome index? Yes, of course. How would the aligner know about these changes if not.

ADD REPLY
0
Entering edit mode

Wow, I'm super dumb, thank you - remaking the STAR directory now. I thought the updated GTF and FASTA file were sufficient, but clearly I need to go back and read the STAR manual in detail to understand the genome index.

Will update later if this works, which I suppose it will, in case anyone else ever does this... Thanks again ATpoint.

ADD REPLY

Login before adding your answer.

Traffic: 1985 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6