How to create a custom GTF annotation file?

0

5.9 years ago

John ▴ 270

Hi

I am using RSEM (with bowtie2) for alignment and then gene counting. I use the RefSeq annotation (GFF3) and the genomic.fna reference FASTA file from NCBI. RSEM can convert the GFF3 file to GTF.

How can I subset the GTF (or GFF3) file by gene name? I want to extract the annotation (GTF) for a particular gene, extract that gene's sequence from the reference FASTA file, and then perform the alignment against it.

The aim is to reduce run time by avoiding alignment against the whole genome.

Thanks in anticipation.

RNA-Seq R genome • 3.9k views

ADD COMMENT • link updated 5.9 years ago by caggtaagtat ★ 1.9k • written 5.9 years ago by John ▴ 270

3

This could force some reads that would normally have aligned elsewhere to align to your gene instead.

ADD REPLY • link 5.9 years ago by caggtaagtat ★ 1.9k

1

That's what happened. There are more reads than I expected.

ADD REPLY • link 5.9 years ago by John ▴ 270

0

You should not do that! Aligning only to your genes will bias the analysis, because your RNA-Seq experiment reflects the entire transcriptome, not just your gene.

ADD REPLY • link 5.9 years ago by Kristoffer Vitting-Seerup ★ 4.2k

0

Yes, just switch to a pseudo-aligner if you want to increase the speed. That's sufficient for gene expression.

ADD REPLY • link 5.9 years ago by caggtaagtat ★ 1.9k

1

Can't you just grep for the gene name of interest and redirect the output to a file? All the lines relevant to that gene should have the ID, and this would select and place all lines with the given gene id into a single file.
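A plain grep can also pick up neighbouring genes whose names merely contain the search string (for example ABC10 when searching for ABC1). As a rough sketch of the same idea with exact matching, something like the following Python could work. It assumes a standard tab-separated GTF with `key "value";` attributes in column 9; the attribute keys checked (`gene_id`, `gene_name`, `gene`) and the example records are assumptions for illustration, so check which key your converted RefSeq file actually uses.

```python
import re

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Return the GTF attribute column as a dict of key -> value."""
    return dict(ATTR_RE.findall(field))


def filter_gtf_by_gene(lines, gene, keys=("gene_id", "gene_name", "gene")):
    """Yield GTF lines whose attribute under any of `keys` equals `gene` exactly.

    The default `keys` are an assumption; adjust them to your annotation.
    """
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        attrs = parse_attributes(cols[8])
        if any(attrs.get(k) == gene for k in keys):
            yield line


# Tiny made-up example: two records with similar gene names.
example = [
    'chr1\tRefSeq\texon\t100\t200\t.\t+\t.\tgene_id "ABC1"; transcript_id "NM_1";\n',
    'chr1\tRefSeq\texon\t500\t600\t.\t+\t.\tgene_id "ABC10"; transcript_id "NM_2";\n',
]
kept = list(filter_gtf_by_gene(example, "ABC1"))
print(len(kept))  # only the ABC1 record is kept
```

For a real file, the same generator can be fed an open file handle and its output written with `writelines` to a new GTF. The caveat raised in the other comments still applies: aligning against a single-gene reference can attract reads that belong elsewhere.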

ADD REPLY • link 5.9 years ago by seidel 11k

1

If you are only interested in gene expression, you could speed up your analysis by using a pseudo-aligner such as salmon; these are much faster than "real" aligners.

Or, if you really need nucleotide-precise alignment, then I would use STAR, which is a little faster and has higher fidelity.

Edit: I moved this into the comments, but I addressed the issue of running time because the overall question was how to speed up the alignment process.