STAR genome index with or without a *.gtf annotation
8.6 years ago
kirannbishwa01 ★ 1.6k

STAR needs a genome file (*.fasta, *.fa) to create genome indexes. But is it necessary to also supply a GTF annotation file, given that indexing works without one?

Details: I have a diploid genome and transcriptome databases (built from the reference genome plus SNP/InDel polymorphisms) for two different populations. The diploid genome is a single file, but the population-level transcriptome databases are not merged.

I think they can be merged, but I don't know what consequences that might have for the alignment. Any suggestions?

If not, the alternative is simply to create the genome index and align the RNA-seq data to it.

What difference does it make whether the genome index is built with or without the GTF file?

STAR gtf RNA-Seq alignment bowtie

Hi Guys,

I have a quick question about the output of genome indexing. I ran STAR with a genome .fasta file and a GFF file.

The genome is 3 GB; these are the output files:

chrLength.txt
chrNameLength.txt
chrName.txt
chrStart.txt
genomeParameters.txt

I also indexed a smaller, 60 MB genome; these are the output files:

chrLength.txt
chrNameLength.txt
chrName.txt
chrStart.txt
exonGeTrInfo.tab
exonInfo.tab
geneInfo.tab
Genome
genomeParameters.txt
SA
SAindex
sjdbInfo.txt
sjdbList.fromGTF.out.tab
sjdbList.out.tab
transcriptInfo.tab

My question is: why did I get these extra files for the small genome but not for the large one? I applied the same procedure to both.

Here is the script I used. The only differences for the large genome are --sjdbOverhang 99 and --genomeChrBinNbits 15 (the latter to reduce memory usage); everything else is the same as for the small genome.

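#!/bin/bash
set -euo pipefail    # stop at the first failing command
NUM_THREADS=12
mkdir -p DB          # STAR writes the index into this directory
# --sjdbOverhang 99 and --genomeChrBinNbits 15 are the options added for the large genome.
# If the annotation is GFF3 rather than GTF, the STAR manual says to also pass
# --sjdbGTFtagExonParentTranscript Parent
STAR --runMode genomeGenerate --genomeDir DB \
    --runThreadN "$NUM_THREADS" \
    --genomeFastaFiles XL9_2.fa \
    --sjdbGTFfile XENLA_Frog.gtf+gff3 \
    --sjdbOverhang 99 \
    --genomeChrBinNbits 15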

Could anyone explain why the outputs differ? I am new to this field, so I am puzzled by the difference.

Thanks in advance.

Cheer San
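
A quick way to compare the two listings above is to check an index directory for the files the complete 60 MB run produced. Genome, SA and SAindex hold the packed genome sequence, the suffix array and its index, so their absence from the 3 GB listing suggests that genomeGenerate stopped before it finished. This is a minimal sketch: check_files, CORE_FILES and GTF_FILES are hypothetical names (not part of STAR), and the file lists are simply copied from the listing in this thread, not from an official specification.

```shell
#!/usr/bin/env bash
# Sketch: report STAR index files that are missing or empty.
# check_files, CORE_FILES and GTF_FILES are hypothetical; the lists are copied
# from the complete 60 MB listing above and are assumed, not guaranteed, to be complete.
set -euo pipefail

# Core index files (present whether or not a GTF was used).
CORE_FILES=(Genome SA SAindex chrName.txt chrLength.txt chrStart.txt
            chrNameLength.txt genomeParameters.txt)
# Annotation-derived files seen in the listing built with a GTF.
GTF_FILES=(sjdbList.out.tab sjdbInfo.txt exonInfo.tab transcriptInfo.tab geneInfo.tab)

# check_files DIR FILE...: print each FILE that is missing or empty in DIR;
# return 1 if any was, 0 otherwise.
check_files() {
    local dir="$1" status=0 f
    shift
    for f in "$@"; do
        if [[ ! -s "$dir/$f" ]]; then
            echo "missing or empty: $dir/$f"
            status=1
        fi
    done
    return "$status"
}
```

For example, check_files DB "${CORE_FILES[@]}" lists any core index files missing from DB, and the same call with "${GTF_FILES[@]}" checks the annotation-derived files.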

8.6 years ago

You don't need to provide the GTF file when building the index, but it's certainly convenient to do so if you know ahead of time that you'll be using it. The only consequence of supplying it later, at the mapping step, is that alignment takes longer, because STAR then has to insert the annotated junctions on the fly. If you omit the GTF file completely, you'll just get lower-quality spliced alignments.
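
A minimal sketch of the two options, assuming hypothetical file names (genome.fa, annotation.gtf, reads.fastq) and 100 bp reads, hence --sjdbOverhang 99 (read length minus 1). The hypothetical run helper only prints each command unless RUN=1 is set, so the script can be read as a dry run.

```shell
#!/usr/bin/env bash
# Sketch: two ways of giving STAR the same annotation.
# All file and directory names are hypothetical placeholders.
set -euo pipefail

GENOME_FA=genome.fa
GTF=annotation.gtf
READS=reads.fastq
THREADS=4

# Print each command; only execute it when RUN=1 is set in the environment.
run() {
    printf '%s\n' "$*"
    if [[ "${RUN:-0}" == 1 ]]; then "$@"; fi
}

# Option A: annotation built into the index; the mapping step needs no GTF.
run mkdir -p idx_gtf
run STAR --runMode genomeGenerate --genomeDir idx_gtf --runThreadN "$THREADS" \
    --genomeFastaFiles "$GENOME_FA" --sjdbGTFfile "$GTF" --sjdbOverhang 99
run STAR --genomeDir idx_gtf --runThreadN "$THREADS" \
    --readFilesIn "$READS" --outFileNamePrefix withIndexGTF_

# Option B: plain index; the GTF is given at mapping time, so STAR inserts
# the annotated junctions on the fly for every run (slower).
run mkdir -p idx_plain
run STAR --runMode genomeGenerate --genomeDir idx_plain --runThreadN "$THREADS" \
    --genomeFastaFiles "$GENOME_FA"
run STAR --genomeDir idx_plain --runThreadN "$THREADS" \
    --readFilesIn "$READS" --sjdbGTFfile "$GTF" --sjdbOverhang 99 \
    --outFileNamePrefix withMappingGTF_
```

Mapping against idx_plain without --sjdbGTFfile is the third case above: no annotation at all, which still works but gives lower-quality spliced alignments.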

I think I should be able to merge the two GTF files and then do the alignment. These are population-level custom GTFs generated by adding SNPs/InDels to the reference GTF. The chromosomes are named 1_P, 2_P, ... for the paternal strain and 1_M, 2_M, 3_M for the maternal strain.

I'm hoping there won't be a problem.

Thanks for the update.

If you are aligning to a concatenated genome of the maternal and paternal strains, then go ahead and merge the GTF files too.
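
A minimal sketch of the merge, assuming the paternal and maternal GTFs already use the _P and _M chromosome names, so their feature lines cannot collide on the same sequence. merge_gtfs and missing_seqnames are hypothetical helpers, and the file names in the comments are placeholders. If both GTFs reuse identical transcript_id values, it may be safer to make those unique too (for example with the same _P/_M suffix) before merging, since STAR groups exons into transcripts by that attribute by default.

```shell
#!/usr/bin/env bash
# Sketch: merge per-haplotype GTFs for a concatenated (paternal + maternal) genome.
# merge_gtfs and missing_seqnames are hypothetical helpers; paths are placeholders.
set -euo pipefail

# merge_gtfs OUT IN1 IN2 ...: keep '#' header lines from the first input only,
# then append the feature lines of every input in order.
merge_gtfs() {
    local out="$1" gtf
    shift
    for gtf in "$@"; do
        [[ -r "$gtf" ]] || { echo "cannot read: $gtf" >&2; return 1; }
    done
    grep '^#' "$1" > "$out" || true          # grep exits 1 when there is no header
    for gtf in "$@"; do
        grep -v '^#' "$gtf" >> "$out" || true
    done
}

# missing_seqnames GTF FASTA: print GTF sequence names (column 1) that have no
# matching FASTA header; STAR cannot use annotation on such sequences.
missing_seqnames() {
    local gtf="$1" fasta="$2"
    LC_ALL=C comm -23 \
        <(awk -F'\t' '!/^#/ && NF { print $1 }' "$gtf" | LC_ALL=C sort -u) \
        <(awk '/^>/ { sub(/^>/, ""); print $1 }' "$fasta" | LC_ALL=C sort -u)
}

# Example (hypothetical file names):
# merge_gtfs diploid.gtf paternal_P.gtf maternal_M.gtf
# missing_seqnames diploid.gtf diploid_genome.fa
```

An empty result from missing_seqnames means every annotated sequence name also appears in the diploid FASTA.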

Hi Devon, I used STAR to count reads with the option "--quantMode TranscriptomeSAM GeneCounts", without supplying a GTF file at that step, either for annotation or for read counting. Is there anything I should be concerned about?

Thank you

If the original index was built with a GTF, then this should work fine (though I've never tried it).
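
For reference, a sketch of that mapping step plus a small summary of ReadsPerGene.out.tab, the file GeneCounts writes. The column layout in the comments (four N_ summary rows, then per-gene counts for unstranded, read-1-sense and read-2-sense libraries in columns 2 to 4) is my reading of the STAR manual; summarise_gene_counts and the file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: map against an index that was built with a GTF, then summarise
# ReadsPerGene.out.tab. Index directory and file names are hypothetical.
set -euo pipefail

# Mapping step (shown for reference, not executed here):
# STAR --genomeDir DB --runThreadN 12 --readFilesIn reads.fastq \
#      --quantMode TranscriptomeSAM GeneCounts --outFileNamePrefix sample_

# ReadsPerGene.out.tab: the first four rows are N_unmapped, N_multimapping,
# N_noFeature and N_ambiguous; the remaining rows are genes. Column 2 holds
# unstranded counts, column 3 counts for read 1 on the RNA strand, and
# column 4 counts for read 2 on the RNA strand (reverse-stranded libraries).
summarise_gene_counts() {
    awk -F'\t' '
        /^N_/ { next }                                   # skip summary rows
        { for (i = 2; i <= 4; i++) total[i] += $i }      # sum per column
        END { printf "unstranded\t%d\nstrand1\t%d\nstrand2\t%d\n", total[2], total[3], total[4] }
    ' "$1"
}

# Example: summarise_gene_counts sample_ReadsPerGene.out.tab
```

Comparing the three totals is one way to see which column matches the library's strandedness.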

Problem solved: inside the genome directory, the file genomeParameters.txt contains an sjdbGTFfile entry giving the GTF that was used. Phew!

Thanks anyway, Devon.
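
The genomeParameters.txt check can be scripted. This is a minimal sketch that assumes the file stores one parameter per line with its value in the second column, and that an index built without annotation records sjdbGTFfile as "-"; both are assumptions about the file layout, and index_gtf / index_has_gtf are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: read the sjdbGTFfile entry from a STAR index's genomeParameters.txt.
# index_gtf and index_has_gtf are hypothetical helpers; the "-" convention for
# "no GTF" is an assumption about the file layout.
set -euo pipefail

# index_gtf DIR: print the sjdbGTFfile value; fail if there is no such line.
index_gtf() {
    awk '$1 == "sjdbGTFfile" { print $2; found = 1 } END { exit !found }' \
        "$1/genomeParameters.txt"
}

# index_has_gtf DIR: succeed only if a real GTF path is recorded.
index_has_gtf() {
    local gtf
    gtf="$(index_gtf "$1")" || return 1
    [[ -n "$gtf" && "$gtf" != "-" ]]
}
```

For example, index_has_gtf DB && echo "DB was built with $(index_gtf DB)".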

Do you know if there is a way to see whether or not the annotation file (GTF) was used to index the genome in STAR?

Thanks.
