Selection of GTF file from gencode
9 months ago
Nodilan ▴ 10

I'm attempting to use STAR to index the mouse genome. I'm using the following command:

/opt/conda/envs/STAR/bin/STAR --runMode genomeGenerate --runThreadN 8 --genomeChrBinNbits 12 --limitGenomeGenerateRAM 60000000000 --genomeDir /desktop/output/mouse_genome_index/ --genomeFastaFiles /desktop/mouse_input_data/mouse_gencode_transcripts.fa --sjdbGTFfile /desktop/mouse_input_data/mouse_gencode_annotation.gtf --genomeSAsparseD 3

I downloaded the mouse genome FASTA and GTF files from the GENCODE website (https://www.gencodegenes.org/mouse/). I used the following GTF file: [screenshot not included]

and this FASTA file: [screenshot not included]. However, I encountered an error that I'm having trouble understanding:

Fatal INPUT FILE error, no valid exon lines in the GTF file: /desktop/mouse_input_data/mouse_gencode_annotation.gtf
Solution: check the formatting of the GTF file. One likely cause is the difference in chromosome naming between GTF and FASTA file.

The error seems to be related to the GTF file, but I don't know which GTF file I should download from GENCODE to pass to --sjdbGTFfile in this case.


Please do not paste screenshots of plain-text content; it is counterproductive. You can copy and paste the content directly here, using the code formatting option.

9 months ago

One likely cause is the difference in chromosome naming between GTF and FASTA file.

Did you check that?
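
One way to run that check is sketched below in Python. It collects the sequence names from the FASTA headers and the names in the first column of the GTF exon lines, then reports the overlap. The function names are made up for this illustration, and the sketch assumes that a sequence name is the header text up to the first whitespace and that the GTF is tab-separated with the feature type in column 3.

```python
def fasta_names(lines):
    """Collect sequence names from FASTA header lines (text after '>' up to the first whitespace)."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gtf_exon_seqnames(lines):
    """Collect column-1 names from GTF lines whose feature type (column 3) is 'exon'."""
    names = set()
    for line in lines:
        if line.startswith("#"):  # skip header/comment lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 3 and cols[2] == "exon":
            names.add(cols[0])
    return names


def shared_names(fasta_lines, gtf_lines):
    """Names present both in the FASTA headers and on GTF exon lines."""
    return fasta_names(fasta_lines) & gtf_exon_seqnames(gtf_lines)
```

Both functions accept any iterable of lines, so open file handles can be passed directly, for example `shared_names(open(fasta_path), open(gtf_path))`. An empty result means no exon line in the GTF refers to a sequence present in the FASTA, which is consistent with the situation the error message describes.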


This is an example line from my GTF file:

chr1 HAVANA gene 3143476 3144545 . + . gene_id "ENSMUSG00000102693.2"; gene_type "TEC"; gene_name "4933401J01Rik"; level 2; mgi_id "MGI:1918292"; havana_gene "OTTMUSG00000049935.1";

This is an example header from my FASTA file with the same gene ID:

ENSMUST00000193812.2|ENSMUSG00000102693.2|OTTMUSG00000049935.1|OTTMUST00000127109.1|4933401J01Rik-201|4933401J01Rik|1070|TEC|


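The FASTA file should contain the genomic sequence (one record per chromosome), not the sequences of the transcripts. Its sequence names should match the first column of the GTF, for example: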

>chr1
ATCGTACGATGATCGTACGTAGCTAGTGAC....
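
As a rough sketch of the corrected call, the snippet below assembles the STAR arguments in Python with a genome FASTA passed to --genomeFastaFiles. The helper name `star_index_command` and the file name `mouse_gencode_genome.fa` are placeholders invented for this example (use whatever name your downloaded genome FASTA has); the flags and the other paths are the ones from the original command. The tuning options from the question (--genomeChrBinNbits, --limitGenomeGenerateRAM, --genomeSAsparseD) are left out here and can be appended if needed.

```python
import shlex


def star_index_command(genome_fasta, gtf, genome_dir, threads=8):
    """Build the STAR genomeGenerate argument list (hypothetical helper, not part of STAR)."""
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", genome_fasta,  # genome sequence, not transcripts
        "--sjdbGTFfile", gtf,
    ]


cmd = star_index_command(
    "/desktop/mouse_input_data/mouse_gencode_genome.fa",  # placeholder file name
    "/desktop/mouse_input_data/mouse_gencode_annotation.gtf",
    "/desktop/output/mouse_genome_index/",
)
# Print the command for inspection; it could then be run with subprocess.run(cmd, check=True).
print(shlex.join(cmd))
```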

Thank you!!


If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.
