Cannot Modify Fasta File description
2.1 years ago
bkffadia ▴ 10

I wanted to generate genome indexes with a tool called STAR, which requires reference genome sequences (FASTA files) and annotations (a GTF file). When I launched STAR, this message appeared:

Fatal INPUT FILE error, no valid exon lines in the GTF file: /content/drive/MyDrive/gencode.v34.annotation.gtf
Solution: check the formatting of the GTF file. One likely cause is the difference in chromosome naming between GTF and FASTA file.

So I checked the chromosome naming and found that chromosomes in the reference genome are named [chromosome 1], while in the GTF file they are named [chr1]. I therefore tried to rename the chromosomes in the reference genome using this code:

[screenshot of the code; image not available]

But when I checked the file, I found the names unchanged:

[screenshot of the FASTA records; image not available]

Just wanted to know what I'm doing wrong.

biopython • 1.3k views

When you modify a file's contents in memory, the file doesn't change. You'll need to write the in-memory contents to a new file for that file to contain the changes you make.


Save time: if possible, download matching annotation files from whichever location you choose to get the sequence data from. Everything will then match without having to mess with this sort of thing.


From the 2nd image, it looks like the chromosome names are not in the form chromosome1, chromosome2, and so on. They are in the form NC_***, so replacing chromosome1 with chr1 would not help: the chromosome names in the FASTA file and the GTF would still be different. I would follow GenoMax's suggestion, since it is easier.
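
To illustrate the point, here is a minimal sketch in plain Python. It assumes the sequence name is the first whitespace-delimited token of the FASTA header, which is the usual convention. The header text, the `names` mapping, and the `rename_header` helper are all hypothetical and only for illustration; check the real accession-to-chromosome pairs against your own assembly before relying on anything like this.

```python
def rename_header(header, names):
    """Return a FASTA header whose first token is replaced via `names`.

    Headers whose first token is not a key of `names` come back unchanged.
    """
    first, _, rest = header[1:].partition(" ")
    new_name = names.get(first, first)
    return ">" + new_name + (" " + rest if rest else "")


# Hypothetical mapping and header, for illustration only.
names = {"NC_000001.11": "chr1", "NC_000002.12": "chr2"}
header = ">NC_000001.11 Homo sapiens chromosome 1, GRCh38.p13 Primary Assembly"

# A plain text replacement leaves the sequence name untouched ...
print(header.replace("chromosome", "chr").split()[0])  # >NC_000001.11
# ... whereas mapping the first token changes the name itself.
print(rename_header(header, names).split()[0])         # >chr1
```

Even then, the renamed headers still have to be written to a new file, and downloading a matching FASTA/GTF pair remains the simpler route.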

2.1 years ago
Alban Nabla ▴ 30

If you are using the records on the fly during the for loop iteration, then you could simply change the assignment to:

from Bio import SeqIO

for record in SeqIO.parse('yourfile.fsa', 'fasta'):
    # This changes only the in-memory record; the file on disk is untouched
    record.description = record.description.replace('chromosome', 'chr')
    ## do something with the record here

If you can't use these edited records on the fly, then you will need to save the changes to a new FASTA file using Bio.SeqIO.write(), as mentioned by Ram.
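
A minimal sketch of that second option, assuming Biopython is installed. The function name `rename_fasta` and its parameters are made up for this example. One detail to watch: Biopython's FASTA writer builds the header from `record.id` plus `record.description`, so if the replaced text occurs in the ID itself, the ID should be updated too.

```python
from Bio import SeqIO


def rename_fasta(in_path, out_path, old="chromosome", new="chr"):
    """Write a copy of a FASTA file with `old` replaced by `new` in each header.

    Returns the number of records written. The input file is not modified.
    """
    def renamed_records():
        for record in SeqIO.parse(in_path, "fasta"):
            record.description = record.description.replace(old, new)
            # For FASTA input the description starts with the ID, so keep
            # the ID in sync with the first token of the new description.
            tokens = record.description.split()
            if tokens:
                record.id = tokens[0]
            yield record

    # SeqIO.write accepts any iterable of records and a file name or handle.
    return SeqIO.write(renamed_records(), out_path, "fasta")
```

Calling something like `rename_fasta('yourfile.fsa', 'renamed.fsa')` would leave `yourfile.fsa` as it was and put the edited records in `renamed.fsa` (a placeholder output name), which is the file to pass on to STAR.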
