If I use gffread for genomic features labelled as existing on the negative strand, will gffread find the reverse complement automatically when extracting the sequence from the fasta file, or will I have to implement an additional step to get that?
Hello!
Any ideas why I'm getting a seg fault when I run this?
gffread BF_annotation.gtf -W -O -E -L -F -w BF_transcripts.fa -g BF_genome.fa
GFF Warning: merging overlapping/adjacent feature segment exon (8436614-8437382) with exon (8436535-8436613) for GFF ID KDM4A on scaffold_1
GFF Warning: merging overlapping/adjacent feature segment exon (319524-319638) with exon (319006-319523) for GFF ID ZFYVE28 on scaffold_199
GFF Warning: merging overlapping/adjacent feature segment exon (1352880-1353047) with exon (1351726-1352879) for GFF ID FLAD1 on scaffold_218
Segmentation fault (core dumped)
and when I run without certain arguments:
gffread -w BF_transcripts.fa -W -O -E -g ./BF_genome.fa BF_annotation.gtf
it outputs warnings:
GFF Warning: merging overlapping/adjacent feature segment exon (8436614-8437382) with exon (8436535-8436613) for GFF ID KDM4A on scaffold_1
GFF Warning: merging overlapping/adjacent feature segment exon (319524-319638) with exon (319006-319523) for GFF ID ZFYVE28 on scaffold_199
GFF Warning: merging overlapping/adjacent feature segment exon (1352880-1353047) with exon (1351726-1352879) for GFF ID FLAD1 on scaffold_218
Error (GFaSeqGet): subsequence cannot be larger than 1560
Error getting subseq for BLEC (1..1561)!
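The last error reads as if a feature for BLEC ends at position 1561 on a sequence that is only 1560 bases long. One way to find such records before running gffread is to compare every GTF end coordinate with the length of its sequence in the FASTA file. The following is a minimal sketch under that assumption. The function names and the toy data are made up for illustration and are not part of gffread.

```python
import io

def fasta_lengths(handle):
    """Return {sequence name: length} for a FASTA file handle."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # Use the first word of the header as the sequence name
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths

def out_of_bounds(gtf_handle, lengths):
    """List GTF records that fall outside their sequence or name an unknown sequence."""
    bad = []
    for line in gtf_handle:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip malformed lines
        seqid, feature = fields[0], fields[2]
        start, end = int(fields[3]), int(fields[4])
        seq_len = lengths.get(seqid)
        if seq_len is None or start < 1 or end > seq_len:
            bad.append((seqid, feature, start, end, seq_len))
    return bad

# Toy data (hypothetical): a 1560 bp sequence and an exon that ends at 1561
toy_fasta = io.StringIO(">toy_scaffold\n" + "A" * 1560 + "\n")
toy_gtf = io.StringIO(
    'toy_scaffold\tsrc\texon\t1\t1561\t.\t+\t.\tgene_id "BLEC"; transcript_id "BLEC";\n'
    'toy_scaffold\tsrc\texon\t10\t200\t.\t+\t.\tgene_id "OK1"; transcript_id "OK1";\n'
)

lengths = fasta_lengths(toy_fasta)
bad = out_of_bounds(toy_gtf, lengths)
for record in bad:
    print(record)
```

For real files, replace the two `io.StringIO` objects with `open("BF_genome.fa")` and `open("BF_annotation.gtf")`. Any record it reports is worth checking against the annotation source before deleting the gene.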
I hadn't seen this post before and made a related post: C: Get sequences for all genes in a GTF file
Thanks!
I removed 4 genes from the .gtf and it ran without warnings, but I'm not sure this is how I should have proceeded.
The warning messages are just saying that some of your exon co-ordinates overlap or are adjacent. For example:
You can see that there is some overlap in the middle. This is the situation that you have on 3 of your scaffolds (1, 199, 218).
This is not necessarily a problem. Even in the human transcriptome CDS FASTA file from GENCODE, many genes have the same sequences because they also overlap. Whether it is a problem depends on what your next analysis steps are.
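To see what each warning refers to, the exon pairs can be checked by hand or with a few lines of code. The sketch below takes the coordinates from the three warnings above and classifies each pair, assuming 1-based inclusive coordinates as in GTF. The `relation` function is a hypothetical helper, not something gffread provides.

```python
def relation(a, b):
    """Classify two 1-based inclusive intervals as overlapping, adjacent or separate."""
    (s1, e1), (s2, e2) = sorted([a, b])  # order by start coordinate
    if s2 <= e1:
        return "overlapping"
    if s2 == e1 + 1:
        return "adjacent"  # no gap between the two segments
    return "separate"

# Exon pairs copied from the GFF warnings in this thread
pairs = {
    "KDM4A": ((8436614, 8437382), (8436535, 8436613)),
    "ZFYVE28": ((319524, 319638), (319006, 319523)),
    "FLAD1": ((1352880, 1353047), (1351726, 1352879)),
}

results = {gene: relation(x, y) for gene, (x, y) in pairs.items()}
for gene, rel in results.items():
    print(gene, rel)
```

For these three pairs the second exon ends exactly one base before the first one starts, so they are adjacent and not truly overlapping, and gffread merges each pair into a single exon.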