Can I make a GFF3 file from either a GFF file or a genome FASTA file?
6.3 years ago
jaqx008 ▴ 110

Hello all, I am trying to create a density map from a TE.bed file that I have. I am using the DensityMap GUI at chicken-repeats.inra.fr (I don't know the details of how it works). The required input format is GFF3, and I have a GFF file that looks like this:

Bf_V2_1     1   797 -   bf_rep_71           Unknown         Bf_V2_1  
    848 936 +   (TA)n               Simple_repeat   Bf_V2_1  
    1236    1369    -   CR1-11_BF           LINE/CR1        Bf_V2_1  
    2151    2171    +   (TA)n               Simple_repeat   Bf_V2_1  
    2351    3238    -   bf_rep_71           Unknown         Bf_V2_1  
    3229    3413    +   DNA-X-4_BF          DNA/Unknown     Bf_V2_1  
    3400    3506    +   Harbinger-N11_BF    DNA/Harbinger

Is there a way to convert this to GFF3? Or can I make GFF3 from a FASTA file? And is there another way to create a density map that shows the locations of my transposable elements in the genome? I have read the suggestions given for similar questions, and none has been very helpful. Thanks.

GFFtoGFF3 DensityMap TEAnnotation • 1.9k views

Is there a way to convert this to GFF3?

Are you asking whether you can convert a .bed file to a .gff file?

or can I make GFF3 from fasta?

No, it is not possible to convert or generate a GFF3 file from a FASTA file alone. A GFF file stores annotation data, whereas a FASTA file contains only sequences.
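
To illustrate the point, here is a minimal Python sketch of what a FASTA file can contribute to a GFF3 file: only the sequence names and lengths, written as `##sequence-region` lines. The feature rows themselves still have to come from an annotation. The function name `sequence_regions` and the toy sequences are made up for this example, and it assumes every header line carries an identifier.

```python
def sequence_regions(fasta_lines):
    """Return one '##sequence-region' line per FASTA record.

    Assumes well-formed FASTA: every '>' header has an identifier,
    and the identifier is the first whitespace-separated token.
    """
    lengths = {}
    seqid = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            seqid = line[1:].split()[0]
            lengths[seqid] = 0
        elif seqid is not None:
            lengths[seqid] += len(line)
    # GFF3 coordinates are 1-based, so a region spans 1..length.
    return ["##sequence-region {} 1 {}".format(s, n) for s, n in lengths.items()]


# Toy input, not real data.
toy_fasta = [">seqA toy record", "ACGTAC", "GT", ">seqB", "AAAA"]
for region in sequence_regions(toy_fasta):
    print(region)
```

Everything beyond these header lines (repeat positions, strand, family) is annotation and cannot be read off the sequence file itself.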

By "is there a way to convert this to GFF3" I mean converting my GFF to GFF3.

6.3 years ago
Beuss ▴ 140

I guess the formatting of your example got broken, so I assume that your input file looks like this:

Bf_V2_1 1   797 -   bf_rep_71   Unknown
Bf_V2_1 848 936 +   (TA)n   Simple_repeat
Bf_V2_1 1236    1369    -   CR1-11_BF   LINE/CR1
Bf_V2_1 2151    2171    +   (TA)n   Simple_repeat
Bf_V2_1 2351    3238    -   bf_rep_71   Unknown
Bf_V2_1 3229    3413    +   DNA-X-4_BF  DNA/Unknown
Bf_V2_1 3400    3506    +   Harbinger-N11_BF    DNA/Harbinger

If the positions are 1-based, as in GFF, use this Perl one-liner:

perl -nae 'print "$F[0]\tmySource\t$F[4]\t$F[1]\t$F[2]\.\t$F[3]\tRepeatFamily=$F[5]\n"' TE.bed

Output:

Bf_V2_1 mySource    bf_rep_71   1   797.    -   RepeatFamily=Unknown
Bf_V2_1 mySource    (TA)n   848 936.    +   RepeatFamily=Simple_repeat
Bf_V2_1 mySource    CR1-11_BF   1236    1369.   -   RepeatFamily=LINE/CR1
Bf_V2_1 mySource    (TA)n   2151    2171.   +   RepeatFamily=Simple_repeat
Bf_V2_1 mySource    bf_rep_71   2351    3238.   -   RepeatFamily=Unknown
Bf_V2_1 mySource    DNA-X-4_BF  3229    3413.   +   RepeatFamily=DNA/Unknown
Bf_V2_1 mySource    Harbinger-N11_BF    3400    3506.   +   RepeatFamily=DNA/Harbinger

If the positions are 0-based, as in BED, use this Perl one-liner:

perl -nae 'print "$F[0]\tmySource\t$F[4]\t".($F[1] + 1)."\t$F[2]\.\t$F[3]\tRepeatFamily=$F[5]\n"' TE.bed

Output:

Bf_V2_1 mySource    bf_rep_71   2   797.    -   RepeatFamily=Unknown
Bf_V2_1 mySource    (TA)n   849 936.    +   RepeatFamily=Simple_repeat
Bf_V2_1 mySource    CR1-11_BF   1237    1369.   -   RepeatFamily=LINE/CR1
Bf_V2_1 mySource    (TA)n   2152    2171.   +   RepeatFamily=Simple_repeat
Bf_V2_1 mySource    bf_rep_71   2352    3238.   -   RepeatFamily=Unknown
Bf_V2_1 mySource    DNA-X-4_BF  3230    3413.   +   RepeatFamily=DNA/Unknown
Bf_V2_1 mySource    Harbinger-N11_BF    3401    3506.   +   RepeatFamily=DNA/Harbinger
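
Why only the start changes between the two cases: BED intervals are 0-based and half-open, while GFF intervals are 1-based and closed, so the end coordinate is numerically the same in both. A small Python sketch of that arithmetic (the function names are made up for this example):

```python
def bed_to_gff_coords(start, end):
    """0-based, half-open [start, end) -> 1-based, closed [start, end]."""
    return start + 1, end


def gff_to_bed_coords(start, end):
    """1-based, closed [start, end] -> 0-based, half-open [start, end)."""
    return start - 1, end


# A BED interval covering the first 797 bases of a sequence.
bed_start, bed_end = 0, 797
gff_start, gff_end = bed_to_gff_coords(bed_start, bed_end)

# The feature length is the same in both conventions.
assert bed_end - bed_start == gff_end - gff_start + 1
print(gff_start, gff_end)
```

The first feature in the question starts at 1, which hints that the file is already 1-based, but that is worth checking against whatever tool produced it.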

While I certainly like the Perl one-liner solution, there are a few problems with the GFF3-like output. Column 3 is better filled with a feature type such as CDS, mRNA or a similar higher-order class, $F[4] belongs at the end in the attributes column, and there must not be a period directly after the end coordinate. GFF3 also expects nine tab-separated columns, so the score and phase columns, which this data does not have, are filled with "." here:

perl -nae 'print "$F[0]\tmySource\tCDS\t$F[1]\t$F[2]\t.\t$F[3]\t.\tName=$F[4];RepeatFamily=$F[5]\n"' TE.bed

See the GFF3 specification for reference: https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md
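
For anyone who prefers a script over a one-liner, here is a rough Python sketch of the same conversion, written against the nine-column layout in the specification. Several things in it are my assumptions and not from this thread: the feature type `repeat_region` (a Sequence Ontology term that seems a closer fit for repeats than CDS), the lowercase attribute key `repeat_family` (the spec reserves attribute names starting with an uppercase letter), and the function names. It expects six whitespace-separated input fields with 1-based coordinates, as in the example above.

```python
# Characters that the GFF3 spec requires to be percent-encoded in column 9.
RESERVED = set(";=&,%\t\n\r")


def escape_attr(value):
    """Percent-encode reserved and control characters in an attribute value."""
    return "".join(
        "%{:02X}".format(ord(c))
        if c in RESERVED or ord(c) < 0x20 or ord(c) == 0x7F
        else c
        for c in value
    )


def to_gff3_line(fields, source="mySource", feature_type="repeat_region"):
    """Build one nine-column GFF3 line from
    [seqid, start, end, strand, name, family] (1-based coordinates)."""
    seqid, start, end, strand, name, family = fields[:6]
    attributes = "Name={};repeat_family={}".format(
        escape_attr(name), escape_attr(family)
    )
    # Score and phase are unknown for these features, so both are '.'.
    return "\t".join(
        [seqid, source, feature_type, str(int(start)), str(int(end)),
         ".", strand, ".", attributes]
    )


def convert(lines):
    """Yield the version header, then one GFF3 line per non-blank input line."""
    yield "##gff-version 3"
    for line in lines:
        fields = line.split()
        if fields:
            yield to_gff3_line(fields)


# Two rows taken from the example input in this thread.
example = [
    "Bf_V2_1 1   797 -   bf_rep_71   Unknown",
    "Bf_V2_1 848 936 +   (TA)n   Simple_repeat",
]
for out in convert(example):
    print(out)
```

To run it on a real file, pass an open file handle to `convert` in place of the `example` list. If the coordinates turn out to be 0-based, add 1 to the start before formatting.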

Sorry for the late response. I am trying this now and will post the results soon. Thanks.
