GTF upload error in UCSC related to strandedness
3.1 years ago

Hello,

I'm having issues with uploading a .gtf file to the UCSC browser. I am getting the following error:

"Error GFF/GTF group STRG.155047.1 on chr12+, this line is on chr12-, all group members must be on same seq and strand"

I have previously uploaded .gtf files after converting from mm10 to mm9 using CrossMap.py without any issues. Does anyone have a suggestion for how to get around this error?

This time I first filtered the original .gtf to remove all single-exonic transcripts before converting to mm9 using:

./gffread file.gtf -U -T -o multiexonic.gtf

The resulting file (multiexonic.gtf) has the following structure:

chr1    StringTie   transcript  4807911 4841093 1000    +   .   transcript_id "STRG.187.7"; gene_id "STRG.187"
chr1    StringTie   exon    4807911 4808486 1000    +   .   transcript_id "STRG.187.7"; gene_id "STRG.187";
chr1    StringTie   exon    4828584 4828649 1000    +   .   transcript_id "STRG.187.7"; gene_id "STRG.187";
chr1    StringTie   exon    4830268 4830315 1000    +   .   transcript_id "STRG.187.7"; gene_id "STRG.187";
chr1    StringTie   exon    4832311 4832381 1000    +   .   transcript_id "STRG.187.7"; gene_id "STRG.187";
chr1    StringTie   exon    4837001 4837074 1000    +   .   transcript_id "STRG.187.7"; gene_id "STRG.187";
chr1    StringTie   exon    4840956 4841093 1000    +   .   transcript_id "STRG.187.7"; gene_id "STRG.187";
chr1    StringTie   transcript  4807911 4846739 1000    +   .   transcript_id "STRG.187.4"; gene_id "STRG.187"
chr1    StringTie   exon    4807911 4807982 1000    +   .   transcript_id "STRG.187.4"; gene_id "STRG.187";
chr1    StringTie   exon    4808455 4808486 1000    +   .   transcript_id "STRG.187.4"; gene_id "STRG.187";

Thanks in advance for any and all advice you can provide!

genome browser RNAseq gtf UCSC • 1.1k views

Update: comparing the file after the gffread filtering with the file after the CrossMap.py conversion, the issue seems to arise during the conversion. It looks like CrossMap is changing the strand of some of the exons, and some exons appear to have been lost as well. Has anyone encountered this problem before and/or have troubleshooting suggestions?

Before converting assemblies:

chr12    StringTie   transcript  18610020 18652089 1000    +   .   transcript_id "STRG.155047.1"; gene_id "STRG.155047"
chr12    StringTie   exon    18610020 18610510 1000    +   .   transcript_id "STRG.155047.1"; gene_id "STRG.155047";
chr12    StringTie   exon    18646480 18646606 1000    +   .   transcript_id "STRG.155047.1"; gene_id "STRG.155047";
chr12    StringTie   exon    18646807 18646867 1000    +   .   transcript_id "STRG.155047.1"; gene_id "STRG.155047";
chr12    StringTie   exon    18648202 18652089 1000    +   .   transcript_id "STRG.155047.1"; gene_id "STRG.155047";

After converting assemblies:

chr12    StringTie   transcript  17888510 17911597 1000    +   .   transcript_id "STRG.155010.1"; gene_id "STRG.155010"
chr12    StringTie   exon    17888510 17889128 1000    +   .   transcript_id "STRG.155010.1"; gene_id "STRG.155010";
chr12    StringTie   exon    17907466 17907592 1000    +   .   transcript_id "STRG.155010.1"; gene_id "STRG.155010";
chr12    StringTie   exon    17907793 17907853 1000    +   .   transcript_id "STRG.155010.1"; gene_id "STRG.155010";
chr12    StringTie   exon    17909195 17911597 1000    +   .   transcript_id "STRG.155010.1"; gene_id "STRG.155010";
chr12    StringTie   exon    18616826 18617316 1000    +   .   transcript_id "STRG.155047.1"; gene_id "STRG.155047";
chr12    StringTie   exon    24902419 24902545 1000    -   .   transcript_id "STRG.155047.1"; gene_id "STRG.155047";
chr12    StringTie   exon    24902158 24902218 1000    -   .   transcript_id "STRG.155047.1"; gene_id "STRG.155047";
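One way to see how widespread this is would be to list every transcript_id that ends up on more than one chromosome/strand combination after conversion. The following is only a rough sketch, assuming a tab-separated GTF with a transcript_id attribute in column 9; the function name find_split_transcripts is made up for illustration and is not part of CrossMap or gffread.

```python
import re
from collections import defaultdict

# Matches the transcript_id attribute in the 9th GTF column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def find_split_transcripts(lines):
    """Return {transcript_id: sorted [(seqname, strand), ...]} for every
    transcript_id that occurs on more than one seqname/strand pair.

    Hypothetical helper for illustration; assumes tab-separated GTF lines.
    """
    placements = defaultdict(set)
    for line in lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            # Column 1 is the sequence name, column 7 is the strand.
            placements[match.group(1)].add((fields[0], fields[6]))
    return {tid: sorted(p) for tid, p in placements.items() if len(p) > 1}
```

Called on an open file handle for the converted GTF, this should report STRG.155047.1 as sitting on both chr12 + and chr12 -, along with any other transcripts split the same way.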

3.1 years ago

This makes sense. When you lift transcripts over to another assembly, a transcript can get split, so you end up with two transcripts that share the same ID. CrossMap does this when a piece of the genome is inverted between the two assemblies. However, in a GTF file each transcript must have a unique ID. You cannot have two different transcripts under one ID, and these are two different transcripts because they are on different strands.

To solve this problem you can either remove one copy of STRG.155047.1 or use something like awk, perl or python to give each transcript a unique ID.
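For the second option, a minimal Python sketch might look like the one below. It assumes a tab-separated GTF and treats every distinct chromosome/strand combination of a transcript_id as its own transcript. The first combination seen keeps the original ID and later ones get a suffix. The function name make_transcript_ids_unique and the "_partN" suffix are made up for this example, and the sketch assumes no existing ID already ends in such a suffix. gene_id is left untouched.

```python
import re

# Matches the transcript_id attribute in the 9th GTF column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def make_transcript_ids_unique(lines):
    """Return GTF lines in which each (transcript_id, seqname, strand)
    combination has its own transcript_id.

    Hypothetical helper for illustration; assumes tab-separated GTF lines.
    """
    seen = {}  # original transcript_id -> {(seqname, strand): assigned ID}
    out = []
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split("\t")
        match = TRANSCRIPT_ID.search(fields[8]) if len(fields) >= 9 else None
        if match is None:
            out.append(line)  # comments, blanks, records without an ID
            continue
        tid = match.group(1)
        key = (fields[0], fields[6])  # sequence name and strand
        placements = seen.setdefault(tid, {})
        if key not in placements:
            n = len(placements)
            # First placement keeps the ID; later ones get _part2, _part3, ...
            placements[key] = tid if n == 0 else f"{tid}_part{n + 1}"
        new_id = placements[key]
        fields[8] = fields[8].replace(
            f'transcript_id "{tid}"', f'transcript_id "{new_id}"', 1
        )
        out.append("\t".join(fields))
    return out
```

With the converted file above, the exon on chr12 + would keep STRG.155047.1 and the two exons on chr12 - would become STRG.155047.1_part2. The renamed pieces may be single-exon fragments without their own transcript line, so it is worth checking whether they are still wanted before uploading.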
