GFF Error: overlapping duplicate dispersed_repeat feature in stringtie
11 months ago
Aki ▴ 20

Hi. I got the following error when running StringTie with a RepeatMasker annotation GFF file and RNA-seq BAM files that were already sorted with samtools.

GFF Error: overlapping duplicate dispersed_repeat feature (ID=461)
GFF Error: overlapping duplicate dispersed_repeat feature (ID=712)
GFF Error: overlapping duplicate dispersed_repeat feature (ID=1013)
...
GFF Error: overlapping duplicate dispersed_repeat feature (ID=128998)

I generated the RepeatMasker annotation from the file at https://hgdownload.soe.ucsc.edu/goldenPath/hs1/bigZips/hs1.repeatMasker.out.gz and converted it to GFF with rmOutToGFF3.pl. When I checked the original hs1.repeatMasker.out, I found many duplicated values in the ID column (the rightmost column), such as 461 below.

  SW  perc perc perc  query      position in query           matching       repeat              position in  repeat
 score  div. del. ins.  sequence    begin     end    (left)    repeat         class/family         begin  end (left)   ID
  321  23.6  6.2  0.0  chrX       306885  306990 (153952576) C  L1M4c          LINE/L1             (4017) 2367   2255    459
  713  10.8  0.0  0.0  chrX       307028  307129 (153952437) C  AluJo          SINE/Alu              (91)  221    120    460
 1486  18.5  5.6  4.3  chrX       307210  307577 (153951989) C  MLT1C2         LTR/ERVL-MaLR         (47)  414     42    461
 1610  21.0  4.8  3.2  chrX       307562  307970 (153951596) C  MLT1C2         LTR/ERVL-MaLR         (40)  421      6    461
 1171  22.5  5.0  3.0  chrX       307986  308315 (153951251) C  MLT1C2         LTR/ERVL-MaLR        (124)  337      1    462
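
To see how widespread the repeated IDs are, a minimal Python sketch like the one below could tally them. It assumes the standard 15-column whitespace-separated `.out` layout shown above, with the ID in the last of those columns; the function name `count_repeat_ids` and the inline sample (copied from the rows above) are only for illustration.

```python
from collections import Counter

# Sample rows copied from the excerpt above (header lines included on purpose).
SAMPLE_OUT = """\
  SW  perc perc perc  query      position in query           matching       repeat              position in  repeat
 score  div. del. ins.  sequence    begin     end    (left)    repeat         class/family         begin  end (left)   ID
  321  23.6  6.2  0.0  chrX       306885  306990 (153952576) C  L1M4c          LINE/L1             (4017) 2367   2255    459
  713  10.8  0.0  0.0  chrX       307028  307129 (153952437) C  AluJo          SINE/Alu              (91)  221    120    460
 1486  18.5  5.6  4.3  chrX       307210  307577 (153951989) C  MLT1C2         LTR/ERVL-MaLR         (47)  414     42    461
 1610  21.0  4.8  3.2  chrX       307562  307970 (153951596) C  MLT1C2         LTR/ERVL-MaLR         (40)  421      6    461
 1171  22.5  5.0  3.0  chrX       307986  308315 (153951251) C  MLT1C2         LTR/ERVL-MaLR        (124)  337      1    462
"""


def count_repeat_ids(lines):
    """Count how many annotation rows carry each ID (15th column)."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Skip header and blank lines: data rows start with an integer SW score.
        if len(fields) < 15 or not fields[0].isdigit():
            continue
        counts[fields[14]] += 1
    return counts


counts = count_repeat_ids(SAMPLE_OUT.splitlines())
# Keep only IDs that appear on more than one row.
duplicated = {rep_id: n for rep_id, n in counts.items() if n > 1}
print(duplicated)
```

For the real file, the same function could be fed an open handle, for example `gzip.open("hs1.repeatMasker.out.gz", "rt")`.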

I am learning transposable element analysis from this article (https://www.nature.com/articles/s41588-019-0373-3). How do you think the authors dealt with this problem? Could you tell me how I should deal with it? Thanks in advance.

stringtie UCSC repeatmasker • 550 views

Well ... here is their code availability section:

https://www.nature.com/articles/s41588-019-0373-3#code-availability

All custom scripts are available from the authors upon request.

I would ask the authors for those scripts. Most likely they applied some combination of interval filtering with existing tools.

I will say it is pretty darn ridiculous that, in this day and age, you still have to ask the authors for their scripts. I am going to test this out and request the scripts from them myself.
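
In the meantime, here is one possible workaround, which is only a guess and not the paper's method (that is unknown until the scripts arrive): collapse overlapping rows that share the same chromosome, strand and ID into a single interval before converting to GFF. The sketch below assumes the rows have already been parsed into tuples and sorted by chromosome and start; the function name `merge_same_id` and the tuple layout are hypothetical.

```python
def merge_same_id(rows):
    """Merge consecutive overlapping rows that share chromosome, strand and ID.

    rows: iterable of (chrom, start, end, strand, rep_id) tuples,
          assumed sorted by chrom and then start.
    """
    merged = []
    for chrom, start, end, strand, rep_id in rows:
        last = merged[-1] if merged else None
        if (
            last is not None
            and last[0] == chrom
            and last[3] == strand
            and last[4] == rep_id
            and start <= last[2]  # overlaps the previous interval
        ):
            # Extend the previous interval instead of emitting a duplicate ID.
            merged[-1] = (chrom, last[1], max(last[2], end), strand, rep_id)
        else:
            merged.append((chrom, start, end, strand, rep_id))
    return merged


# Coordinates taken from the three MLT1C2 rows in the question.
rows = [
    ("chrX", 307210, 307577, "C", "461"),
    ("chrX", 307562, 307970, "C", "461"),
    ("chrX", 307986, 308315, "C", "462"),
]
merged = merge_same_id(rows)
for row in merged:
    print(row)
```

This only handles fragments that actually overlap. Same-ID fragments separated by a gap are left as they are, so they would still need unique IDs or some other treatment.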
