Difference in transcript counts between my GFF file and IGV
4.1 years ago
pablo ▴ 310

Hi,

I aligned my transcript reads, generated with the Isoseq tool, against my reference. Then I generated a GFF file.

The number of transcripts in the GFF file differs from the number I see in IGV. As an example, I focus on one specific scaffold of the reference, named Super-Scaffold_100047.

GFF file:

cat out.gff | grep "Super-Scaffold_100047" | awk '{print $3}' | grep "transcript" | wc -l

21 transcripts
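A stricter variant of the count above, as a minimal sketch: it matches the scaffold name exactly in column 1 (a plain grep would also match any longer name that contains it) and adds a per-gene breakdown, which may help to narrow down where transcripts go missing. The function names `count_transcripts` and `transcripts_per_gene` are made up for this example, and the attribute parsing assumes the `gene_id "..."; transcript_id "...";` layout shown in the excerpt further down.

```shell
# Count "transcript" features on one sequence, matching column 1 exactly.
# Usage: count_transcripts <seqid> <gff_file>
count_transcripts() {
    awk -v seqid="$1" '
        $1 == seqid && $3 == "transcript" { n++ }
        END { print n + 0 }
    ' "$2"
}

# Same filter, but broken down by gene_id (one "gene<TAB>count" line per gene).
# Usage: transcripts_per_gene <seqid> <gff_file>
transcripts_per_gene() {
    awk -v seqid="$1" '
        $1 == seqid && $3 == "transcript" {
            # Attributes start at field 9; the value follows the key.
            for (i = 9; i < NF; i++) {
                if ($i == "gene_id") {
                    g = $(i + 1)
                    gsub(/[";]/, "", g)   # strip quotes and trailing semicolon
                    count[g]++
                }
            }
        }
        END { for (g in count) print g "\t" count[g] }
    ' "$2" | sort
}
```

For example: `count_transcripts Super-Scaffold_100047 out.gff`.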

IGV:

I count 25 transcripts, as you can see in the image.

[Image: IGV]

Here is the part of the GFF file corresponding to the scaffold in question (only the beginning, because of the size of the file):

Super-Scaffold_100047   PacBio  transcript      281156  287937  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.1";
Super-Scaffold_100047   PacBio  exon    281156  281855  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.1";
Super-Scaffold_100047   PacBio  exon    282017  282094  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.1";
Super-Scaffold_100047   PacBio  exon    282323  282446  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.1";
Super-Scaffold_100047   PacBio  exon    283108  283380  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.1";
Super-Scaffold_100047   PacBio  exon    284686  284805  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.1";
Super-Scaffold_100047   PacBio  exon    287372  287937  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.1";
Super-Scaffold_100047   PacBio  transcript      281168  287895  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.2";
Super-Scaffold_100047   PacBio  exon    281168  281855  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.2";
Super-Scaffold_100047   PacBio  exon    282017  282094  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.2";
Super-Scaffold_100047   PacBio  exon    282323  282446  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.2";
Super-Scaffold_100047   PacBio  exon    283108  283380  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.2";
Super-Scaffold_100047   PacBio  exon    284686  284805  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.2";
Super-Scaffold_100047   PacBio  exon    287372  287895  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.2";
Super-Scaffold_100047   PacBio  transcript      281217  288458  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.3";
Super-Scaffold_100047   PacBio  exon    281217  281855  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.3";
Super-Scaffold_100047   PacBio  exon    282017  282094  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.3";
Super-Scaffold_100047   PacBio  exon    282323  282446  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.3";
Super-Scaffold_100047   PacBio  exon    283108  283380  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.3";
Super-Scaffold_100047   PacBio  exon    284686  284805  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.3";
Super-Scaffold_100047   PacBio  exon    287372  288458  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.3";
Super-Scaffold_100047   PacBio  transcript      281287  288086  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.4";
Super-Scaffold_100047   PacBio  exon    281287  281855  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.4";
Super-Scaffold_100047   PacBio  exon    282017  282094  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.4";
Super-Scaffold_100047   PacBio  exon    282323  282446  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.4";
Super-Scaffold_100047   PacBio  exon    283108  283380  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.4";
Super-Scaffold_100047   PacBio  exon    284686  284805  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.4";
Super-Scaffold_100047   PacBio  exon    287372  288086  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.4";
Super-Scaffold_100047   PacBio  transcript      281544  288081  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.5";
Super-Scaffold_100047   PacBio  exon    281544  281855  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.5";
Super-Scaffold_100047   PacBio  exon    282017  282094  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.5";
Super-Scaffold_100047   PacBio  exon    282323  282446  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.5";
Super-Scaffold_100047   PacBio  exon    283108  283380  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.5";
Super-Scaffold_100047   PacBio  exon    284686  284805  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.5";
Super-Scaffold_100047   PacBio  exon    287372  288081  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.5";
Super-Scaffold_100047   PacBio  transcript      281590  286734  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.6";
Super-Scaffold_100047   PacBio  exon    281590  281855  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.6";
Super-Scaffold_100047   PacBio  exon    282017  282094  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.6";
Super-Scaffold_100047   PacBio  exon    282323  282446  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.6";
Super-Scaffold_100047   PacBio  exon    283108  283380  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.6";
Super-Scaffold_100047   PacBio  exon    284686  286734  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.6";
Super-Scaffold_100047   PacBio  transcript      220944  223787  .       -       .       gene_id "PB.8692"; transcript_id "PB.8692.1";
Super-Scaffold_100047   PacBio  exon    220944  223787  .       -       .       gene_id "PB.8692"; transcript_id "PB.8692.1";
Super-Scaffold_100047   PacBio  transcript      311770  318491  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047   PacBio  exon    311770  312629  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047   PacBio  exon    312770  312942  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047   PacBio  exon    314316  314634  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047   PacBio  exon    314763  315028  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047   PacBio  exon    315299  315530  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047   PacBio  exon    315681  315995  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047   PacBio  exon    316110  316180  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047   PacBio  exon    316280  316494  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047   PacBio  exon    316582  316945  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047   PacBio  exon    317141  317314  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047   PacBio  exon    317520  317738  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047   PacBio  exon    317816  318154  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047   PacBio  exon    318223  318491  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047   PacBio  transcript      311770  318501  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047   PacBio  exon    311770  312050  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047   PacBio  exon    312246  312629  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047   PacBio  exon    312770  312942  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047   PacBio  exon    314316  314634  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047   PacBio  exon    314763  315028  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047   PacBio  exon    315299  315530  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047   PacBio  exon    315681  315995  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047   PacBio  exon    316110  316180  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047   PacBio  exon    316280  316494  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047   PacBio  exon    316582  316945  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047   PacBio  exon    317141  317314  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047   PacBio  exon    317520  317738  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047   PacBio  exon    317816  318154  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047   PacBio  exon    318223  318501  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047   PacBio  transcript      311770  316826  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.3";
Super-Scaffold_100047   PacBio  exon    311770  312629  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.3";
Super-Scaffold_100047   PacBio  exon    312770  312942  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.3";
Super-Scaffold_100047   PacBio  exon    314316  314634  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.3";
Super-Scaffold_100047   PacBio  exon    314763  315028  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.3";
Super-Scaffold_100047   PacBio  exon    315299  315530  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.3";
Super-Scaffold_100047   PacBio  exon    315681  315995  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.3";
Super-Scaffold_100047   PacBio  exon    316110  316180  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.3";
Super-Scaffold_100047   PacBio  exon    316280  316494  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.3";
Super-Scaffold_100047   PacBio  exon    316582  316826  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.3";
Super-Scaffold_100047   PacBio  transcript      311771  317638  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.4";
Super-Scaffold_100047   PacBio  exon    311771  312629  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.4";
Super-Scaffold_100047   PacBio  exon    312770  312942  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.4";
Super-Scaffold_100047   PacBio  exon    314316  314634  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.4";
Super-Scaffold_100047   PacBio  exon    314763  315028  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.4";
Super-Scaffold_100047   PacBio  exon    315299  315530  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.4";
Super-Scaffold_100047   PacBio  exon    315681  315995  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.4";
Super-Scaffold_100047   PacBio  exon    316110  316180  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.4";
Super-Scaffold_100047   PacBio  exon    316280  316494  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.4";
Super-Scaffold_100047   PacBio  exon    316582  316945  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.4";
Super-Scaffold_100047   PacBio  exon    317141  317314  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.4";
Super-Scaffold_100047   PacBio  exon    317520  317638  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.4";
Super-Scaffold_100047   PacBio  transcript      311771  316452  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.5";
Super-Scaffold_100047   PacBio  exon    311771  312629  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.5";
Super-Scaffold_100047   PacBio  exon    312770  312942  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.5";
Super-Scaffold_100047   PacBio  exon    314316  314634  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.5";
Super-Scaffold_100047   PacBio  exon    314763  315028  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.5";
Super-Scaffold_100047   PacBio  exon    315299  315530  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.5";
Super-Scaffold_100047   PacBio  exon    315681  315995  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.5";
Super-Scaffold_100047   PacBio  exon    316110  316180  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.5";
Super-Scaffold_100047   PacBio  exon    316280  316452  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.5";

Do you have an explanation?

Best

gff igv transcripts • 1.4k views

These are giant sequences. Perhaps there is a secondary alignment for one or two of the reads (= transcripts)?


Actually, these are full-length transcripts obtained from PacBio sequencing. That's why they are giant.


Is there a secondary alignment for one or two of them? That would lead to the extra alignments you see.
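One way to check this, as a rough sketch: tally the alignment records by SAM flag. Bit 0x100 marks a secondary alignment and bit 0x800 a supplementary one. The function name `classify_alignments` is made up for this example. It reads SAM text on stdin and uses plain integer arithmetic for the bit tests, so it does not depend on gawk extensions.

```shell
# Tally SAM records read from stdin as primary / secondary / supplementary.
classify_alignments() {
    awk '
        /^@/ { next }                      # skip header lines, if any
        {
            flag = $2
            if (int(flag / 4) % 2) next    # 0x4: unmapped, not an alignment
            if (int(flag / 256) % 2)       secondary++       # 0x100
            else if (int(flag / 2048) % 2) supplementary++   # 0x800
            else                           primary++
        }
        END {
            printf "primary=%d secondary=%d supplementary=%d\n",
                   primary, secondary, supplementary
        }
    '
}
```

With an indexed BAM this could be fed from `samtools view my_alignment.bam Super-Scaffold_100047 | classify_alignments`. If everything lands in `primary`, secondary alignments are not the explanation.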


There isn't. I checked each transcript: they are all unique and align only once to the genome.

Also, when I count the sequences aligned to that scaffold in my alignment.bam file, I get 25.

Do you know if there is a way to count the number of alignments with IGV? I could check other scaffolds, but they have many more alignments, so counting them by hand would be tedious.


IGV is only a viewer. You can easily count the number of alignments with samtools idxstats.
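A small sketch of how that could look for a single scaffold. `samtools idxstats` prints one tab-separated line per reference: name, length, number of mapped records, number of unmapped records. The helper name `mapped_on` is made up for this example. It only filters that output, so the BAM still has to be indexed for idxstats itself to work.

```shell
# Print the mapped-record count (column 3) for one reference name,
# reading `samtools idxstats` output from stdin.
# Usage: samtools idxstats my_alignment.bam | mapped_on Super-Scaffold_100047
mapped_on() {
    awk -F '\t' -v ref="$1" '$1 == ref { print $3 }'
}
```

As far as I know, this number counts alignment records, so secondary and supplementary alignments are included if the BAM contains any.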


Hi,

Could you provide the mentioned GFF file?


I have updated my post with the GFF file, as you can see.


Maybe there are identical isoforms that IGV collapses. Could you run your GFF file through AGAT to check whether it contains any identical isoforms?

agat_convert_sp_gxf2gxf.pl --gff input.gff -o output.gff
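Independently of AGAT, a quick awk sketch can flag transcripts that share exactly the same exon chain. This is not what AGAT does internally, only a rough cross-check. The function name `find_identical_isoforms` is made up, and the sketch assumes the `transcript_id "...";` attribute layout from the excerpt and that the exons of each transcript are listed in a consistent order.

```shell
# Report transcripts whose exon chain (seqid, strand, all exon coordinates)
# is identical to that of an earlier transcript in the file.
# Output: "<duplicate_id><TAB><first_id_with_same_chain>", one pair per line.
# Usage: find_identical_isoforms <gff_file>
find_identical_isoforms() {
    awk '
        $3 == "exon" {
            id = ""
            for (i = 9; i < NF; i++) {
                if ($i == "transcript_id") {
                    id = $(i + 1)
                    gsub(/[";]/, "", id)   # strip quotes and semicolon
                }
            }
            if (id == "") next
            if (!(id in chain)) {
                order[++n] = id            # remember first-seen order
                chain[id] = $1 ":" $7      # start the key with seqid and strand
            }
            chain[id] = chain[id] "," $4 "-" $5
        }
        END {
            for (k = 1; k <= n; k++) {
                id = order[k]
                c = chain[id]
                if (c in seen) print id "\t" seen[c]
                else seen[c] = id
            }
        }
    ' "$1"
}
```

Empty output means no two transcripts in the file have the same exon structure.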

I will install that tool. Actually, when I run samtools idxstats my_alignment.bam, I find the expected number of reads/isoforms for the scaffold in question: Super-Scaffold_100047 379068 25 0. So the problem comes from the GFF file, which was not created correctly.
