4.1 years ago
pablo
Hi,
I aligned the transcript reads generated by the Isoseq tool against my reference, then generated a GFF file.
The number of transcripts in the GFF file differs from the number I see in IGV. For example, take one specific scaffold of the reference, named Super-Scaffold_100047.
GFF file:
awk '$1 == "Super-Scaffold_100047" && $3 == "transcript"' out.gff | wc -l
21 transcripts
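The same count can be cross-checked for every scaffold at once. The following is only a minimal sketch: the function name `count_transcripts` is made up for illustration, and it assumes that sequence names contain no whitespace, so that splitting on whitespace works for both tab- and space-delimited lines. It matches the first and third columns exactly rather than by substring, so a scaffold whose name merely contains "Super-Scaffold_100047" is not counted.

```python
from collections import Counter


def count_transcripts(gtf_lines):
    """Count 'transcript' features per sequence name (column 1)."""
    counts = Counter()
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        # Split on any whitespace; the 9th field keeps the attributes intact.
        fields = line.split(None, 8)
        if len(fields) >= 3 and fields[2] == "transcript":
            counts[fields[0]] += 1
    return counts


# Three lines copied from the GFF excerpt in this post.
sample = [
    'Super-Scaffold_100047 PacBio transcript 281156 287937 . + . gene_id "PB.8691"; transcript_id "PB.8691.1";',
    'Super-Scaffold_100047 PacBio exon 281156 281855 . + . gene_id "PB.8691"; transcript_id "PB.8691.1";',
    'Super-Scaffold_100047 PacBio transcript 281168 287895 . + . gene_id "PB.8691"; transcript_id "PB.8691.2";',
]
print(count_transcripts(sample)["Super-Scaffold_100047"])  # 2
```

Passing an open file handle instead of the list gives the per-scaffold counts for the whole file.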
IGV:
I count 25 transcripts, as you can see in the image.
Here is the part of the GFF file corresponding to the scaffold in question (only the beginning, because of the file's size):
Super-Scaffold_100047 PacBio transcript 281156 287937 . + . gene_id "PB.8691"; transcript_id "PB.8691.1";
Super-Scaffold_100047 PacBio exon 281156 281855 . + . gene_id "PB.8691"; transcript_id "PB.8691.1";
Super-Scaffold_100047 PacBio exon 282017 282094 . + . gene_id "PB.8691"; transcript_id "PB.8691.1";
Super-Scaffold_100047 PacBio exon 282323 282446 . + . gene_id "PB.8691"; transcript_id "PB.8691.1";
Super-Scaffold_100047 PacBio exon 283108 283380 . + . gene_id "PB.8691"; transcript_id "PB.8691.1";
Super-Scaffold_100047 PacBio exon 284686 284805 . + . gene_id "PB.8691"; transcript_id "PB.8691.1";
Super-Scaffold_100047 PacBio exon 287372 287937 . + . gene_id "PB.8691"; transcript_id "PB.8691.1";
Super-Scaffold_100047 PacBio transcript 281168 287895 . + . gene_id "PB.8691"; transcript_id "PB.8691.2";
Super-Scaffold_100047 PacBio exon 281168 281855 . + . gene_id "PB.8691"; transcript_id "PB.8691.2";
Super-Scaffold_100047 PacBio exon 282017 282094 . + . gene_id "PB.8691"; transcript_id "PB.8691.2";
Super-Scaffold_100047 PacBio exon 282323 282446 . + . gene_id "PB.8691"; transcript_id "PB.8691.2";
Super-Scaffold_100047 PacBio exon 283108 283380 . + . gene_id "PB.8691"; transcript_id "PB.8691.2";
Super-Scaffold_100047 PacBio exon 284686 284805 . + . gene_id "PB.8691"; transcript_id "PB.8691.2";
Super-Scaffold_100047 PacBio exon 287372 287895 . + . gene_id "PB.8691"; transcript_id "PB.8691.2";
Super-Scaffold_100047 PacBio transcript 281217 288458 . + . gene_id "PB.8691"; transcript_id "PB.8691.3";
Super-Scaffold_100047 PacBio exon 281217 281855 . + . gene_id "PB.8691"; transcript_id "PB.8691.3";
Super-Scaffold_100047 PacBio exon 282017 282094 . + . gene_id "PB.8691"; transcript_id "PB.8691.3";
Super-Scaffold_100047 PacBio exon 282323 282446 . + . gene_id "PB.8691"; transcript_id "PB.8691.3";
Super-Scaffold_100047 PacBio exon 283108 283380 . + . gene_id "PB.8691"; transcript_id "PB.8691.3";
Super-Scaffold_100047 PacBio exon 284686 284805 . + . gene_id "PB.8691"; transcript_id "PB.8691.3";
Super-Scaffold_100047 PacBio exon 287372 288458 . + . gene_id "PB.8691"; transcript_id "PB.8691.3";
Super-Scaffold_100047 PacBio transcript 281287 288086 . + . gene_id "PB.8691"; transcript_id "PB.8691.4";
Super-Scaffold_100047 PacBio exon 281287 281855 . + . gene_id "PB.8691"; transcript_id "PB.8691.4";
Super-Scaffold_100047 PacBio exon 282017 282094 . + . gene_id "PB.8691"; transcript_id "PB.8691.4";
Super-Scaffold_100047 PacBio exon 282323 282446 . + . gene_id "PB.8691"; transcript_id "PB.8691.4";
Super-Scaffold_100047 PacBio exon 283108 283380 . + . gene_id "PB.8691"; transcript_id "PB.8691.4";
Super-Scaffold_100047 PacBio exon 284686 284805 . + . gene_id "PB.8691"; transcript_id "PB.8691.4";
Super-Scaffold_100047 PacBio exon 287372 288086 . + . gene_id "PB.8691"; transcript_id "PB.8691.4";
Super-Scaffold_100047 PacBio transcript 281544 288081 . + . gene_id "PB.8691"; transcript_id "PB.8691.5";
Super-Scaffold_100047 PacBio exon 281544 281855 . + . gene_id "PB.8691"; transcript_id "PB.8691.5";
Super-Scaffold_100047 PacBio exon 282017 282094 . + . gene_id "PB.8691"; transcript_id "PB.8691.5";
Super-Scaffold_100047 PacBio exon 282323 282446 . + . gene_id "PB.8691"; transcript_id "PB.8691.5";
Super-Scaffold_100047 PacBio exon 283108 283380 . + . gene_id "PB.8691"; transcript_id "PB.8691.5";
Super-Scaffold_100047 PacBio exon 284686 284805 . + . gene_id "PB.8691"; transcript_id "PB.8691.5";
Super-Scaffold_100047 PacBio exon 287372 288081 . + . gene_id "PB.8691"; transcript_id "PB.8691.5";
Super-Scaffold_100047 PacBio transcript 281590 286734 . + . gene_id "PB.8691"; transcript_id "PB.8691.6";
Super-Scaffold_100047 PacBio exon 281590 281855 . + . gene_id "PB.8691"; transcript_id "PB.8691.6";
Super-Scaffold_100047 PacBio exon 282017 282094 . + . gene_id "PB.8691"; transcript_id "PB.8691.6";
Super-Scaffold_100047 PacBio exon 282323 282446 . + . gene_id "PB.8691"; transcript_id "PB.8691.6";
Super-Scaffold_100047 PacBio exon 283108 283380 . + . gene_id "PB.8691"; transcript_id "PB.8691.6";
Super-Scaffold_100047 PacBio exon 284686 286734 . + . gene_id "PB.8691"; transcript_id "PB.8691.6";
Super-Scaffold_100047 PacBio transcript 220944 223787 . - . gene_id "PB.8692"; transcript_id "PB.8692.1";
Super-Scaffold_100047 PacBio exon 220944 223787 . - . gene_id "PB.8692"; transcript_id "PB.8692.1";
Super-Scaffold_100047 PacBio transcript 311770 318491 . - . gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047 PacBio exon 311770 312629 . - . gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047 PacBio exon 312770 312942 . - . gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047 PacBio exon 314316 314634 . - . gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047 PacBio exon 314763 315028 . - . gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047 PacBio exon 315299 315530 . - . gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047 PacBio exon 315681 315995 . - . gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047 PacBio exon 316110 316180 . - . gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047 PacBio exon 316280 316494 . - . gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047 PacBio exon 316582 316945 . - . gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047 PacBio exon 317141 317314 . - . gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047 PacBio exon 317520 317738 . - . gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047 PacBio exon 317816 318154 . - . gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047 PacBio exon 318223 318491 . - . gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047 PacBio transcript 311770 318501 . - . gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047 PacBio exon 311770 312050 . - . gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047 PacBio exon 312246 312629 . - . gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047 PacBio exon 312770 312942 . - . gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047 PacBio exon 314316 314634 . - . gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047 PacBio exon 314763 315028 . - . gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047 PacBio exon 315299 315530 . - . gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047 PacBio exon 315681 315995 . - . gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047 PacBio exon 316110 316180 . - . gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047 PacBio exon 316280 316494 . - . gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047 PacBio exon 316582 316945 . - . gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047 PacBio exon 317141 317314 . - . gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047 PacBio exon 317520 317738 . - . gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047 PacBio exon 317816 318154 . - . gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047 PacBio exon 318223 318501 . - . gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047 PacBio transcript 311770 316826 . - . gene_id "PB.8693"; transcript_id "PB.8693.3";
Super-Scaffold_100047 PacBio exon 311770 312629 . - . gene_id "PB.8693"; transcript_id "PB.8693.3";
Super-Scaffold_100047 PacBio exon 312770 312942 . - . gene_id "PB.8693"; transcript_id "PB.8693.3";
Super-Scaffold_100047 PacBio exon 314316 314634 . - . gene_id "PB.8693"; transcript_id "PB.8693.3";
Super-Scaffold_100047 PacBio exon 314763 315028 . - . gene_id "PB.8693"; transcript_id "PB.8693.3";
Super-Scaffold_100047 PacBio exon 315299 315530 . - . gene_id "PB.8693"; transcript_id "PB.8693.3";
Super-Scaffold_100047 PacBio exon 315681 315995 . - . gene_id "PB.8693"; transcript_id "PB.8693.3";
Super-Scaffold_100047 PacBio exon 316110 316180 . - . gene_id "PB.8693"; transcript_id "PB.8693.3";
Super-Scaffold_100047 PacBio exon 316280 316494 . - . gene_id "PB.8693"; transcript_id "PB.8693.3";
Super-Scaffold_100047 PacBio exon 316582 316826 . - . gene_id "PB.8693"; transcript_id "PB.8693.3";
Super-Scaffold_100047 PacBio transcript 311771 317638 . - . gene_id "PB.8693"; transcript_id "PB.8693.4";
Super-Scaffold_100047 PacBio exon 311771 312629 . - . gene_id "PB.8693"; transcript_id "PB.8693.4";
Super-Scaffold_100047 PacBio exon 312770 312942 . - . gene_id "PB.8693"; transcript_id "PB.8693.4";
Super-Scaffold_100047 PacBio exon 314316 314634 . - . gene_id "PB.8693"; transcript_id "PB.8693.4";
Super-Scaffold_100047 PacBio exon 314763 315028 . - . gene_id "PB.8693"; transcript_id "PB.8693.4";
Super-Scaffold_100047 PacBio exon 315299 315530 . - . gene_id "PB.8693"; transcript_id "PB.8693.4";
Super-Scaffold_100047 PacBio exon 315681 315995 . - . gene_id "PB.8693"; transcript_id "PB.8693.4";
Super-Scaffold_100047 PacBio exon 316110 316180 . - . gene_id "PB.8693"; transcript_id "PB.8693.4";
Super-Scaffold_100047 PacBio exon 316280 316494 . - . gene_id "PB.8693"; transcript_id "PB.8693.4";
Super-Scaffold_100047 PacBio exon 316582 316945 . - . gene_id "PB.8693"; transcript_id "PB.8693.4";
Super-Scaffold_100047 PacBio exon 317141 317314 . - . gene_id "PB.8693"; transcript_id "PB.8693.4";
Super-Scaffold_100047 PacBio exon 317520 317638 . - . gene_id "PB.8693"; transcript_id "PB.8693.4";
Super-Scaffold_100047 PacBio transcript 311771 316452 . - . gene_id "PB.8693"; transcript_id "PB.8693.5";
Super-Scaffold_100047 PacBio exon 311771 312629 . - . gene_id "PB.8693"; transcript_id "PB.8693.5";
Super-Scaffold_100047 PacBio exon 312770 312942 . - . gene_id "PB.8693"; transcript_id "PB.8693.5";
Super-Scaffold_100047 PacBio exon 314316 314634 . - . gene_id "PB.8693"; transcript_id "PB.8693.5";
Super-Scaffold_100047 PacBio exon 314763 315028 . - . gene_id "PB.8693"; transcript_id "PB.8693.5";
Super-Scaffold_100047 PacBio exon 315299 315530 . - . gene_id "PB.8693"; transcript_id "PB.8693.5";
Super-Scaffold_100047 PacBio exon 315681 315995 . - . gene_id "PB.8693"; transcript_id "PB.8693.5";
Super-Scaffold_100047 PacBio exon 316110 316180 . - . gene_id "PB.8693"; transcript_id "PB.8693.5";
Super-Scaffold_100047 PacBio exon 316280 316452 . - . gene_id "PB.8693"; transcript_id "PB.8693.5";
Do you have an explanation?
Best
These are giant sequences. Perhaps there is a secondary alignment for one or two of the reads (= transcripts)?
Actually, these are full-length transcripts obtained from PacBio sequencing. That's why they are giant.
Is there a secondary alignment for one or two of them? That would lead to the two extra alignments you see.
There isn't. I checked each transcript: they are all unique and align only once to the genome.
Also, when I count the number of transcripts specific to that scaffold in my alignment.bam file, I get 25 sequences.
Do you know if there is a way to count the number of alignments with IGV? I could check other scaffolds, but they have many more alignments, so counting them by hand would be tedious.
IGV is only a viewer. You can easily count the number of alignments with samtools idxstats.
Hi,
Could you provide the mentioned GFF file?
I updated my post with the GFF file, as you can see.
Maybe IGV is collapsing identical isoforms. Could you run your GFF file through AGAT to check whether it contains any identical isoforms?
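Independently of AGAT, a quick check for identical isoforms can be sketched in a few lines. This is not what AGAT does internally, only an illustration: the helper name `find_identical_isoforms` is hypothetical, and it assumes GTF-style attributes (`transcript_id "..."`) like those in the posted excerpt. Two transcripts are treated as identical when they share the same scaffold, strand, and exact exon coordinates.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def find_identical_isoforms(gtf_lines):
    """Return groups of transcript IDs that share scaffold, strand, and exon coordinates."""
    # (seqname, strand, transcript_id) -> list of (start, end) exon coordinates
    exons = defaultdict(list)
    for line in gtf_lines:
        fields = line.split(None, 8)
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            key = (fields[0], fields[6], match.group(1))
            exons[key].append((int(fields[3]), int(fields[4])))

    # Group transcripts by their full, sorted exon structure.
    by_structure = defaultdict(list)
    for (seqname, strand, tid), coords in exons.items():
        by_structure[(seqname, strand, tuple(sorted(coords)))].append(tid)
    return [sorted(tids) for tids in by_structure.values() if len(tids) > 1]


# Last exons of PB.8691.1 and PB.8691.2, copied from the post: their ends differ.
sample = [
    'Super-Scaffold_100047 PacBio exon 287372 287937 . + . gene_id "PB.8691"; transcript_id "PB.8691.1";',
    'Super-Scaffold_100047 PacBio exon 287372 287895 . + . gene_id "PB.8691"; transcript_id "PB.8691.2";',
]
print(find_identical_isoforms(sample))  # [] because the exon ends differ
```

An empty result on the full file would suggest that collapsed duplicates do not explain the difference in counts.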
I will install that tool. Actually, when I run
samtools idxstats my_alignment.bam
I find the expected number of reads/isoforms for the scaffold in question:
Super-Scaffold_100047 379068 25 0
So the problem comes from the GFF file, which was not created correctly.
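For reference, each idxstats line has four columns: reference sequence name, sequence length, number of mapped read-segments, and number of unmapped read-segments. To check many scaffolds without counting by hand, the output can be parsed as in the rough sketch below. The function name `parse_idxstats` is made up for illustration, and the input is assumed to be the text printed by samtools idxstats. Note that the third column counts alignment records, which may not equal the number of distinct transcripts in every dataset.

```python
def parse_idxstats(text):
    """Map reference name -> (length, mapped, unmapped) from idxstats output text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        name, length, mapped, unmapped = fields
        stats[name] = (int(length), int(mapped), int(unmapped))
    return stats


# The line reported above for the scaffold in question.
sample = "Super-Scaffold_100047\t379068\t25\t0\n"
length, mapped, unmapped = parse_idxstats(sample)["Super-Scaffold_100047"]
print(mapped)  # 25
```

Comparing the mapped counts with the per-scaffold transcript counts from the GFF file shows which scaffolds disagree.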