I have a GTF file; here is an excerpt:
GL000008.2 Cufflinks exon 83383 83545 . + . transcript_id "SHARED_00000001"; gene_id "XLOC_000001"; gene_name "XLOC_000001"; exon_number "1";
GL000008.2 Cufflinks transcript 83383 85626 . + . transcript_id "SHARED_00000001"; gene_id "XLOC_000001"; gene_name "XLOC_000001"; oId "SHARED_00000001"; class_code "u"; tss_id "TSS1";
GL000008.2 Cufflinks exon 85567 85626 . + . transcript_id "SHARED_00000001"; gene_id "XLOC_000001"; gene_name "XLOC_000001"; exon_number "2";
chr1 HAVANA exon 11869 12227 . + . transcript_id "SHARED_00000341"; gene_id "ENSG00000223972.5"; gene_name "ENSG00000223972.5"; exon_number "1";
chr1 HAVANA transcript 11869 14409 . + . transcript_id "SHARED_00000341"; gene_id "ENSG00000223972.5"; gene_name "ENSG00000223972.5"; oId "ENST00000456328.2"; tss_id "TSS213";
chr1 HAVANA exon 12613 12721 . + . transcript_id "SHARED_00000341"; gene_id "ENSG00000223972.5"; gene_name "ENSG00000223972.5"; exon_number "2";
chr1 HAVANA exon 13221 14409 . + . transcript_id "SHARED_00000341"; gene_id "ENSG00000223972.5"; gene_name "ENSG00000223972.5"; exon_number "3";
chr10_GL383545v1_alt ncbiRefSeq exon 3012 3170 . + . transcript_id "SHARED_00065395"; gene_id "XLOC_011047"; gene_name "XLOC_011047"; exon_number "1";
chr10_GL383545v1_alt ncbiRefSeq transcript 3012 96701 . + . transcript_id "SHARED_00065395"; gene_id "XLOC_011047"; gene_name "XLOC_011047";
chr10_GL383546v1_alt ncbiRefSeq transcript 295416 305254 . - . transcript_id "SHARED_00065412"; gene_id "XLOC_011055"; gene_name "XLOC_011055";
chr10_GL383546v1_alt ncbiRefSeq exon 299951 300098 . - . transcript_id "SHARED_00065412"; gene_id "XLOC_011055"; gene_name "XLOC_011055"; exon_number "2";
From the GTF, I would like to remove uncharacterized chromosomes like chr10_GL383545v1_alt and chr10_GL383546v1_alt; there are several others present in the original GTF.
I would like to keep chr1-chr22, chrX, chrY, and chrM, and also contigs like GL000008.2, KI270364.1, KI270740.1, and several other contigs.
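One possible way to express this filter, as a minimal Python sketch. The regular expression is an assumption based only on the names mentioned here (chr1-chr22, chrX, chrY, chrM, and bare accessions starting with GL or KI followed by a version number); the function names `keep_line` and `filter_gtf` are hypothetical, and the pattern would need adjusting if the file contains other wanted contig prefixes.

```python
import re

# Assumption: wanted sequence names are chr1-chr22, chrX, chrY, chrM,
# or bare contig accessions such as GL000008.2 / KI270364.1.
KEEP = re.compile(r"chr([1-9]|1[0-9]|2[0-2]|X|Y|M)|(GL|KI)\d+\.\d+")


def keep_line(line):
    """Return True if a GTF line should be kept."""
    if line.startswith("#"):
        return True  # keep header/comment lines
    fields = line.split(None, 1)  # first whitespace-delimited field is the seqname
    if not fields:
        return False  # drop blank lines
    return KEEP.fullmatch(fields[0]) is not None


def filter_gtf(lines):
    """Return only the lines whose seqname matches the keep pattern."""
    return [line for line in lines if keep_line(line)]
```

To apply it to a file, read the lines, pass them through `filter_gtf`, and write the result to a new file. Because the pattern must match the whole name, anything with a suffix such as `_alt` is dropped.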
Thanks a lot for the reply. I can still see some unwanted chromosomes like chrUn_GL000195v1, chrX_KI270880v1_alt, and chrY_KN196487v1_fix.
I added a step showing how to explore the result and modify the filter. Hopefully it makes sense and is useful to you. There are a variety of ways to solve this problem; this is just one.
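For anyone who wants to do the exploring step in Python, here is a rough sketch, not necessarily the approach used in the answer. It counts how often each sequence name occurs so that leftovers are easy to spot, then flags names by a simple rule. The rule is an assumption: every unwanted name reported in this thread (`_alt`, `_fix`, `chrUn_...`) contains an underscore, while the wanted ones do not. The function names are hypothetical.

```python
from collections import Counter


def seqname_counts(lines):
    """Count GTF records per sequence name (first column)."""
    counts = Counter()
    for line in lines:
        fields = line.split(None, 1)
        # Skip blank lines and header/comment lines.
        if fields and not line.startswith("#"):
            counts[fields[0]] += 1
    return counts


def is_wanted(seqname):
    # Assumption: unwanted names (alt, fix, chrUn_ scaffolds) all contain "_".
    return "_" not in seqname


def unwanted_seqnames(lines):
    """Return the sorted sequence names that the rule above would remove."""
    return sorted(name for name in seqname_counts(lines) if not is_wanted(name))
```

Running `seqname_counts` on the filtered file and printing the keys shows exactly which names survived; if something unexpected is still there, tighten `is_wanted` and filter again.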