Hi
I need to get the annotation for a list of IDs (just one column). It looks like this:
GeneID
Tt.00g000720
Tt.00g000730
Tt.00g000780
Tt.00g000850
Tt.00g000950
Tt.00g001260
Tt.00g001550
I have a GFF file for those gene IDs, like this:
contig_101 GenSAS_65ea36c7c3784-publish gene 319 1590 . + . ID=Tt.00g000010;Name=Tt.00g000010;original_ID=75803-Tt.00g000010;Alias=75803-Tt.00g000010;original_name=75803-Tt.00g000010;Notes=XP_028877667.1 [BLAST protein vs protein (blastp) 2.12.0],XP_009314224.1 [DIAMOND Functional 2.0.11],L-fucokinase (IPR012887) [InterProScan 5.53-87.0],PF07959.7 [Pfam 1.6]
contig_101 GenSAS_65ea36c7c3784-publish mRNA 319 1590 . + . ID=Tt.00g000010.m01;Name=Tt.00g000010.m01;Parent=Tt.00g000010;original_ID=75803-Tt.00g000010.m01;Alias=75803-Tt.00g000010.m01;original_name=75803-Tt.00g000010
contig_101 GenSAS_65ea36c7c3784-publish exon 319 1590 . + . ID=Tt.00g000010.m01.exon01;Name=Tt.00g000010.m01.exon01;Parent=Tt.00g000010.m01;original_ID=75803-Tt.00g000010.m01.exon1;Alias=75803-Tt.00g000010.m01.exon1
contig_101 GenSAS_65ea36c7c3784-publish CDS 319 1590 . + 0 ID=Tt.00g000010.m01.CDS01;Name=Tt.00g000010.m01.CDS01;Parent=Tt.00g000010.m01;original_ID=cds.75803-Tt.00g000010.m01;Alias=cds.75803-Tt.00g000010.m01
contig_101 GenSAS_65ea36c7c3784-publish gene 1726 3468 . + . ID=Tt.00g000020;Name=Tt.00g000020;original_ID=75803-Tt.00g000020;Alias=75803-Tt.00g000020;original_name=75803-Tt.00g000020;Notes=XP_028877667.1 [BLAST protein vs protein (blastp) 2.12.0],XP_028877667.1 [DIAMOND Functional 2.0.11],Galactokinase/homoserine kinase (IPR001174) [InterProScan 5.53-87.0],PF00288.21 [Pfam 1.6]
contig_101 GenSAS_65ea36c7c3784-publish mRNA 1726 3468 . + . ID=Tt.00g000020.m01;Name=Tt.00g000020.m01;Parent=Tt.00g000020;original_ID=75803-Tt.00g000020.m01;Alias=75803-Tt.00g000020.m01;original_name=75803-Tt.00g000020
contig_101 GenSAS_65ea36c7c3784-publish exon 1726 3468 . + . ID=Tt.00g000020.m01.exon01;Name=Tt.00g000020.m01.exon01;Parent=Tt.00g000020.m01;original_ID=75803-Tt.00g000020.m01.exon1;Alias=75803-Tt.00g000020.m01.exon1
contig_101 GenSAS_65ea36c7c3784-publish CDS 1726 3468 . + 0 ID=Tt.00g000020.m01.CDS01;Name=Tt.00g000020.m01.CDS01;Parent=Tt.00g000020.m01;original_ID=cds.75803-Tt.00g000020.m01;Alias=cds.75803-Tt.00g000020.m01
contig_101 GenSAS_65ea36c7c3784-publish gene 4054 4560 . + . ID=Tt.00g000030;Name=Tt.00g000030;original_ID=75803-Tt.00g000030;Alias=75803-Tt.00g000030;original_name=75803-Tt.00g000030;Notes=XP_803186.1 [BLAST protein vs protein (blastp) 2.12.0],XP_803186.1 [DIAMOND Functional 2.0.11]
contig_101 GenSAS_65ea36c7c3784-publish mRNA 4054 4560 . + . ID=Tt.00g000030.m01;Name=Tt.00g000030.m01;Parent=Tt.00g000030;original_ID=75803-Tt.00g000030.m01;Alias=75803-Tt.00g000030.m01;original_name=75803-Tt.00g000030
contig_101 GenSAS_65ea36c7c3784-publish exon 4054 4560 . + . ID=Tt.00g000030.m01.exon01;Name=Tt.00g000030.m01.exon01;Parent=Tt.00g000030.m01;original_ID=75803-Tt.00g000030.m01.exon1;Alias=75803-Tt.00g000030.m01.exon1
contig_101 GenSAS_65ea36c7c3784-publish CDS 4054 4560 . + 0 ID=Tt.00g000030.m01.CDS01;Name=Tt.00g000030.m01.CDS01;Parent=Tt.00g000030.m01;original_ID=cds.75803-Tt.00g000030.m01;Alias=cds.75803-Tt.00g000030.m01
contig_101 GenSAS_65ea36c7c3784-publish gene 5050 6858 . + . ID=Tt.00g000040;Name=Tt.00g000040;original_ID=75803-Tt.00g000040;Alias=75803-Tt.00g000040;original_name=75803-Tt.00g000040;Notes=XP_803185.1 [BLAST protein vs protein (blastp) 2.12.0],XP_807877.1 [DIAMOND Functional 2.0.11],Pyridoxal phosphate-dependent decarboxylase (IPR002129) [InterProScan 5.53-87.0],PF00282.14 [Pfam 1.6]
contig_101 GenSAS_65ea36c7c3784-publish mRNA 5050 6858 . + . ID=Tt.00g000040.m01;Name=Tt.00g000040.m01;Parent=Tt.00g000040;original_ID=75803-Tt.00g000040.m01;Alias=75803-Tt.00g000040.m01;original_name=75803-Tt.00g000040
contig_101 GenSAS_65ea36c7c3784-publish exon 5050 6858 . + . ID=Tt.00g000040.m01.exon01;Name=Tt.00g000040.m01.exon01;Parent=Tt.00g000040.m01;original_ID=75803-Tt.00g000040.m01.exon1;Alias=75803-Tt.00g000040.m01.exon1
contig_101 GenSAS_65ea36c7c3784-publish CDS 5050 6858 . + 0 ID=Tt.00g000040.m01.CDS01;Name=Tt.00g000040.m01.CDS01;Parent=Tt.00g000040.m01;original_ID=cds.75803-Tt.00g000040.m01;Alias=cds.75803-Tt.00g000040.m01
How can I intersect the IDs with the GFF file to get the annotation for those IDs? I tried bedtools intersect, but it didn't work because I don't have the annotation for those IDs.
Thank you very much for your help!
Hey
Thanks for your answer. I tried it but got an empty output. Could I be doing something wrong?
Does your ID list overlap the ID= part of column 9 or a different part? Please provide a better example as there is no overlap between your ID list and any GFF entry.
What is the output to:
Hi, this is the result:
You should not get empty output. Are you sure you're running the command right? What is the output to:
It looks like this:
That file was created on Windows, correct? Please run dos2unix on it to make it a proper Linux file, then try the grep from the answer.
You were totally right. I changed the format of the txt file created on Windows and it worked. Thank you very much for your help!
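For anyone who hits the same empty-output problem, here is a minimal Python sketch of the same kind of lookup. It is not the grep from the answer, just an alternative under a few assumptions: the GFF is tab-separated, the IDs in the list match the ID= attribute of the gene rows, and the annotation you want is the Notes attribute. The function names load_ids and annotate are made up for this example. Stripping each ID line also removes the trailing carriage return that Windows line endings leave behind, which is what made the patterns fail to match here.

```python
def load_ids(lines):
    """Collect gene IDs from a one-column list, skipping the header and blanks."""
    ids = set()
    for line in lines:
        # strip() also drops the trailing '\r' left by Windows (CRLF) line endings
        gene_id = line.strip()
        if gene_id and gene_id != "GeneID":
            ids.add(gene_id)
    return ids


def annotate(gff_lines, wanted_ids):
    """Yield (gene ID, Notes) for gene rows whose ID= attribute is in wanted_ids."""
    for line in gff_lines:
        if line.startswith("#"):
            continue  # skip GFF header and comment lines
        fields = line.rstrip("\r\n").split("\t")
        # Assumption: nine tab-separated columns, feature type in column 3
        if len(fields) != 9 or fields[2] != "gene":
            continue
        # Column 9 is a ';'-separated list of key=value attributes
        attrs = dict(pair.split("=", 1) for pair in fields[8].split(";") if "=" in pair)
        if attrs.get("ID") in wanted_ids:
            yield attrs["ID"], attrs.get("Notes", "")


# Small in-memory illustration; the ID list deliberately uses Windows line endings
demo_ids = load_ids(["GeneID\r\n", "Tt.00g000010\r\n"])
demo_gff = [
    "contig_101\tGenSAS\tgene\t319\t1590\t.\t+\t.\t"
    "ID=Tt.00g000010;Name=Tt.00g000010;Notes=L-fucokinase (IPR012887)\n",
]
for gene_id, notes in annotate(demo_gff, demo_ids):
    print(gene_id, notes, sep="\t")
```

With real files you would pass open file handles instead of the in-memory lists, for example load_ids(open("ids.txt")) and annotate(open("annotation.gff"), ids), where both file names are placeholders. Matching on the parsed ID= value rather than on a substring also avoids picking up the mRNA, exon and CDS rows, whose IDs start with the gene ID.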