Hello,
I'm trying to map IDs to gene names using a GFF3 file. I've searched a lot on this topic, but none of the answers I found were exactly what I'm looking for (or the tools just didn't work well). Could anyone help me with this? Here's an example of the GFF3 annotation that I have. I replaced each tab with ' | ' for readability:
chr18 | SE | gene | 25175343 | 25203976 | . | + | . | Name=chr18:25175343:25175485:+@chr18:25182055:25182149:+@chr18:25203861:25203976:+;gid=chr18:25175343:25175485:+@chr18:25182055:25182149:+@chr18:25203861:25203976:+;refseq_id=NA;ensg_id=ENSMUSG00000033632;gsymbol=AW554918;ID=chr18:25175343:25175485:+@chr18:25182055:25182149:+@chr18:25203861:25203976:+
chr18 | SE | mRNA | 25175343 | 25203976 | . | + | . | gid=chr18:25175343:25175485:+@chr18:25182055:25182149:+@chr18:25203861:25203976:+;ID=chr18:25175343:25175485:+@chr18:25182055:25182149:+@chr18:25203861:25203976:+.A;Parent=chr18:25175343:25175485:+@chr18:25182055:25182149:+@chr18:25203861:25203976:+
I also have another file with gids, and I would like to map each gid to its gsymbol, for example. Here's what the file looks like:
event_name | chrom | strand | mRNA_starts | mRNA_ends
chr18:25175343:25175485:+@chr18:25182055:25182149:+@chr18:25203861:25203976:+ | chr18 | + | 25175343,25175343 | 25203976,25203976
Then I would like to look up the gid (chr18:25175343:25175485:+@chr18:25182055:25182149:+@chr18:25203861:25203976:+) in the GFF3 file and print it out with its gsymbol (AW554918). The output I'm trying to get is something like:
event_name | chrom | strand | mRNA_starts | mRNA_ends | gsymbol
chr18:25175343:25175485:+@chr18:25182055:25182149:+@chr18:25203861:25203976:+ | chr18 | + | 25175343,25175343 | 25203976,25203976 | AW554918
I think I need to parse the attributes in the GFF3 file and map the gid in the second file to the gsymbol. Could you help me with how to parse the attributes into multiple columns? Python or R would be a bit easier for me to understand. Or is there a simpler way to do this? Any suggestions for solving this problem would be really appreciated. Thank you.
Note: I realize that you could probably do the same thing with more lightweight text-chopping tools, but hopefully using an actual GFF3 parser and Node.js is not too much of an added pain point.
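For anyone who prefers the lightweight route in Python, here is a minimal sketch of the same idea using only the standard library. It is not a full GFF3 parser: it assumes nine tab-separated columns, simple key=value attributes separated by ';', no URL-encoded values, and that the gid and gsymbol attributes sit on the 'gene' rows as in the example above. The function names (parse_attributes, gid_to_gsymbol, annotate_events) are made up for illustration.

```python
import csv


def parse_attributes(field):
    """Split a GFF3 column-9 string such as 'a=1;b=2' into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            # Split on the first '=' only, so values may contain '='.
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def gid_to_gsymbol(gff_lines):
    """Build a {gid: gsymbol} lookup from the 'gene' rows of a GFF3 file."""
    mapping = {}
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue  # assumption: only gene rows carry gsymbol
        attrs = parse_attributes(cols[8])
        if "gid" in attrs and "gsymbol" in attrs:
            mapping[attrs["gid"]] = attrs["gsymbol"]
    return mapping


def annotate_events(event_lines, mapping, missing="NA"):
    """Yield each row of the tab-separated event table plus a gsymbol column."""
    rows = csv.reader(event_lines, delimiter="\t")
    header = next(rows)
    yield header + ["gsymbol"]
    for row in rows:
        if row:
            # The first column (event_name) holds the gid.
            yield row + [mapping.get(row[0], missing)]


# Demo using the two example records from the question.
GID = ("chr18:25175343:25175485:+@chr18:25182055:25182149:+"
       "@chr18:25203861:25203976:+")
gff = ["\t".join([
    "chr18", "SE", "gene", "25175343", "25203976", ".", "+", ".",
    f"Name={GID};gid={GID};refseq_id=NA;"
    f"ensg_id=ENSMUSG00000033632;gsymbol=AW554918;ID={GID}",
])]
events = [
    "event_name\tchrom\tstrand\tmRNA_starts\tmRNA_ends",
    f"{GID}\tchr18\t+\t25175343,25175343\t25203976,25203976",
]

result = list(annotate_events(events, gid_to_gsymbol(gff)))
for row in result:
    print("\t".join(row))
```

With real files you would pass open file handles instead of the in-memory lists. Because the lookup is a dictionary, neither file needs to be sorted, and event names that are missing from the GFF3 get 'NA' instead of being dropped.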
Thanks cmdcolin! Good to know about this tool, and your example script helped me a lot! :)))
Sure thing. Also note that the Unix command join requires sorted text files; I missed that step in my explanation. The join command is great for combining two text files on a shared column, but both files must be sorted on that column or the output may be incomplete.
Very helpful. Thanks a lot!!