I have two GFF3 files. I want to extract the genes that have the same coordinates (the same start and end positions) in both files and write them to a third file.
Example:
File 1:
C20775336 maker gene 1895 2166 . - . ID=gene1;Name=maker-C20
C20775336 maker gene 895 1166 . - . ID=mRNA1;Parent=gene1;N
C20775336 maker gene 1895 1962 . - . Parent=mRNA1
C20775336 maker gene 2795 2962 . - 2 ID=CDS1;Parent=mRNA1
File 2:
C20775336 maker gene 895 1166 . - . ID=gene1;Name=maker-C20
C20775936 maker gene 1895 2166 . - . ID=mRNA1;Parent=gene1;N
C20775336 maker gene 2795 2962 . - . Parent=mRNA1
Output file:
C20775336 maker gene 895 1166 . - . ID=gene1;Name=maker-C20
C20775336 maker gene 2795 2962 . - . Parent=mRNA1
In other words: if columns 4 and 5 (start and end) of a line in the first file are equal to columns 4 and 5 of a line in the second file, print that line from the second file to a new output file.
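A minimal Python sketch of that rule, assuming ordinary GFF3 input (tab-separated, nine columns, lines starting with `#` are comments). It collects the (start, end) pairs from the first file into a set, then prints every line of the second file whose pair is in that set. The function names and the script name `match_coords.py` are hypothetical. Note that with columns 4 and 5 alone, the `C20775936 ... 1895 2166` line of file 2 would also be printed, since file 1 has a `1895 2166` line; the expected output above leaves it out, which is what you get if the sequence ID (column 1) must match as well, i.e. `key_cols=(1, 4, 5)`.

```python
import sys
from typing import Iterable, Iterator, List, Optional, Tuple


def gff_fields(line: str) -> Optional[List[str]]:
    """Split one GFF3 line into its nine columns; None for comments, blanks, or short lines."""
    line = line.rstrip("\n")
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) < 9:
        # Fallback for copy-pasted examples where tabs became spaces.
        fields = line.split(None, 8)
    return fields if len(fields) >= 9 else None


def matching_lines(
    file1_lines: Iterable[str],
    file2_lines: Iterable[str],
    key_cols: Tuple[int, ...] = (4, 5),
) -> Iterator[str]:
    """Yield each line of file 2 whose key columns (1-based) also occur in file 1."""
    seen = set()
    for line in file1_lines:
        fields = gff_fields(line)
        if fields is not None:
            seen.add(tuple(fields[c - 1] for c in key_cols))
    for line in file2_lines:
        fields = gff_fields(line)
        if fields is not None and tuple(fields[c - 1] for c in key_cols) in seen:
            yield line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python match_coords.py file1.gff3 file2.gff3 > output.gff3
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        for out in matching_lines(f1, f2):
            print(out)
```

Holding only the keys of file 1 in memory means each file is read once. The same idea in awk is the usual two-file idiom, `awk 'NR==FNR {seen[$4,$5]; next} ($4,$5) in seen' file1 file2 > output`, with `$1,$4,$5` in both places if the sequence ID must match too.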
I tried to do this with awk, but I could not figure it out.
Thanks in advance
"Joining Multiple Fields Using Unix Join" (from SO) gives very nice solutions using awk and join. When I have to combine multiple files in Unix, I use what Michael Mrozek suggested there: combine the fields into one with awk, then join the files on that single key field with join.
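The combine-then-join approach can be sketched in Python as below. This is not the code from that answer; it is an illustrative equivalent assuming tab-separated input, and the names `composite_key` and `join_on_fields` are hypothetical. The `:` used to glue the key columns together is an arbitrary choice; any character that never occurs in those columns works.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def composite_key(fields: List[str], key_cols: Tuple[int, ...]) -> str:
    """Glue the chosen 1-based columns into one string, like building a key with awk."""
    return ":".join(fields[c - 1] for c in key_cols)


def join_on_fields(
    left_lines: Iterable[str],
    right_lines: Iterable[str],
    key_cols: Tuple[int, ...] = (1, 4, 5),
) -> Iterator[str]:
    """Inner-join two tab-separated files on a composite key.

    Each output line is: key, all columns of the left line, all columns
    of the right line. Unlike `join`, neither input needs to be sorted.
    """
    index: Dict[str, List[List[str]]] = defaultdict(list)
    for line in left_lines:
        if line.strip() and not line.startswith("#"):
            fields = line.rstrip("\n").split("\t")
            index[composite_key(fields, key_cols)].append(fields)
    for line in right_lines:
        if line.strip() and not line.startswith("#"):
            fields = line.rstrip("\n").split("\t")
            key = composite_key(fields, key_cols)
            # .get() avoids inserting empty lists for keys that do not match
            for left in index.get(key, []):
                yield "\t".join([key] + left + fields)
```

Because the left file is indexed in a dictionary, this drops `join`'s requirement that both inputs be sorted on the join field, at the cost of keeping the left file's rows in memory.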
I have a similar question: what if I don't care about the rest of the columns and only the first column has to match? I would like to print the number of matches found in the first file and the number of matches found in the second file.
Thanks
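One reading of this follow-up, sketched in Python: count the lines in each file whose first column (the sequence ID) also appears as a first column in the other file. Whether "number of matches" means lines or distinct IDs is an assumption; this sketch counts lines, and the function names are hypothetical. For distinct IDs, `len(set(ids1) & set(ids2))` gives the shared count.

```python
from typing import Iterable, List, Tuple


def first_column(lines: Iterable[str]) -> List[str]:
    """Column 1 of every non-blank, non-comment line."""
    return [
        line.split()[0]
        for line in lines
        if line.strip() and not line.startswith("#")
    ]


def count_first_column_matches(
    file1_lines: Iterable[str], file2_lines: Iterable[str]
) -> Tuple[int, int]:
    """Return (lines of file 1 whose column 1 occurs in file 2,
    lines of file 2 whose column 1 occurs in file 1)."""
    ids1 = first_column(file1_lines)
    ids2 = first_column(file2_lines)
    set1, set2 = set(ids1), set(ids2)
    n1 = sum(1 for seqid in ids1 if seqid in set2)
    n2 = sum(1 for seqid in ids2 if seqid in set1)
    return n1, n2
```

Applied to the two example files in the question, this returns `(4, 2)`: all four file-1 lines are on `C20775336`, which also occurs in file 2, while only two of the three file-2 lines are on an ID present in file 1.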
For a quicker response, you should ask this general programming question on SO.
Sorry, I didn't understand what you meant by SO.
Stack Overflow