6.3 years ago
1234anjalianjali1234
Hello,
I am trying to find gene duplication events within a genome. For this, I need the positional information of the CDS of those genes.
The problem is that I need only one line per unique ID, with its positional information.
My file:
st1 PGSC0003DMC400026563 152418 152576
st1 PGSC0003DMC400026561 160499 160663
st1 PGSC0003DMC400039465 225140 225225
st1 PGSC0003DMC400039465 225786 225990
st1 PGSC0003DMC400039465 226430 226630
st1 PGSC0003DMC400039465 227247 227461
st1 PGSC0003DMC400039465 228093 228346
st1 PGSC0003DMC400039465 228815 228867
st1 PGSC0003DMC400039465 228960 229439
st1 PGSC0003DMC400039540 249208 249402
What I want:
st1 PGSC0003DMC400026563 152418 152576
st1 PGSC0003DMC400026561 160499 160663
st1 PGSC0003DMC400039465 225140 229439
st1 PGSC0003DMC400039540 249208 249402
Thank you.
Is your file always sorted like that? That is, are the IDs always grouped together and the coordinates sorted?
No, I sorted my original GFF file using an awk command.
What have you tried? You have a clear idea of what you want, so you must have made some headway into getting there, right?
Can I also add that the tag 'gene duplication' is misplaced here. Those are not gene duplications but exons (CDS) of a single gene. So you just want the beginning and end coordinate of each gene, rather than the separate exons.
Yes, I know that they are not duplicated genes. I am trying to find gene duplications, for which I need to make a GFF file, and for that I have to make a file of CDS with coordinates. You are right, I want the start and end coordinates of each CDS.
Thank you
Please aim for professional communication -
would be:
Yes, I know that they are not duplicated genes. I am trying to find gene duplications, for which I need to make a GFF file, and for that I have to make a file of CDS with coordinates. And you are right, I want the start and end coordinates of a CDS.
Thank you
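For what it's worth, here is a minimal Python sketch of the collapse described in the question: one line per ID, keeping the smallest start and the largest end. It assumes a whitespace-separated four-column file (sequence name, ID, start, end) as shown above. The function name `collapse_cds` and the `sep` parameter are made up for illustration. Because it tracks the minimum and maximum per ID, it should not matter whether the input is sorted or whether the lines for one ID are adjacent.

```python
def collapse_cds(lines, sep="\t"):
    """Collapse multiple CDS segments per ID into one start/end span.

    Assumes each line has at least four whitespace-separated fields:
    sequence name, ID, start, end. Output keeps first-seen order of IDs.
    """
    spans = {}  # (seqname, cds_id) -> (min_start, max_end); dicts keep insertion order
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        seqname, cds_id = fields[0], fields[1]
        start, end = int(fields[2]), int(fields[3])
        key = (seqname, cds_id)
        if key in spans:
            lo, hi = spans[key]
            spans[key] = (min(lo, start), max(hi, end))
        else:
            spans[key] = (start, end)
    return [
        sep.join([seqname, cds_id, str(lo), str(hi)])
        for (seqname, cds_id), (lo, hi) in spans.items()
    ]


if __name__ == "__main__":
    example = """\
st1 PGSC0003DMC400026563 152418 152576
st1 PGSC0003DMC400026561 160499 160663
st1 PGSC0003DMC400039465 225140 225225
st1 PGSC0003DMC400039465 225786 225990
st1 PGSC0003DMC400039465 228960 229439
st1 PGSC0003DMC400039540 249208 249402
""".splitlines()
    for row in collapse_cds(example, sep=" "):
        print(row)
```

To run it on a real file, something like `collapse_cds(open("cds.txt"))` should work, where `cds.txt` is a placeholder name. Note that the key is the pair (sequence name, ID), so the same ID on two different sequences would give two output lines; whether that is what you want depends on your data.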