Hello everyone!
I have a task I'm kind of lost on. I have 2 tab-delimited text files which contain exon coordinates.
The first file contains the start coordinates (for example):
NM_032291 chr1 + 66999638 67091529 67098752 67101626
NM_001308203 chr1 + 66999251 66999928 67091529 67098752 67105459 67108492
and the second file contains the end coordinates:
NM_032291 chr1 + 67000051 67091593 67098777 67101698
NM_001308203 chr1 + 66999355 67000051 67091593 67098777 67105516 67108547
I'd like to have multiple BED files, one per exon (for example, for the first exon):
chr1 66999638 67000051 NM_032291 length +
chr1 66999251 66999355 NM_001308203 length +
Each gene contains a different number of exons, so the number of columns is not known in advance. I believe there is a very simple way to do it; I've tried awk, but without success.
Thanks!
Thanks! About your question: that's my data. But I have one more problem: I only gave an example, and the number of exons is different for each gene.
Do you mean that the same ID appears multiple times in the first column of the same file?
I edited my post, so you can see now. And no, I have one ID per line, followed by all the exon coordinates.
Then the solution I provided should work fine. Did you get any error?
Maybe I didn't understand your answer... What does $10 mean? Isn't it simply column 10?
Yes, but after combining two files into one. I added explanation.
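For anyone who would rather not count columns in a combined file, here is a minimal Python sketch of the same idea, not the awk answer itself: it reads the start and end files in parallel and groups rows by exon position, so a variable number of exons per line is not a problem. The function names and the exon_N.bed output naming are hypothetical, and the "length" column is assumed to mean end minus start. Exons are numbered in column order, which is also an assumption.

```python
from collections import defaultdict


def exon_bed_rows(start_lines, end_lines):
    """Pair each start line with its end line; return {exon_index: [BED rows]}."""
    per_exon = defaultdict(list)
    for s_line, e_line in zip(start_lines, end_lines):
        s, e = s_line.split(), e_line.split()
        if not s or not e:
            continue  # skip blank lines
        tx, chrom, strand = s[0], s[1], s[2]
        if e[0] != tx:
            raise ValueError(f"ID mismatch: {tx} vs {e[0]}")
        starts, ends = s[3:], e[3:]
        if len(starts) != len(ends):
            raise ValueError(f"{tx}: unequal number of starts and ends")
        # Exon index is 1-based and follows the column order in the input.
        for i, (st, en) in enumerate(zip(starts, ends), start=1):
            length = int(en) - int(st)  # assumption: length = end - start
            per_exon[i].append(f"{chrom}\t{st}\t{en}\t{tx}\t{length}\t{strand}")
    return dict(per_exon)


def write_bed_files(rows, prefix="exon_"):
    """Write one file per exon index, e.g. exon_1.bed (hypothetical naming)."""
    for idx, lines in sorted(rows.items()):
        with open(f"{prefix}{idx}.bed", "w") as fh:
            fh.write("\n".join(lines) + "\n")


# The two example lines from the question.
starts = [
    "NM_032291 chr1 + 66999638 67091529 67098752 67101626",
    "NM_001308203 chr1 + 66999251 66999928 67091529 67098752 67105459 67108492",
]
ends = [
    "NM_032291 chr1 + 67000051 67091593 67098777 67101698",
    "NM_001308203 chr1 + 66999355 67000051 67091593 67098777 67105516 67108547",
]

rows = exon_bed_rows(starts, ends)
for line in rows[1]:
    print(line)
```

With real data you would pass the two open file handles instead of the lists and then call write_bed_files(rows). Transcripts with fewer exons simply do not appear in the files for the higher exon numbers.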