Hi Alex Reynolds,
I am sorry to ask this question again, but I got an error from the previous command line.
My first file, First.txt, looks like this:
deletion chr10:1726501-1755000 28500 0.586226 9.73037E-05 715.754 0.00171548 3546.87 0.114216 1.17241
My second file, Second.txt, looks like this:
Chr10 NC_029525.1 gene 1672245 1676954 - LOC107318572
Chr10 NC_029525.1 gene 1677076 1682931 - C10H15orf39
Chr10 NC_029525.1 gene 1690899 1710413 - PPCDC
Chr10 NC_029525.1 gene 1710723 1714472 - LOC107318577
Chr10 NC_029525.1 gene 1714558 1714977 - LOC107318579
Chr10 NC_029525.1 gene 1717116 1719122 + RPP25
Chr10 NC_029525.1 gene 1721742 1725395 + LOC107318578
Chr10 NC_029525.1 gene 1725935 1728167 + FAM219B
Chr10 NC_029525.1 gene 1728336 1731151 - MPI
Chr10 NC_029525.1 gene 1731194 1739576 + LOC107318570
Chr10 NC_029525.1 gene 1739821 1743801 + ULK3
Chr10 NC_029525.1 gene 1744568 1747749 - CPLX3
Chr10 NC_029525.1 gene 1752411 1759515 - CSK
I want to get a result like this (Answer.txt):
Chr10 NC_029525.1 gene 1725935 1728167 + FAM219B
Chr10 NC_029525.1 gene 1728336 1731151 - MPI
Chr10 NC_029525.1 gene 1731194 1739576 + LOC107318570
Chr10 NC_029525.1 gene 1739821 1743801 + ULK3
Chr10 NC_029525.1 gene 1744568 1747749 - CPLX3
Chr10 NC_029525.1 gene 1752411 1759515 - CSK
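The filtering asked for above can be sketched in Python as a minimal illustration. This is not the command line discussed in this thread. The function names `parse_region` and `overlapping_genes` are hypothetical, and the sketch assumes that the region sits in the second whitespace-separated column of First.txt, that gene start and end are columns 4 and 5 of Second.txt, that coordinates are closed intervals, and that chromosome names may be compared case-insensitively (chr10 vs. Chr10).

```python
import re


def parse_region(field):
    """Split a string such as 'chr10:1726501-1755000' into (chrom, start, end)."""
    match = re.fullmatch(r"(\S+):(\d+)-(\d+)", field)
    if match is None:
        raise ValueError(f"not a chrom:start-end region: {field!r}")
    return match.group(1), int(match.group(2)), int(match.group(3))


def overlapping_genes(region_line, gene_lines):
    """Return the gene lines that overlap the region named in region_line.

    Assumption: a gene is reported if it overlaps the region by at least
    one base, which matches the expected output shown above (FAM219B and
    CSK only partially overlap the deletion).
    """
    chrom, start, end = parse_region(region_line.split()[1])
    hits = []
    for line in gene_lines:
        cols = line.split()
        if len(cols) < 7:
            continue  # skip blank or malformed lines
        gene_chrom, gene_start, gene_end = cols[0], int(cols[3]), int(cols[4])
        # Assumption: 'Chr10' and 'chr10' refer to the same chromosome.
        if gene_chrom.lower() != chrom.lower():
            continue
        if gene_start <= end and gene_end >= start:
            hits.append(line)
    return hits
```

With the files in this post, it could be called as `overlapping_genes(first_line, open("Second.txt").read().splitlines())`, once for each line of First.txt.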
Thank you, Alex, for the quick reply. It worked only partially: the output is missing genes from chromosomes such as chrZ, chrM, LGE64 and LGE22C19W28_E50C23.
Yep, that's a bug. See the revised preprocessing step for the second file.
I am now able to get chrZ, but chromosomes chrW, chrM, LGE64 and LGE22C19W28_E50C23 are still missing.
I added a fix to test for the chromosome name in the second preprocess step. If the name starts with "Chr", then the first letter is made lowercase. Otherwise, the name is left unmodified. You'll need to investigate further on your own from this point, as you may need to preprocess your first file as well, depending on how you have named chromosomes in that file.
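The renaming rule described here, together with a quick way to spot names that still do not match between the two files, might look like the following Python sketch. It is an illustration only, not the preprocessing command from this thread, and the helper names `normalize_chrom` and `unmatched_chroms` are hypothetical.

```python
def normalize_chrom(name):
    """Lowercase the first letter only when the name starts with 'Chr'.

    Names such as 'LGE64' or 'chrM' are returned unchanged.
    """
    if name.startswith("Chr"):
        return "c" + name[1:]
    return name


def unmatched_chroms(first_names, second_names):
    """Return normalized names from the second file that never occur in the first.

    Assumption: both arguments are iterables of chromosome names that were
    already extracted from the two files.
    """
    first = {normalize_chrom(n) for n in first_names}
    second = {normalize_chrom(n) for n in second_names}
    return sorted(second - first)
```

Running `unmatched_chroms` on the chromosome columns of both files lists the names that still differ after normalization, which is a reasonable starting point for deciding whether the first file needs its own preprocessing.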