Hi,
I have two files: file1 contains the coordinates of CNV regions and file2 contains a list of genes with their positions. How can I extract from file2 the genes that overlap the coordinates in file1?
File1:
**CNV_type Coordinates size val1 val2 val3 val4 val5 val6 cp**
deletion chr10:1726501-1755000 28500 0.586226 9.73037E-05 715.754 0.00171548 3546.87 0.114216 1.17241
File2:
Chr10 NC_029525.1 gene 1672245 1676954 - LOC107318572
Chr10 NC_029525.1 gene 1677076 1682931 - C10H15orf39
Chr10 NC_029525.1 gene 1690899 1710413 - PPCDC
Chr10 NC_029525.1 gene 1710723 1714472 - LOC107318577
Chr10 NC_029525.1 gene 1714558 1714977 - LOC107318579
Chr10 NC_029525.1 gene 1717116 1719122 + RPP25
Chr10 NC_029525.1 gene 1721742 1725395 + LOC107318578
Chr10 NC_029525.1 gene 1725935 1728167 + FAM219B
Chr10 NC_029525.1 gene 1728336 1731151 - MPI
Chr10 NC_029525.1 gene 1731194 1739576 + LOC107318570
Chr10 NC_029525.1 gene 1739821 1743801 + ULK3
Chr10 NC_029525.1 gene 1744568 1747749 - CPLX3
Chr10 NC_029525.1 gene 1752411 1759515 - CSK
Chr10 NC_029525.1 gene 1763792 1766892 - LOC107318670
Chr10 NC_029525.1 gene 1768556 1772353 + LOC107318671
Chr10 NC_029525.1 gene 1773117 1795190 + EDC3
Chr10 NC_029525.1 gene 1795424 1803058 - CLK3
Chr10 NC_029525.1 gene 1803181 1830203 - ARID3B
Chr10 NC_029525.1 gene 1830313 1832341 + LOC107318625
Chr10 NC_029525.1 gene 1832912 1837647 + UBL7
Chr10 NC_029525.1 gene 1837777 1868463 + SEMA7A
Chr10 NC_029525.1 gene 1871806 1875769 + LOC107318663
Chr10 NC_029525.1 gene 1875800 1895830 - CCDC33
Output/Result:
Chr10 NC_029525.1 gene 1725935 1728167 + FAM219B
Chr10 NC_029525.1 gene 1728336 1731151 - MPI
Chr10 NC_029525.1 gene 1731194 1739576 + LOC107318570
Chr10 NC_029525.1 gene 1739821 1743801 + ULK3
Chr10 NC_029525.1 gene 1744568 1747749 - CPLX3
Chr10 NC_029525.1 gene 1752411 1759515 - CSK
I would appreciate your help. Thanks.
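For reference, the selection described in this question can be sketched in plain awk, without any extra tools. This is only a minimal sketch under some assumptions: both files are tab-delimited with their header lines already removed, a gene counts as a hit if it overlaps the region by at least one base (the expected output includes FAM219B and CSK, which only partly overlap), and chromosome names are compared case-insensitively because file1 says chr10 while file2 says Chr10. The output name genes_in_cnv.txt is hypothetical, and the inputs below are a small subset of the sample rows.

```shell
# Recreate a small, tab-delimited subset of the sample data (headers already removed).
printf 'deletion\tchr10:1726501-1755000\t28500\n' > file1.txt
printf 'Chr10\tNC_029525.1\tgene\t%s\t%s\t%s\t%s\n' \
  1721742 1725395 + LOC107318578 \
  1725935 1728167 + FAM219B \
  1728336 1731151 - MPI \
  1731194 1739576 + LOC107318570 \
  1739821 1743801 + ULK3 \
  1744568 1747749 - CPLX3 \
  1752411 1759515 - CSK \
  1763792 1766892 - LOC107318670 > file2.txt

awk -F'\t' -v OFS='\t' '
  NR == FNR {                       # first file: one CNV region per line
    split($2, p, "[:-]")            # "chr10:1726501-1755000" -> chr10, 1726501, 1755000
    n++
    chrom[n] = tolower(p[1]); lo[n] = p[2] + 0; hi[n] = p[3] + 0
    next
  }
  {                                 # second file: one gene per line
    for (i = 1; i <= n; i++)
      # overlap test: gene starts before the region ends and ends after it starts
      if (tolower($1) == chrom[i] && $4 <= hi[i] && $5 >= lo[i]) { print; break }
  }
' file1.txt file2.txt > genes_in_cnv.txt

cat genes_in_cnv.txt
```

This loops over every region for every gene, which is fine for small files; for genome-scale inputs a dedicated interval tool working on sorted BED files is the better choice.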
I removed the headers of both files using the sed command:
sed '1d' file1.txt
File1 becomes like this:
And file2 becomes:
I ran the first command line:
tail -n+2 first.txt | cut -f2 | awk -v OFS="\t" -F'[:-]' '{ print $1, $2, $3; }' | sort-bed - > first.bed
and then the second command line:
awk -v OFS="\t" '{ print $1, $4, $5, $2, $3, $6, $7; }' second.txt > second.bed
But when I ran the final command line, the answer file had nothing in it.
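An empty result from an interval comparison is often a sign that the chromosome names in the two BED files do not match exactly. In the sample data as posted, the first file uses chr10 and the second uses Chr10. Whether that was the cause here is only an assumption, but it is quick to check. A minimal sketch, using small stand-in files with hypothetical names built from the sample rows:

```shell
# Stand-in BED files built from the sample rows (hypothetical file names).
printf 'chr10\t1726501\t1755000\n' > first_check.bed
printf 'Chr10\t1725935\t1728167\tNC_029525.1\tgene\t+\tFAM219B\n' > second_check.bed

# List the distinct chromosome names in each file.
cut -f1 first_check.bed  | sort -u > first_chroms.txt
cut -f1 second_check.bed | sort -u > second_chroms.txt

# Names present in both files; if there are none, no interval can match.
shared=$(comm -12 first_chroms.txt second_chroms.txt)
echo "shared chromosome names before fix: ${shared:-none}"

# One possible fix: lowercase column 1 of the second file, then compare again.
awk -F'\t' -v OFS='\t' '{ $1 = tolower($1); print }' second_check.bed > second_fixed.bed
cut -f1 second_fixed.bed | sort -u > second_fixed_chroms.txt
shared_fixed=$(comm -12 first_chroms.txt second_fixed_chroms.txt)
echo "shared chromosome names after fix: ${shared_fixed:-none}"
```

After renaming chromosomes it is worth running the file through sort-bed again, since changing the names can change the sort order that the set operations rely on.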
The formats of these files are different than what you posted originally.
To answer your question based off of these inputs, you could convert the first file to sorted BED via:
Convert the second file:
Then you can run bedops and permute columns:
Check the files at every step, especially if your input format changes, because that changes the behavior of tools like awk and bedops, etc.
Thank you, Alex Reynolds! It worked now. I appreciate your help.
Please use ADD REPLY/ADD COMMENT when responding to existing posts to keep threads logically formatted. This belongs under @Alex's answer.