9.2 years ago
M K
Dear Biostars,
I have a text file with many columns, shown below. The first column holds repetitive DNA names together with their strand. The remaining columns hold the names of genes that share the same position as that repeat. How can I restructure this file so that each gene name is in the first column, followed by columns for the repetitive DNA names that share a position with that gene?
(A)n__- Dpp10 Xkr4 Mgat4a Ikzf2 Zfp142 Kif1a Tmcc2 Pou2f1 Pbx1 Fbxo28 Hhat Gm26901 Snhg6 Snord87 Tram1 Trpa1 Tram2 Lman2l Snord89 Tex30 Myo1b Pms1 Hsfy2 Clk1 Orc2
(A)n__+ Itpkb Tfap2d Nyap2 Sag Ccdc93 Rc3h1 Aim2 Esrrg Rb1cc1 St18 AC121538.1 Tfap2b Khdc1a Khdc1c Imp4 Cnnm4 Mstn mmu-mir-7681 Fzd7 Nop58 Gm11602 Apol7d Bcs1l Ttll4 Gm21972
(ACTG)n__- Bsnd 2900026A02Rik 6330408A02Rik Atp2a1 Rhbdf2
(ACTG)n__+ Bpifb3 Gm16215 Trmt112-ps2 Calu Ghrhr Lig1 Gm22535 Podnl1 Gm16217 Sdr9c7 Slit3 Fndc9
(AGCTG)n__+ Gm25033 Gm22121 Gm22617 Gm2274 Gal3st3
(AGGGGG)n__- Gm5532 Pbx3 Dgkz Zbp1 Lrrc34 4930503B20Rik Padi3 Crygn Cnot6l Gm15498 Rarres2 Gm4604 n-R5s165 Gm3912 2810047C21Rik1 Gm3654 Gm20482 Zfp27 Slco2b1 Adam32 Gm16793 Slc22a14 Gm4779 Myocd Kdm6b
(AGGGGG)n__+ Gm24901 Pik3c2b Frmd4a Sox2ot Gm24830 Tpt1-ps1 Abca4 Gm13032 Gm1673 Rhoh Mafk Grm7 mmu-mir-7668 Ppp2cb 8030474K03Rik Lama4 Ankrd36 Igtp Irgm2 Gm12949 Tmem256 Gm24877 Mllt6 Rian Gm17309
(ATG)n__- Gm14264 4930533B01Rik Arhgef10l Gm22983 Svop Gm7887 Cecr5 9630033F20Rik Gm27013 Gm10396 Hpn Polg P2ry6 BC051019 Gm24581 Efnb2 Ubash3b Gm8907 Tmem30a Gm14570 Gm24622 Gm23122 Myf5 BC006965 Olfr331
(ATG)n__+ Aox2 Stradb Eif2d Tpr Igsf8 Sh2d3c Ypel4 Chrm4 Gm26421 Slc24a3 Nsfl1c Gm14270 Fgg Pias3 Zcchc11 Gm11876 Fam114a1 Lias Gm25374 Sds Rasal1 Grid2ip Ccdc132 Gpnmb Gm2115
For example, the first gene name in this file is Dpp10, and I want to find all the repetitive DNA names associated with it.
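The restructuring asked for here inverts a one-to-many mapping. Each input row maps one repeat (with its strand) to many genes, and the desired output maps each gene to every repeat whose row contains it. Below is a minimal sketch of that inversion. It is written in Python rather than R, and it assumes the file is whitespace-separated with the repeat name in the first field. The function names `invert_repeat_table` and `format_rows`, and the file name `repeats.txt` in the comment, are hypothetical. The same logic (loop over rows, append the row's repeat to each gene) carries over directly to R.

```python
from collections import defaultdict


def invert_repeat_table(lines):
    """Invert 'repeat gene1 gene2 ...' rows into a gene -> [repeats] mapping.

    Assumption: fields are whitespace-separated, and the first field is the
    repeat name including its strand suffix (e.g. '__+' or '__-').
    """
    gene_to_repeats = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        repeat, genes = fields[0], fields[1:]
        for gene in genes:
            # record each repeat only once per gene, keeping first-seen order
            if repeat not in gene_to_repeats[gene]:
                gene_to_repeats[gene].append(repeat)
    return dict(gene_to_repeats)


def format_rows(gene_to_repeats, sep="\t"):
    """Build one output line per gene: the gene first, then its repeats."""
    return [sep.join([gene, *repeats]) for gene, repeats in gene_to_repeats.items()]


if __name__ == "__main__":
    # Truncated rows taken from the question; for the real file, something like
    #   with open("repeats.txt") as fh: table = invert_repeat_table(fh)
    sample = [
        "(A)n__- Dpp10 Xkr4 Mgat4a",
        "(A)n__+ Itpkb Tfap2d Nyap2",
        "(ACTG)n__- Bsnd 2900026A02Rik",
    ]
    table = invert_repeat_table(sample)
    print(table["Dpp10"])  # ['(A)n__-']
    for row in format_rows(table):
        print(row)
```

A gene that appears in several input rows collects all of the corresponding repeats in its output row. Because genes can have different numbers of repeats, the output rows are ragged and tab-separated rather than forming a rectangular table.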
Dear Pierre, thanks for your response, but I have no experience with sqlite3. Is there any way to do this using only R?