Entering edit mode
15 months ago
emiliomastriani
▴
40
Dear all,
I need your help to solve the following problem. I have the following two files (indicated as A and B):
FILE A:
head 129N.final-test_taxid-120686.txt
A00270:507:H3KTJDSX5:4:1105:31747:1736 1187 chr1 205559197 60 144M6S = 205559203 152 GAGCATTTAGGCAAGAGAAAGGAACAAAGGGTATCCAAATTGAAAAACAGGAGTCAAATTGTCCCTTTGCAGACAACAGGATTTTACATATAGAAAAATCTAAAAGATCACACACACACACACACACACACACACACACACACACACAAA FFFFFFFFFFFFFFF:FFFFFFFFFFF:FF:FFFF,FFFFF:FFFF:F:FFFFFF:F:FFFFFF:FFFFFF:FF,FFFFFFFFFFFFFFFFFFFFFF::,FFFF::,FFF,F:FFFFFFFFFFFFFFFF,FFFFFFFFFF:,F,FF,FF: NM:i:0 AS:i:288 nn:i:0 tp:A:P cm:i:18 s1:i:120 s2:i:0 de:f:0 rl:i:86 MQ:i:50 MC:Z:102M4I44M ms:i:4398
A00270:507:H3KTJDSX5:4:1105:31747:1736 1107 chr1 205559203 50 102M4I44M = 205559197 -152 TTAGTCAAGAGAAAGTAACAAATTGTATAAAAATTGAAAAACAGGAGTCAAATTGTCCCTTTGCAGACAACAGTATTTTACATATATAAAAATATAAAATATCACACACACACACACACACACACACACACACACACACACAAACCTCTT ,FF:,,FF,F,:F:,,FF:,FF,,:,FFF,F:FF::FFFFF:F,F:::FFFFFFFF,::FF::,F,F:FFFFF,FF:FFF,FFFFF:FFFFF:,FF,FF,FF,FFFFFFFFFFFFFFFFFFF,FFFFFFFF:FFFFFFFFF:F,FFFFFF NM:i:14 AS:i:176 nn:i:0 tp:A:P cm:i:1 s1:i:120 s2:i:0 de:f:0.0748 rl:i:86 MQ:i:60 MC:Z:144M6S ms:i:5013
A00270:507:H3KTJDSX5:4:1338:10565:21731 1123 chr11 73519648 60 150M = 73519687 191 AAAAGAACAGCATGAGCAAAGTCACTGAAGGAGAAAAAAGCTTGACAGGCTCTCAGAGGAGCTGAGAGATGATGAGGATAGCTGGAACACAGAGAACAAGGGAGAGAGCAGCACAAAGACCAGAGCATGTATGTGTGTGTGTGTGTGCGT FFFFFFFFFFFFFFFFFF:FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFF,FFFFFFF:F,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFF,F,F:FFFFFFFFFFFFFFFFFF NM:i:0 AS:i:300 nn:i:0 tp:A:P cm:i:20 s1:i:164 s2:i:0 de:f:0 rl:i:0 MQ:i:60 MC:Z:131M2D19M ms:i:4508
A00270:507:H3KTJDSX5:4:1338:10565:21731 1171 chr11 73519687 60 131M2D19M = 73519648 -191 GCTTGACAGGCTCTCAGATGAGCTGAGAGATGATGACGATAGCTGGAACACAGAGAACCAGGGAGAGAGCAGCACACAGACCAGAGCATGTCTGTGTGTGTGTGTGTGCGTGTGCGTTTGTGTGTGTATACATATATATGTATATTTTTT F,:,::,F::FFFF,FF,,F,,F:FFFFF,FF:FF,,::,:FFF,:FFFF:FFF::F,,,F:FFFFFFFFFFFF,F,FFFFFFFF,F:F:F,FFFFFFFFFFFFFF:F:FFFFFFFFFFFFFFFF,FF,FFFFFFFFFFFFFFFFFFFFF NM:i:7 AS:i:234 nn:i:0 tp:A:P cm:i:5 s1:i:164 s2:i:0 de:f:0.0397 rl:i:0 MQ:i:60 MC:Z:150M ms:i:5305
FILE B:
head 129C_129N_NA_NA-result.txt
"chr" "window_start" "window_end" "CpG_density" "gene body" "exon" "intron" "TSS" "TFBS" "CpG Island" "TvN_log2FC" "TvN_pvalue" "TvN_adjPval" "Normal_beta_means" "Tumor_beta_means"
"1" "chr1" 598225 598636 18.4427890754089 "" "" "" "" "" "" 4.3611988195503 1.3571329343012e-06 0.000213095969395564 NA NA
"2" "chr1" 798869 799280 1.94117546441723 "" "" "" "" "" "" 3.33729293030521 0.0028193049869542 0.00603736189514813 NA NA
"3" "chr1" 964493 964904 15.8745202346568 "" "" "" "" "" "" -4.03939052576444 9.65027705808557e-05 0.00139164913571352 NA NA
"4" "chr1" 1289973 1290384 13.2691422964244 "" "" "" "" "" "" 2.24668352741542 0.000147522985565905 0.00147072671202151 NA NA
"5" "chr1" 1452713 1453124 5.52888052862558 "" "" "" "" "" "" 3.2958343457078 0.00326925560284505 0.00654091448106587 NA NA
"6" "chr1" 1484025 1484436 9.01501605142477 "" "" "" "" "" "" -2.91774020058587 0.0012539611860937 0.0037948825368625 NA NA
"7" "chr1" 1543353 1543764 9.72593333917879 "" "" "" "" "" "" -4.58247376587071 8.67234633703419e-07 0.000157092251623494 NA NA
"8" "chr1" 1673545 1673956 6.58667962552194 "" "" "" "" "" "" -3.71387653352646 0.000659203327330426 0.00295087454366937 NA NA
"9" "chr1" 1882017 1882428 3.73642500628685 "" "" "" "" "" "" -3.9364144585912 0.00013094999836789 0.00139164913571352 NA NA
I need to execute the next loop to select the fields respecting the following conditions:
for every line in A do
for every line in B do
if ($3_A == $2_B) ##belong to the same chr
if ($3_B <= $4_A <= $4_B) ##the read into the window
print($3_A, $4_A)
I already tried some solution like that I report, but without the expected results:
awk 'FNR==NR{pimga_char[$3]; pimga_num[$4]; next}; ($2 in pimga_char) && ($3 < pimga_num) { print $1}' A.txt B.txt
Please, any kind of support will be appreciated!!!
Thank you
Emilio
Switch to python or R,this will get a lot easier. Also, I've formatted your post so it looks better. Please invest some time yourself into formatting your posts so they are easier to read.
Thank you for the formatting support. Please, can you suggest the code (R/Python) to approach my problem? Something I can use as a starting example will be enough. Thank you
R's dplyr will serve well here.