EDITED POST: This post has been significantly re-edited from original post, but question remains the same between versions.
I have performed RBH for each of five transcriptomes against a reference genome, producing five separate output files. I figured, for ease, it was best to combine these output files into one using cat
command. The output looks like so, e.g.,:
Acur_1001484 1751 OFAS000562-RA-EXON01 467 1 1039 1240 1 467 264 4.00E-62 237 87.75 179 204 1 264 467 1 1240 1039 1.00E-62 237 87.75 179 204
Apil_comp17655_c0_seq1 2446 OFAS000562-RA-EXON01 467 1 288 453 1 289 454 2.00E-66 252 93.98 156 166 1 289 454 1 288 453 1.00E-66 252 93.98 156 166
Atri_comp16713_c0_seq1 2069 OFAS000562-RA-EXON01 467 1 1576 1754 1 467 289 5.00E-67 254 92.18 165 179 1 289 467 1 1754 1576 4.00E-67 254 92.18 165 179
Btri_comp12490_c0_seq1 2547 OFAS000562-RA-EXON01 467 1 2069 2267 1 467 268 2.00E-66 252 89.5 179 200 1 268 467 1 2267 2069 1.00E-66 252 89.5 179 200
Atri_comp10477_c0_seq1 2608 OFAS000604-RA-EXON01 1887 1 4 1159 1 298 1537 0 701 78.5 997 1270 1 298 1537 1 4 1159 0 701 78.54 999 1272
Ctom_1008131 415 OFAS000604-RA-EXON01 1887 1 101 399 1 1260 959 1.00E-81 300 85.06 262 308 1 959 1260 1 399 101 4.00E-81 300 85.06 262 308
Acur_1001075 997 OFAS000604-RA-EXON01 1887 1 469 954 1 756 1258 1.00E-125 448 83.69 431 515 1 247 650 1 16 417 1.00E-121 435 86.54 360 416
Btri_comp18753_c0_seq1 2786 OFAS000604-RA-EXON01 1887 1 420 1181 1 761 1593 1.00E-108 392 77.03 654 849 1 297 650 1 4 336 1.00E-87 324 84.49 305 361
Apil_comp16297_c0_seq1 1235 OFAS000628-RA-EXON01 453 1 182 553 1 372 1 4.00E-66 250 78.93 296 375 1 1 372 1 553 182 4.00E-66 250 78.93 296 375
Atri_comp15731_c0_seq1 657 OFAS000628-RA-EXON01 453 1 131 517 1 387 1 9.00E-55 211 76.73 300 391 1 1 387 1 517 131 2.00E-54 211 76.73 300 391
Apil_comp14961_c1_seq1 1230 OFAS000635-RA-EXON02 212 1 254 452 1 1 198 2.00E-54 211 85.93 171 199 1 1 198 1 254 452 8.00E-55 211 85.93 171 199
For a given reference gene (e.g., OFAS000562-RA-EXON01), I would now like to retrieve the query IDs and the corresponding reference gene ID and copy this data into a new text file named as the reference gene ID (e.g., OFAS000562-RA-EXON01_IDs.txt). Below are four examples of what I'm wanting in terms of output from this approach:
File 1:
OFAS000562-RA-EXON01
Acur_1001484
Apil_comp17655_c0_seq1
Atri_comp16713_c0_seq1
Btri_comp12490_c0_seq1
File 2:
OFAS000604-RA-EXON01
Acur_1001075
Btri_comp18753_c0_seq1
Atri_comp10477_c0_seq1
Ctom_1008131
File 3:
OFAS000628-RA-EXON01
Apil_comp16297_c0_seq1
Atri_comp15731_c0_seq1
File 4:
OFAS000635-RA-EXON02
Apil_comp14961_c1_seq1
I don't have strong bioinformatic/scripting skills, so I appreciate any feedback on how to do this.
If I understand, I would need a list of the reference genes from my RBH output (that is OFAS.....), which I would then need to replace 'list' with the file name?
I tried the first suggestion, and it produces a file with the reference gene ID, but not any of the query gene IDs that correspond to the reference gene ID
I've also tried the last suggestion (list) and I get the following errors:
line 2: unexpected EOF while looking for matching `"' line 4: syntax error: unexpected end of file
My file didn't even have a line 4