3.6 years ago
ran
Hi,
I'm pretty new to programming and I'm trying to find the genes that appear in both the young and old samples, put their columns side by side, and write the result to a txt file. This is what the young gene file looks like:
GENE Y1 Y2 Y3 Y4 Y5 Y6 Y7 Y8 Y9
DPM1 4.85 NA NA NA NA 5.35 5.52 4.6 4.83
SCYL3 4.2 4.54 5.16 5.1 4.61 4.89 5.03 4.09 4.5
C1orf112 3.24 3.03 3.9 4.29 3.58 4.96 4.03 3.6 3.72
FUCA2 3.83 NA NA NA 4.92 3.55 5.76 4.98 5.78
GCLC 5.31 4.66 5.18 3.94 5.25 4.43 5.75 6.56 5.69
the old one:
GENE O1 O2 O3 O4 O5 O6 O7 O8 O9
DPM1 3.92 3.84 3.98 4.06 4.16 3.84 3.88 3.96 3.75
DUFAB1 5.3 5.36 5.29 5.37 5.37 5.53 5.57 5.36 5.39
DVL2 4.47 4.71 4.72 4.95 5.01 4.85 4.61 4.79 4.38
DYRK4 3.2 2.84 3.07 2.4 2.17 1.98 3.23 2.81 3.19
the output should be:
GENE Y1 Y2 Y3 Y4 Y5 Y6 Y7 Y8 Y9 O1 O2 O3 O4 O5 O6 O7 O8 O9
DPM1 4.85 NA NA NA NA 5.35 5.52 4.6 4.83 3.92 3.84 3.98 4.06 4.16 3.84 3.88 3.96 3.75
this is my try:
with open("youngMatrix.txt", 'r+') as young, open("oldMatrix.txt", 'r+') as old:
    with open("CombMatrix.txt", "w") as Comb_file:
        for line_old in old:
            for line_young in young:
                line_young1 = line_young.split("\t")
                line_old1 = line_old.split("\t")
                if line_old1[0] == line_young1[0]:
                    edit_old1 = line_old.rstrip("\n")
                    edit_young1 = line_young.rstrip("\n")
                    Comb_file.writelines(edit_young1 + edit_old1 + "\n")
and my output is this
GENE Y1 Y2 Y3 Y4 Y5 Y6 Y7 Y8 Y9GENE O1 O2 O3 O4 O5 O6 O7 O8 O9
I'm pretty stuck and would appreciate any help!
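A likely reason only the header line comes out: a file object can be iterated only once, so the inner `for line_young in young` loop reads the whole young file while the first old line (the `GENE` header) is being compared, and finds nothing left for every later old line. The attempt also glues the two lines together without a separator and repeats the gene name. Below is a minimal sketch of one way around this, assuming both files are tab-separated with the gene name in the first column. The function name `join_on_gene` is made up for illustration; it loads the old matrix into a dictionary and then makes a single pass over the young file, which amounts to an inner join on the gene name.

```python
import csv


def join_on_gene(young_path, old_path, out_path):
    """Inner-join two tab-separated matrices on their first column.

    Returns the number of lines written (the header line counts as one,
    because "GENE" appears in the first column of both files).
    """
    # Read the old matrix once and index its value columns by gene name.
    with open(old_path, newline="") as old:
        old_rows = {
            row[0]: row[1:]
            for row in csv.reader(old, delimiter="\t")
            if row  # skip blank lines
        }

    written = 0
    with open(young_path, newline="") as young, \
            open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        # Single pass over the young file: keep only genes also seen in old.
        for row in csv.reader(young, delimiter="\t"):
            if row and row[0] in old_rows:
                # Young columns first, then the old columns without the
                # duplicated gene name.
                writer.writerow(row + old_rows[row[0]])
                written += 1
    return written
```

With the file names from the question, the call would be `join_on_gene("youngMatrix.txt", "oldMatrix.txt", "CombMatrix.txt")`. Rows come out in the order of the young file, and if a gene is listed more than once in the old file, only its last occurrence is used.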
Get yourself acquainted with the pandas package, and then take a look at how to perform an inner join with two pandas dataframes. That will solve your core problem. Figuring out how to write a dataframe to a text file is only a search engine query away. You got this!!
with tsv-utils: