Question

Concatenating text files based on common indices

0

19 months ago

Hau Tak Leighton • 0

I have two text files containing gene abundances across samples. One file covers more genes than the other, so the two cannot simply be pasted together side by side. I want to keep only the lines of the larger file whose gene index also appears in the smaller file, and join them to the matching lines of the smaller file, such that:

  >df1
           Sample1  Sample2  Sample3
  Gene1    0.001    0.002    0.003
  Gene2    0.001    0.002    0.003
  Gene3    0.001    0.002    0.003


  >df2
           Sample4  Sample5  Sample6
  Gene1    0.001    0.002    0.003
  Gene1.1  0.001    0.002    0.003
  Gene2    0.001    0.002    0.003
  Gene2.1  0.001    0.002    0.003
  Gene3    0.001    0.002    0.003


  >df1and2
           Sample1  Sample2  Sample3  Sample4  Sample5  Sample6
  Gene1    0.001    0.002    0.003    0.001    0.002    0.003
  Gene2    0.001    0.002    0.003    0.001    0.002    0.003
  Gene3    0.001    0.002    0.003    0.001    0.002    0.003

Suggestions in Python or Bash are both welcome. Thank you!

Bash Python • 867 views

ADD COMMENT • link updated 19 months ago by Ram 44k • written 19 months ago by Hau Tak Leighton • 0

0

I've removed tags such as genetics, genes and bioinformatics. The last tag makes no sense - EVERY QUESTION here is related to bioinformatics.

ADD REPLY • link 19 months ago by Ram 44k

score 1 · Answer 1 · 2023-05-31

1

19 months ago

rpolicastro 13k

In general, this is called an inner join: only rows whose index appears in both tables are kept. It is easy to do with the pandas library in Python.

import pandas as pd

# df1 and df2 must already be pandas DataFrames indexed by gene name.
df1and2 = df1.merge(df2, how='inner', left_index=True, right_index=True)
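To see what the inner join does with the tables from the question, here is a minimal sketch that builds both frames in memory. The numbers are the placeholder values from the question, not real data, and the helper names `samples1`, `samples2` and `row` are only there for illustration.

```python
import pandas as pd

samples1 = ["Sample1", "Sample2", "Sample3"]
samples2 = ["Sample4", "Sample5", "Sample6"]
row = [0.001, 0.002, 0.003]  # placeholder abundances from the question

# Gene names form the row index of each frame.
df1 = pd.DataFrame([row] * 3,
                   index=["Gene1", "Gene2", "Gene3"],
                   columns=samples1)
df2 = pd.DataFrame([row] * 5,
                   index=["Gene1", "Gene1.1", "Gene2", "Gene2.1", "Gene3"],
                   columns=samples2)

# Inner join on the index: genes missing from either frame are dropped.
df1and2 = df1.merge(df2, how="inner", left_index=True, right_index=True)
print(df1and2)
```

Gene1.1 and Gene2.1 exist only in `df2`, so they do not appear in `df1and2`.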

ADD COMMENT • link 19 months ago by rpolicastro 13k

0

Thanks for your response! However, when I tried it with my text files, I got the error AttributeError: 'str' object has no attribute 'merge', even though the files are in a very similar format to my example. What could be causing this?

ADD REPLY • link 19 months ago by Hau Tak Leighton • 0

2

'str' object has no attribute 'merge'

It appears you are calling merge on the file names (plain strings) rather than on dataframes. Read each file into a pandas dataframe first, assign the results to df1 and df2, and then the merge should work.
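As a rough sketch of that reading step, the function below loads both files and then does the inner join. It assumes the files are tab-separated and have a header row whose first field labels the gene column. The function name `merge_abundance_tables` and the `sep` default are illustrative choices, not something from the thread.

```python
import pandas as pd

def merge_abundance_tables(path1, path2, sep="\t"):
    """Read two gene-by-sample tables and keep only genes present in both."""
    # index_col=0 turns the first column (gene names) into the row index,
    # which is what the merge joins on.
    df1 = pd.read_csv(path1, sep=sep, index_col=0)
    df2 = pd.read_csv(path2, sep=sep, index_col=0)
    return df1.merge(df2, how="inner", left_index=True, right_index=True)
```

With hypothetical file names, usage would look like `df1and2 = merge_abundance_tables("df1.txt", "df2.txt")`, followed by `df1and2.to_csv("df1and2.txt", sep="\t")` to write the result. For whitespace-aligned files, `sep=r"\s+"` is one option. If the header row has no label for the gene column, as in the printout in the question, the `index_col=0` argument may need to be dropped: pandas documents that it uses the first column as the index when data rows have one more field than the header.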