
Question

Pandas issue: writing multiple CSV files in a loop


3.1 years ago

Nai ▴ 50

I have table.csv with plant varieties linked to SNP markers (400 markers). One marker occurs in 20 varieties. I would like to make a separate CSV file for each marker containing the variety and the other columns. I am new to Python and am trying to use pandas. I wrote the following:

import os
from io import open
import pandas as pd
dfs = pd.read_csv('/home/System/Variety_Marker.csv', sep='\t', encoding='latin-1', low_memory=False)

for i in dfs.groupby('MARKER'):
    tables = i 
    df = pd.DataFrame(i) # tuple change into dataframe
    df.to_csv(f"/home/System/table_{i}.csv")

OSError: [Errno 36] File name too long:

The command uses the values, headers and all the other information to name the CSV file. Thanks in advance; please help me resolve this issue.

Tags: Pandas, Python, R



Try printing f"/home/System/table_{i}.csv" to see which filename it is trying to write to.

3.1 years ago by Asaf 10k
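Following that suggestion, a minimal sketch with made-up toy data (the MARKER and VARIETY values are hypothetical, not taken from Variety_Marker.csv) shows what the loop in the question actually passes to the f-string: each i is a (marker value, sub-DataFrame) tuple, so the filename string ends up containing the printed table, headers and line breaks included.

```python
import pandas as pd

# Hypothetical toy data standing in for Variety_Marker.csv
dfs = pd.DataFrame({
    "MARKER": ["SNP1", "SNP1", "SNP2"],
    "VARIETY": ["C61", "C62", "C61"],
})

filenames = []
for i in dfs.groupby("MARKER"):
    # i is a 2-tuple (marker value, sub-DataFrame), not just the marker name
    name = f"/home/System/table_{i}.csv"
    print(type(i), len(i))
    print(name)  # the whole table is embedded in the "filename"
    filenames.append(name)
```

With the real file (400 markers, many columns), this embedded text is what exceeds the operating system's filename length limit.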


Why is this tagged with R?

3.1 years ago by Joe 22k


In case I can get a solution in R too.

3.1 years ago by Nai ▴ 50

score 4 · Accepted Answer · 2022-06-09


3.1 years ago

massa.kassa.sc3na ▴ 650

Hi, the main problem is that you are passing i (which is a tuple of the MARKER value and a pd.DataFrame) to the format string. f"{i}" returns the string representation of the whole tuple, which includes part of your data (column headers and values). This is why you get "File name too long".

What you should do:

# each iteration yields (marker value, sub-DataFrame); unpack both
for i, df in dfs.groupby('MARKER'):
    df.to_csv(f"/home/System/table_{i}.csv")

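To illustrate the accepted answer and the follow-up question about the df variable, here is a minimal, self-contained sketch. It assumes hypothetical toy data and uses a temporary directory in place of /home/System. Unpacking the groupby iteration gives the marker value and a sub-DataFrame (df) holding only that marker's rows; reading one file back confirms this. Passing index=False is an optional choice not made in the answer above; it only stops pandas from writing the row index as an extra first column.

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy data standing in for Variety_Marker.csv
dfs = pd.DataFrame({
    "MARKER": ["SNP1", "SNP1", "SNP2"],
    "VARIETY": ["C61", "C62", "C61"],
    "AGE": [10, 12, 10],
})

out_dir = tempfile.mkdtemp()  # stand-in for /home/System

written = {}
# Unpacking gives the group key (marker) and only the rows for that marker (df)
for marker, df in dfs.groupby("MARKER"):
    path = os.path.join(out_dir, f"table_{marker}.csv")
    df.to_csv(path, index=False)  # index=False: don't write the row index column
    written[marker] = path

# Reading one file back shows that df held only that marker's rows
snp1 = pd.read_csv(written["SNP1"])
print(snp1)
```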


Thank you, Massa. I would like to understand the df variable in the for loop. I have now created multiple CSV files, and I would like to read all of them. I did it like this:

import glob
import pandas as pd

path = '/home/System/PCA_1/'
filenames = glob.glob(path + "/*.csv")

# loop over the list of csv files
for f in filenames:
    # read the csv file
    df_1 = pd.read_csv(f, sep=';')
    if df1.groupby("VARIETY").get_value("C61") OR .groupby("AGE").get_value("10"):
        df.to_csv(f"/home/System/Marker_filter.csv")

I would like to apply conditions on 5 columns in each file separately and make a new file, MARKER_filter.csv. I can't work out how to write an if statement that checks multiple columns across multiple CSV files.

3.1 years ago by Nai ▴ 50
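For the follow-up about conditions on several columns, here is a sketch rather than a definitive solution. It assumes the files are ';'-separated and contain VARIETY and AGE columns; the two files, their values and the <name>_filter.csv naming are hypothetical. Instead of an if statement, pandas selects rows with a boolean mask (an if on a whole column raises "The truth value of a Series is ambiguous"). Conditions are combined with & (and) or | (or), each comparison wrapped in its own parentheses, and more columns can be added the same way.

```python
import glob
import os
import tempfile

import pandas as pd

# Hypothetical setup: two small ';'-separated files standing in for /home/System/PCA_1/*.csv
in_dir = tempfile.mkdtemp()
pd.DataFrame({"VARIETY": ["C61", "C62"], "AGE": [10, 12]}).to_csv(
    os.path.join(in_dir, "table_SNP1.csv"), sep=";", index=False)
pd.DataFrame({"VARIETY": ["C63", "C64"], "AGE": [10, 8]}).to_csv(
    os.path.join(in_dir, "table_SNP2.csv"), sep=";", index=False)

out_dir = tempfile.mkdtemp()  # kept separate so outputs are not picked up by the glob
filtered = {}

# loop over the list of csv files
for f in sorted(glob.glob(os.path.join(in_dir, "*.csv"))):
    df_1 = pd.read_csv(f, sep=";")
    # Row-wise mask: | means "or", & means "and"; parentheses are required
    mask = (df_1["VARIETY"] == "C61") | (df_1["AGE"] == 10)
    hits = df_1[mask]
    if hits.empty:
        continue  # nothing matched in this file
    stem = os.path.splitext(os.path.basename(f))[0]
    out = os.path.join(out_dir, f"{stem}_filter.csv")  # hypothetical naming scheme
    hits.to_csv(out, sep=";", index=False)
    filtered[stem] = hits
```

AGE is read back as a number, so it is compared with 10 rather than "10". If one combined output file is preferred, the hits frames could be collected in a list and joined with pd.concat before a single to_csv call.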