Question

Python table data extraction

3.2 years ago

deniselavezzari • 0

Hi!

I'm working with a table like this:

[image of the first table not available]

and I need to take each of the "GenBank Accessions" numbers and look it up in the second table, "Info".

[image of the second table, "Info", not available]

Then, for each row in the first table, I want to add the count of each species. For example, for the first row of table one I would have:

Enterovirus A: 4
Enterovirus G: 2

How can I do that?

Thanks a lot!

python • 1.4k views

ADD COMMENT • link updated 2.6 years ago by Ram 45k • written 3.2 years ago by deniselavezzari • 0

What have you tried? Have you Googled anything?

ADD REPLY • link 3.2 years ago by Ram 45k

Hi!

I'm trying for-loops, but only the first element of each row gets matched against the second table.

Maybe I need to do it the other way around; that might be easier...

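import pandas as pd

# Main table and the accession-to-species lookup table
file = pd.read_csv('full_DB.csv')
info = pd.read_csv('info.csv')

# Drop the literal 'accn|' prefix, then rename the column to match info
file['Sample'] = file['Sample'].str.replace('accn|', '', regex=False)
file.rename(columns={'Sample': 'GenBank Accessions'}, inplace=True)

info_sub = info[['Species', 'GenBank Accessions']]

# One empty column per species, to be filled with counts later
lista = info['Species'].unique().tolist()
for i in lista:
    file[i] = ""

# Try the lookup on the first five rows only
subset = file.iloc[0:5, :]

for row in subset.index:
    t = subset.loc[row, 'GenBank Accessions']
    arr = t.split(',')
    for acc in arr:
        # Print the info rows whose accession equals this element exactly
        print(info_sub[info_sub['GenBank Accessions'] == acc])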

ADD REPLY • link updated 2.6 years ago by Ram 45k • written 3.2 years ago by deniselavezzari • 0

I've undeleted your question as it has already received some feedback from a couple of users.

Your description of the data is minimal and confusing, so we can't really help with the code. Can you explain what you wish to achieve in plain terms, such as:

Split column X from table Y using "," as the delimiter
Match each value in the split column to column Z from table YY

etc.
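
For illustration only, steps phrased like the two above could be sketched in pandas as follows. The two small frames are made-up stand-ins for the real tables (which are not shown in the thread), and the assumption that accessions may be separated by ", " with a trailing space is a guess, so treat this as a sketch rather than a tested solution for the actual files.

```python
import pandas as pd

# Hypothetical stand-ins for full_DB.csv and info.csv; real columns may differ.
table = pd.DataFrame({"GenBank Accessions": ["A1, A2,A3", "B1"]})
info = pd.DataFrame({
    "GenBank Accessions": ["A1", "A2", "A3", "B1"],
    "Species": ["Enterovirus A", "Enterovirus A", "Enterovirus G", "Enterovirus B"],
})

# Step 1: split on "," and put each accession on its own line,
# keeping the original row label in a column called "row".
long = (
    table["GenBank Accessions"]
    .str.split(",")
    .explode()
    .str.strip()          # remove spaces left over after the commas
    .rename_axis("row")
    .reset_index()
)

# Step 2: match each accession to its species in the second table.
matched = long.merge(
    info[["GenBank Accessions", "Species"]],
    on="GenBank Accessions",
    how="left",
)

# Step 3: count species per original row (unmatched accessions are ignored).
counts = pd.crosstab(matched["row"], matched["Species"])
counts = counts.reindex(table.index, fill_value=0)

# Step 4: attach one count column per species to the first table.
result = table.join(counts)
print(result)
```

If the pieces produced by the split keep a leading space, an exact `==` comparison would only succeed for the first element of each cell, which is one possible (unconfirmed) explanation for the behaviour described earlier in the thread.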

ADD REPLY • link 3.2 years ago by Ram 45k

Are these real tables or CSV data files? You can use the Python pandas library for this data merge. Create a dataframe for each table, loop through them, locate the required value in table 1, find the corresponding value in table 2, and so on. At the end, build a new CSV file with the values you found. In Machine Learning this is called data preprocessing or Exploratory Data Analysis (EDA). We do that all the time in any Machine Learning project. I hope this clarifies your task.
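
A minimal sketch of that loop-and-look-up idea, assuming the accessions sit in one comma-separated column: the frames, the helper `count_species`, and the mapping `species_of` are all hypothetical names and data invented for this example, not taken from the real files.

```python
from collections import Counter

import pandas as pd

# Hypothetical stand-ins for the two tables; real data would come from read_csv.
table = pd.DataFrame({"GenBank Accessions": ["A1, A2,A3", "B1, ZZ9"]})
info = pd.DataFrame({
    "GenBank Accessions": ["A1", "A2", "A3", "B1"],
    "Species": ["Enterovirus A", "Enterovirus A", "Enterovirus G", "Enterovirus B"],
})

# Table 2 as a plain dictionary: accession -> species.
species_of = dict(zip(info["GenBank Accessions"], info["Species"]))


def count_species(cell):
    """Count species for one comma-separated cell; unknown accessions are skipped."""
    accessions = (a.strip() for a in str(cell).split(","))
    return Counter(species_of[a] for a in accessions if a in species_of)


# Loop over table 1 and look every accession up in table 2.
per_row = [count_species(cell) for cell in table["GenBank Accessions"]]

# One column per species; species absent from a row get a count of 0.
counts = pd.DataFrame(per_row, index=table.index).fillna(0).astype(int)
result = pd.concat([table, counts], axis=1)
print(result)
```

Calling `result.to_csv(...)` with a file name of your choice would then write the combined table out as the new CSV file.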

ADD REPLY • link 3.2 years ago by Ernest Bonat ▴ 30

This does not answer OP's question. It's just a detour into how common this kind of question is in ML projects. As such, I've moved it to a comment.

Data cleaning is common in any data analysis project, not just ML projects. Please stop pushing ML everywhere.

ADD REPLY • link 3.2 years ago by Ram 45k