Question

How to retrieve sample information for given IDs from the Sequence Read Archive?

22 months ago

DareDevil ★ 4.4k

I have a list of around 1000 SRA IDs from the NCBI SRA database.

"SRX1067067", "SRX022566", "SRX11222414", "SRX11222415", "SRX11222416", "SRX11222417", "SRX11222418", "SRX11222419", "SRX176057", "SRX176058"

I want to extract the information for all of these IDs, as follows:

output

eutils SRA • 761 views

updated 20 months ago by Ram 45k • written 22 months ago by DareDevil ★ 4.4k

Ram · Accepted Answer · 2023-09-19

import time
import xml.etree.ElementTree as ET

import pandas as pd
import requests

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# List of SRX accessions
srx_accessions = ["SRX1067067", "SRX022566", "SRX11222414", "SRX11222415"]


def element_text(root, path):
    """Return the text of the first element matching path, or None if it is absent."""
    node = root.find(path)
    return node.text if node is not None else None


# Collect one row per accession and build the DataFrame once at the end,
# which avoids calling pd.concat on every iteration
rows = []

# Loop through each SRX accession
for srx_accession in srx_accessions:
    params = {"db": "sra", "id": srx_accession, "retmode": "xml"}
    response = requests.get(EFETCH_URL, params=params, timeout=30)

    # Check if the request was successful
    if response.status_code == 200:
        root = ET.fromstring(response.text)

        # A missing element yields None instead of raising AttributeError
        rows.append({
            "ID": srx_accession,
            "Study Title": element_text(root, ".//STUDY_TITLE"),
            # Select the TITLE under EXPERIMENT explicitly rather than the first TITLE anywhere
            "Experiment Title": element_text(root, ".//EXPERIMENT/TITLE"),
        })
    else:
        print(f"Failed to retrieve data for {srx_accession}. Status code: {response.status_code}")

    # NCBI allows at most 3 requests per second without an API key
    time.sleep(0.4)

# Write the DataFrame to a local CSV file
df = pd.DataFrame(rows, columns=["ID", "Study Title", "Experiment Title"])
df.to_csv("srx_info.csv", index=False)

# Display the DataFrame
print(df)
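
With around 1000 IDs, one request per accession is slow, so it may be worth sending the IDs in batches instead. Below is a minimal sketch of the two pieces that would need: splitting the ID list into chunks, and parsing a response that contains several records. It assumes the efetch XML is an EXPERIMENT_PACKAGE_SET with one EXPERIMENT_PACKAGE per experiment, and that the titles sit at EXPERIMENT/TITLE and STUDY/DESCRIPTOR/STUDY_TITLE. The helper names (chunked, text_or_none, parse_experiment_packages) and the sample XML are made up for illustration; the XML is hand-written, not real SRA data, so check the paths against an actual response before relying on them.

```python
import xml.etree.ElementTree as ET


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def text_or_none(parent, path):
    """Return the stripped text of the first match for `path`, or None."""
    node = parent.find(path)
    if node is None or node.text is None:
        return None
    return node.text.strip()


def parse_experiment_packages(xml_text):
    """Return one dict per EXPERIMENT_PACKAGE found in `xml_text`."""
    root = ET.fromstring(xml_text)
    rows = []
    for package in root.iter("EXPERIMENT_PACKAGE"):
        experiment = package.find("EXPERIMENT")
        rows.append({
            # Take the ID from the record itself, not from the request order
            "ID": experiment.get("accession") if experiment is not None else None,
            "Study Title": text_or_none(package, "STUDY/DESCRIPTOR/STUDY_TITLE"),
            "Experiment Title": text_or_none(package, "EXPERIMENT/TITLE"),
        })
    return rows


# Hand-written stand-in for an efetch response (hypothetical accessions and titles)
SAMPLE_XML = """<EXPERIMENT_PACKAGE_SET>
  <EXPERIMENT_PACKAGE>
    <EXPERIMENT accession="SRX0000001"><TITLE>Example experiment A</TITLE></EXPERIMENT>
    <STUDY accession="SRP0000001"><DESCRIPTOR><STUDY_TITLE>Example study</STUDY_TITLE></DESCRIPTOR></STUDY>
  </EXPERIMENT_PACKAGE>
  <EXPERIMENT_PACKAGE>
    <EXPERIMENT accession="SRX0000002"><TITLE>Example experiment B</TITLE></EXPERIMENT>
    <STUDY accession="SRP0000001"><DESCRIPTOR /></STUDY>
  </EXPERIMENT_PACKAGE>
</EXPERIMENT_PACKAGE_SET>"""

rows = parse_experiment_packages(SAMPLE_XML)
for row in rows:
    print(row)

# Split a list of IDs into batches of two
batches = list(chunked(["a", "b", "c", "d", "e"], 2))
print(batches)
```

To use it for real, each batch from chunked(srx_accessions, 100) could be joined with commas and passed as the id parameter of a single efetch request, with the response text handed to parse_experiment_packages. The batch size of 100 is a guess, not an NCBI recommendation. Reading the accession from each record matters here because the records may not come back in the order they were requested.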