Question

How do I Download Metagenomics Data from NCBI or EBI with its Metadata?

12 months ago

John ▴ 10

I am currently working on a research project that requires analyzing metagenomics data. I want to download metagenomic sequence data, together with its associated metadata, from the Linux command line, using a public repository such as the National Center for Biotechnology Information (NCBI) or the European Bioinformatics Institute (EBI). Could anyone provide a step-by-step guide or recommend tools and methods for downloading this data efficiently?

I have tried downloading SRA projects from NCBI on a PC. However, I did it manually, and downloading 200 samples took a long time. Afterwards, I did not know where to get the metadata for the samples I had downloaded. For these two reasons, I would like guidance on how to batch-download such large files. Additionally, I would like to know how to access metagenomic datasets from NCBI's Sequence Read Archive (SRA) and EBI's European Nucleotide Archive (ENA).

Thank you in advance for your assistance!

metagenomics NCBI • 572 views

updated 12 months ago by Ram 45k • written 12 months ago by John ▴ 10

You can do this in multiple ways.

Use NCBI's SRA Run Selector to get the metadata (and to have the data delivered to Galaxy or to your own cloud storage instance): https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA111397&o=acc_s%3Aa

You could use sra-explorer to find download links for the data. Here is a guide to using the program: "sra-explorer : find SRA and FastQ download URLs in a couple of clicks"
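If you would rather script this than click through a web page, here is a minimal Python sketch that asks ENA's Portal API ("filereport" endpoint) for the run-level metadata of a project and pulls the FASTQ links out of it. Treat it as a starting point under a few assumptions: the field list is just my pick of commonly useful columns, the function names are made up for this example, and I am assuming the fastq_ftp column holds semicolon-separated paths without a scheme, so "ftp://" is prepended.

```python
import csv
import io
import urllib.parse
import urllib.request

ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"

# Columns to request from ENA; this particular selection is an assumption, adjust as needed.
DEFAULT_FIELDS = ("run_accession", "sample_accession", "library_strategy",
                  "instrument_platform", "fastq_bytes", "fastq_md5", "fastq_ftp")


def build_filereport_url(accession, fields=DEFAULT_FIELDS):
    """Build the filereport URL listing all runs under a study/project accession."""
    query = urllib.parse.urlencode({
        "accession": accession,
        "result": "read_run",
        "fields": ",".join(fields),
        "format": "tsv",
    })
    return f"{ENA_FILEREPORT}?{query}"


def parse_filereport(tsv_text):
    """Parse the TSV report into a list of dicts, one per run."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def fastq_urls(row):
    """Split the semicolon-separated fastq_ftp column into full ftp:// URLs."""
    paths = [p for p in (row.get("fastq_ftp") or "").split(";") if p]
    return ["ftp://" + p for p in paths]


def fetch_metadata(accession, out_path):
    """Download the report, save it as the metadata table, and return the parsed rows."""
    with urllib.request.urlopen(build_filereport_url(accession)) as resp:
        text = resp.read().decode("utf-8")
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write(text)
    return parse_filereport(text)
```

For example, rows = fetch_metadata("PRJNA111397", "metadata.tsv") would save the metadata table for the project in the Run Selector link above (the output file name is arbitrary). Writing every URL from fastq_urls(row) to a text file, one per line, then lets you batch-download with something like wget -i on that file.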

"SRA projects from NCBI on a PC"

SRA datasets can be very big. If a PC is all you have access to, you are going to be limited by its resources, both storage and network connection.
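One way to check the storage side before starting is to add up the file sizes and compare the total against free disk space. The sketch below assumes you already have a TSV report from ENA's Portal API filereport endpoint that includes the fastq_bytes column (semicolon-separated sizes for paired files, as far as I know). The function names and the 1.2 safety margin are illustrative choices, not anything official.

```python
import csv
import io
import shutil


def total_fastq_bytes(tsv_text):
    """Sum the fastq_bytes column of an ENA filereport TSV (sizes are ';'-separated per run)."""
    total = 0
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        for size in (row.get("fastq_bytes") or "").split(";"):
            if size.strip():
                total += int(size)
    return total


def fits_on_disk(needed_bytes, target_dir=".", safety_factor=1.2):
    """Return True if target_dir has room for the download plus a safety margin."""
    free = shutil.disk_usage(target_dir).free
    return needed_bytes * safety_factor <= free
```

Reading the saved report into a string and calling fits_on_disk(total_fastq_bytes(text)) gives a quick yes/no. Keep in mind the sizes are for the compressed FASTQ files, so any downstream decompression will need more room.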

"NCBI's Sequence Read Archive (SRA) and EBI's European Nucleotide Archive (ENA)."

In theory, the three main DNA sequence databases (NCBI, ENA, and DDBJ) are synced overnight, so identical information should be available in all three. That said, I recently came across some examples where the data was only available in ENA (perhaps because of some restrictions, especially with human samples).

12 months ago by GenoMax 152k