Question

Downloading Raw Reads

7 months ago

Jamie • 0

Hello!

I am trying to do a computational biology project for my school’s science fair, and I want to download raw sequencing reads from the SRA database. However, these reads are much larger than I expected, and I’m worried about what will happen if my computer runs out of space. In general, how many samples should I have for each independent variable? The study has 99 samples. Should I just buy an external hard drive and download all of the files, or could I use only, say, 50 of the 99 samples?

For context I have a MacBook Pro with 1 TB of storage and this is the data that I want to use for my project: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA879084

I’m new to computational biology, so any suggestions would be greatly appreciated!
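Before buying a drive, it can help to add up the run sizes first. This is a minimal Python sketch. It assumes you have exported the per-run metadata for the project as CSV (the SRA Run Selector offers a metadata download) and that the file has a column of per-run sizes in bytes. The column name `Bytes`, the `expansion` factor for decompressed FASTQ, and the run IDs and sizes in the demo are all placeholders, not real values for PRJNA879084.

```python
import csv
import io
import shutil


def estimate_storage_gb(run_table_text, size_column="Bytes", n_runs=None, expansion=1.0):
    """Estimate disk space (GB) needed for a set of sequencing runs.

    run_table_text: CSV text with one row per run (e.g. exported run metadata).
    size_column:    column holding each run's size in bytes (assumed name).
    n_runs:         if given, only the first n_runs rows are counted.
    expansion:      hypothetical multiplier for how much larger the
                    decompressed FASTQ files are than the listed size.
    """
    rows = list(csv.DictReader(io.StringIO(run_table_text)))
    if n_runs is not None:
        rows = rows[:n_runs]
    total_bytes = sum(int(row[size_column]) for row in rows)
    return total_bytes * expansion / 1e9


def free_disk_gb(path="/"):
    """Free space on the volume containing `path`, in GB."""
    return shutil.disk_usage(path).free / 1e9


if __name__ == "__main__":
    # Placeholder table: the run IDs and sizes below are made up for illustration.
    table = "Run,Bytes\nRUN_A,2000000000\nRUN_B,3000000000\nRUN_C,5000000000\n"
    for n in (None, 2):
        need = estimate_storage_gb(table, n_runs=n, expansion=2.0)
        label = "all runs" if n is None else f"first {n} runs"
        print(f"{label}: ~{need:.1f} GB needed, {free_disk_gb():.1f} GB free")
```

Taking the first N rows is only for the size estimate. If you do use a subset, pick the samples so that each group you want to compare stays represented.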

reads RNA

updated 7 months ago by colindaven 7.6k

Another problem: even if you buy a big external SSD (not a hard disk), your computer likely does not have enough RAM to align the reads to the genome. From memory, an aligner like STAR can use over 32 GB of RAM; HISAT2 is likely more efficient. A more resource-efficient route would be kallisto or Salmon (both on GitHub). But, as ATpoint says, getting the count table is likely the best option.
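To see how your machine compares with that memory figure, here is a small Python sketch that reads total physical RAM through POSIX `sysconf`. This works on Linux. It also works on macOS as far as we know, but treat that as an assumption. The 32 GB threshold is only the rough STAR number quoted from memory above. The real requirement depends on the genome and settings.

```python
import os


def physical_ram_gib():
    """Total physical memory in GiB via POSIX sysconf (Linux; macOS assumed)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    n_pages = os.sysconf("SC_PHYS_PAGES")
    return page_size * n_pages / 2**30


def has_enough_ram(required_gib, available_gib=None):
    """True if the machine (or a given amount of memory) meets a requirement."""
    if available_gib is None:
        available_gib = physical_ram_gib()
    return available_gib >= required_gib


if __name__ == "__main__":
    # Rough STAR figure quoted from memory; not an official requirement.
    star_gib = 32
    ram = physical_ram_gib()
    print(f"Physical RAM: {ram:.1f} GiB")
    print("Enough for a ~32 GB STAR run?", has_enough_ram(star_gib, ram))
```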

7 months ago by colindaven 7.6k

score 3 · Answer 1 · 2024-10-22

7 months ago

ATpoint 88k

I assume you eventually want counts? Make your life easy and use the counts provided at https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE213092 rather than processing the data yourself from scratch; that, imo, goes beyond a school project.
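A count table is far smaller than the raw reads, so it is easy to inspect on a laptop. This is a minimal Python sketch that uses only the standard library. It assumes the supplementary file is tab-separated, with a header row, genes as rows and samples as columns. Check the actual layout and file name on the GEO page, since neither is given here. The sketch computes per-sample library sizes (column sums), which is a quick sanity check before any analysis.

```python
import csv
import gzip
import io


def open_table(path):
    """Open a plain or gzip-compressed text table for reading."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    return open(path, newline="")


def read_count_matrix(handle):
    """Parse a tab-separated genes x samples count table.

    Assumes a header row (gene-ID column, then one name per sample) and
    one row per gene. Values are read as floats so non-integer counts parse too.
    """
    reader = csv.reader(handle, delimiter="\t")
    samples = next(reader)[1:]
    counts = {}
    for row in reader:
        if row:  # skip blank lines
            counts[row[0]] = [float(x) for x in row[1:]]
    return samples, counts


def library_sizes(samples, counts):
    """Total counts per sample (column sums)."""
    totals = [0.0] * len(samples)
    for values in counts.values():
        for i, v in enumerate(values):
            totals[i] += v
    return dict(zip(samples, totals))


if __name__ == "__main__":
    # Tiny in-memory stand-in for the downloaded file.
    demo = "gene_id\tS1\tS2\nGENE1\t10\t0\nGENE2\t5\t7\n"
    samples, counts = read_count_matrix(io.StringIO(demo))
    print(library_sizes(samples, counts))  # {'S1': 15.0, 'S2': 7.0}
```

For the real data, open the downloaded file with open_table(path), where path is whatever file you saved from the GEO page, and pass the handle to read_count_matrix. Some tables written from R omit the gene-ID entry in the header, so check the first line if the sample names look shifted.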