Hi everyone,
I am trying to download a genome in FASTQ format, but so far I have only been able to find it in FASTA format. I know I can use the SRA Toolkit to convert SRA files to FASTQ, but I'm not sure which dataset to choose, or even whether these are entire genomes.
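A minimal sketch of the SRA-to-FASTQ step, assuming the SRA Toolkit's `fasterq-dump` is installed and on your PATH. It only builds and prints the command. `SRR0000000` is a hypothetical placeholder: the SRX accessions in NCBI search results are experiments, and each one lists the SRR run accessions that `fasterq-dump` actually works on.

```python
import shlex


def fasterq_dump_cmd(run_accession: str, outdir: str = "fastq") -> list[str]:
    """Build a fasterq-dump command that converts one SRA run to FASTQ.

    ASSUMPTION: the SRA Toolkit is installed and run_accession is a run
    accession (SRR/ERR/DRR), not an experiment accession (SRX).
    """
    if not run_accession.startswith(("SRR", "ERR", "DRR")):
        raise ValueError(f"expected a run accession, got {run_accession!r}")
    return [
        "fasterq-dump",
        "--split-files",     # write each read of a pair to its own file
        "--outdir", outdir,  # directory that receives the FASTQ files
        run_accession,
    ]


if __name__ == "__main__":
    # "SRR0000000" is a HYPOTHETICAL placeholder, not a real run.
    print(shlex.join(fasterq_dump_cmd("SRR0000000")))
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)` once you have a real run accession.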
For example, when I search "Calypte anna" under SRA → DNA → genome, I get the options below. Are these all good options? My end goal is to incorporate this Calypte anna genome into my dataset (genomes of other species) in a single VCF file.
Search results: 14 items (filters: DNA, genome). The first six:

1. Anna's Hummingbird 17kb cut on Blue Pippin and 110 pM loading concentration
   - 10 PACBIO_SMRT (PacBio RS II) runs: 1.6M spots, 26.3G bases, 87.5Gb downloads
   - Accession: SRX1131887
2. Anna's Hummingbird 17kb cut on Blue Pippin and 125 pM loading concentration
   - 52 PACBIO_SMRT (PacBio RS II) runs: 8.5M spots, 122.6G bases, 412.3Gb downloads
   - Accession: SRX1130526
3. Anna's Hummingbird 17kb cut on Blue Pippin and 100 pM loading concentration
   - 1 PACBIO_SMRT (PacBio RS II) run: 163,482 spots, 1.4G bases, 4.8Gb downloads
   - Accession: SRX1130525
4. BGI-FCB06AHABXX-110603-L3-N300
   - 1 ILLUMINA (Illumina HiSeq 2000) run: 103.9M spots, 10.2G bases, 5.7Gb downloads
   - Accession: SRX327908
5. BGI-FCB066MABXX-110618-L2-N300
   - 1 ILLUMINA (Illumina HiSeq 2000) run: 102.8M spots, 10.1G bases, 5.2Gb downloads
   - Accession: SRX327907
6. BGI-FCB05B5ABXX-110525-L6-N300
   - 1 ILLUMINA (Illumina HiSeq 2000) run: 101.6M spots, 10G bases, 5.2Gb downloads
   - Accession: SRX327906
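One way to judge whether these datasets are "enough genome" is approximate depth of coverage: total sequenced bases divided by genome size. The sketch below does that arithmetic using the base counts from the list above. The genome size of roughly 1.1 Gb is an assumption; substitute the total length reported for the reference assembly you actually use.

```python
# Approximate depth of coverage = total sequenced bases / genome size.
# ASSUMPTION: Calypte anna genome size of roughly 1.1 Gb; replace with the
# total length of the reference assembly you are using.
GENOME_SIZE_BP = 1.1e9

# Base counts copied from the search results above (G = 10^9 bases).
experiments = {
    "SRX1131887 (PacBio RS II)": 26.3e9,
    "SRX1130526 (PacBio RS II)": 122.6e9,
    "SRX1130525 (PacBio RS II)": 1.4e9,
    "SRX327908 (Illumina HiSeq 2000)": 10.2e9,
    "SRX327907 (Illumina HiSeq 2000)": 10.1e9,
    "SRX327906 (Illumina HiSeq 2000)": 10.0e9,
}


def coverage(total_bases: float, genome_size: float = GENOME_SIZE_BP) -> float:
    """Return the approximate fold coverage of the genome."""
    return total_bases / genome_size


for name, bases in experiments.items():
    print(f"{name}: ~{coverage(bases):.1f}x")

# Sum only the Illumina experiments, identified by their label.
illumina = sum(b for n, b in experiments.items() if "Illumina" in n)
print(f"Three Illumina experiments combined: ~{coverage(illumina):.1f}x")
```

Under that genome-size assumption, each Illumina experiment works out to roughly 9x and the three together to roughly 27x. These are rough figures; they ignore duplicates, adapter content, and reads that fail to map.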
If you search on sra-explorer using the search term

`"Calypte anna"[Organism] OR Calypte anna[All Fields]`

you will get 76 results. It looks like there is genome data (large fragments size-selected with Pippin Prep and sequenced on the RS II), transcriptome data, etc. These are all raw sequence datasets, but they represent a good variety. If you want an assembled genome, those are available here, where someone has already done the assembly.
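The search term above uses NCBI Entrez syntax (`[Organism]`, `[All Fields]`), so as an alternative to sra-explorer it can also be sent to NCBI's E-utilities `esearch` endpoint against the `sra` database. A minimal sketch that only builds the URL (no network call); the `retmax` value is an arbitrary choice. Note that `esearch` returns internal UIDs, which still need `esummary` or `efetch` to map to SRX/SRR accessions.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def sra_search_url(term: str, retmax: int = 100) -> str:
    """Build an NCBI E-utilities esearch URL against the SRA database."""
    params = {
        "db": "sra",        # search the Sequence Read Archive
        "term": term,       # Entrez query, URL-encoded by urlencode
        "retmax": retmax,   # maximum number of UIDs to return
        "retmode": "json",  # JSON instead of the default XML
    }
    return f"{ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(sra_search_url('"Calypte anna"[Organism] OR Calypte anna[All Fields]'))
```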
I am not sure what you mean by that. Do you want to align your data against the Anna's hummingbird genome (the reference assembly above), or align raw Anna's hummingbird data against your own genome to create VCFs?
You can download fastq files directly from the ENA: Fast download of FASTQ files from the European Nucleotide Archive (ENA)
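A minimal sketch of the ENA route, assuming the ENA Portal API `filereport` endpoint accepts an SRX experiment accession from the search results (ENA mirrors SRA data, so the same accessions should resolve, but check). Fetching the URL returns a tab-separated table with one row per run; `fastq_urls` pulls the direct FASTQ links out of it. The sketch does not touch the network.

```python
from urllib.parse import urlencode

FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def ena_filereport_url(accession: str) -> str:
    """Build an ENA filereport URL listing the runs under an accession."""
    params = {
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,instrument_platform,fastq_ftp",
        "format": "tsv",
    }
    return f"{FILEREPORT}?{urlencode(params)}"


def fastq_urls(tsv_text: str) -> list[str]:
    """Extract FASTQ links from filereport TSV text.

    The fastq_ftp column holds ';'-separated paths without a URL scheme,
    so 'ftp://' is prepended to each one.
    """
    lines = tsv_text.strip().splitlines()
    header = lines[0].split("\t")
    col = header.index("fastq_ftp")
    urls = []
    for line in lines[1:]:
        fields = line.split("\t")
        if col < len(fields) and fields[col]:
            urls.extend(f"ftp://{path}" for path in fields[col].split(";"))
    return urls


if __name__ == "__main__":
    # Fetch this with urllib.request.urlopen(...).read().decode(), then
    # pass the text to fastq_urls().
    print(ena_filereport_url("SRX327908"))
```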
Thanks for the reply. I have a set of genomes for two species aligned to the Anna's hummingbird genome. I want to build a phylogeny with Anna's hummingbird as the outgroup, so I am trying to obtain raw Anna's hummingbird reads to align to the Anna's hummingbird reference genome.
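For aligning raw reads to the reference and calling variants into one VCF, one common route is BWA-MEM, samtools, and bcftools. The sketch below only prints the commands as a dry run. It assumes paired-end Illumina reads; the file and sample names are hypothetical, and this tool choice is one option among several, not necessarily what was used for the other two species.

```python
import shlex

# HYPOTHETICAL file names: replace with your reference FASTA and the
# FASTQ files you downloaded.
REF = "calypte_anna_ref.fa"
READS = {"anna": ("anna_1.fastq.gz", "anna_2.fastq.gz")}


def pipeline(ref: str, reads: dict[str, tuple[str, str]], threads: int = 8) -> list[str]:
    """Return shell commands: index, align, sort, index BAMs, call variants."""
    q = shlex.quote
    cmds = [f"bwa index {q(ref)}", f"samtools faidx {q(ref)}"]
    bams = []
    for sample, (r1, r2) in reads.items():
        bam = f"{sample}.sorted.bam"
        # Read group with a literal backslash-t, which bwa expands to tabs.
        rg = f"@RG\\tID:{sample}\\tSM:{sample}"
        cmds.append(
            f"bwa mem -t {threads} -R {q(rg)} {q(ref)} {q(r1)} {q(r2)}"
            f" | samtools sort -o {q(bam)} -"
        )
        cmds.append(f"samtools index {q(bam)}")
        bams.append(q(bam))
    # All BAMs go into one mpileup call so the VCF holds every sample.
    cmds.append(
        f"bcftools mpileup -f {q(ref)} {' '.join(bams)}"
        " | bcftools call -m -Oz -o calls.vcf.gz"
    )
    return cmds


if __name__ == "__main__":
    for cmd in pipeline(REF, READS):
        print(cmd)
```

To end up with a single VCF, the BAM files already aligned for the other two species would be passed to the same `bcftools mpileup` call, assuming they were aligned to the same reference. As written, `bcftools call -m` also reports invariant sites; adding `-v` keeps variant sites only.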
Is raw read data available somewhere? I've been trying for a long while now and I can't seem to get anything to work. This is the closest I've found, but as you said, it's a bit of a mess: https://www.ncbi.nlm.nih.gov/sra/SRX1131887[accn]
Here's the assembly I found, thanks to your suggestion: https://www.ncbi.nlm.nih.gov/assembly/GCA_003957555.2. Should I use the scaffold or the chromosome sequences?
Thanks again for your help.