Download from ENA according to XML
3.8 years ago

Dear All!

I don't know if this is possible, which is why I'm asking. We regularly download a lot of sequences from ENA to build databases for phylogenetic analysis, but often a study provides only .bam files rather than fastq. My question: is it possible to download the XML for a chosen study, select the samples that have raw sequences as fastq files, and raise an error for the samples that only have bam files?

Thank you in advance

ENA Genome XML Python • 1.2k views

That sounds like it should be possible, yes.

However, I think you're far better off (and it's much easier) downloading the bam files and then converting them to fastq files, e.g. with bamtofastq (https://bedtools.readthedocs.io/en/latest/content/tools/bamtofastq.html)
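For what it's worth, here is a minimal sketch of that conversion step driven from Python. It assumes `bedtools` is installed and on your PATH; the helper names `bamtofastq_command` and `convert_bam` and the file names are made up for illustration. For paired-end data the bedtools documentation asks for the BAM to be grouped by read name first (e.g. with `samtools sort -n`).

```python
import subprocess


def bamtofastq_command(bam_path, fastq_path, fastq2_path=None):
    """Build the argument list for `bedtools bamtofastq`.

    Pass fastq2_path only for paired-end data; the BAM should then be
    grouped by read name beforehand.
    """
    cmd = ["bedtools", "bamtofastq", "-i", bam_path, "-fq", fastq_path]
    if fastq2_path is not None:
        cmd += ["-fq2", fastq2_path]
    return cmd


def convert_bam(bam_path, fastq_path, fastq2_path=None):
    """Run the conversion; raises CalledProcessError if bedtools fails."""
    subprocess.run(bamtofastq_command(bam_path, fastq_path, fastq2_path), check=True)
```

Something like `convert_bam("sample.bam", "sample_1.fq", "sample_2.fq")` would then write the two mate files, where `sample.bam` is a placeholder for your own file.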


Thank you! I know it is possible to convert bam to fastq, but the bam file is usually filtered and doesn't contain the reads that are useful to us (for example, mitochondrial reads). Maybe you know how I should do the XML filtering? :)


I see. It depends a bit on the bam, then: you are allowed to submit your raw reads as bam as well, and those files should contain all the data, but this might be hard to spot without processing them.

On the other hand, if it's raw data it should contain everything, no matter whether it's a bam or a fastq file.

Perhaps contact the ENA helpdesk about this (and perhaps report back here if you manage to resolve it).

2.0 years ago
Polina ▴ 10

I've created a Python tool, ENATool, which downloads XML from the ENA browser and parses it into CSV format. You can filter the resulting table based on your preferences (in this case, for fastq files) and then download the raw data.
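If you just want to see the filtering idea without a dedicated tool, below is a rough sketch using only the standard library. It is not ENATool's code. Instead of parsing XML, it assumes the ENA Portal API "filereport" endpoint and the field names `run_accession`, `fastq_ftp`, `submitted_ftp` and `submitted_format`; please check these against the current ENA documentation before relying on them. The function names are invented for this example.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: endpoint and field names of the ENA Portal API file report.
FILEREPORT_URL = "https://www.ebi.ac.uk/ena/portal/api/filereport"
FIELDS = ["run_accession", "fastq_ftp", "submitted_ftp", "submitted_format"]


def build_filereport_url(study_accession):
    """Build the query URL for a tab-separated file report of one study."""
    query = urlencode({
        "accession": study_accession,
        "result": "read_run",
        "fields": ",".join(FIELDS),
        "format": "tsv",
    })
    return f"{FILEREPORT_URL}?{query}"


def fetch_filereport(study_accession):
    """Download the report text (needs network access)."""
    with urlopen(build_filereport_url(study_accession)) as response:
        return response.read().decode("utf-8")


def split_runs_by_fastq(tsv_text):
    """Split runs into those with fastq links and those without.

    Returns (with_fastq, without_fastq): a list of (run, [links]) tuples
    and a list of run accessions that have no fastq link.
    """
    with_fastq, without_fastq = [], []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Several files per run are separated by ';' in the report.
        links = [link for link in (row.get("fastq_ftp") or "").split(";") if link]
        if links:
            with_fastq.append((row["run_accession"], links))
        else:
            without_fastq.append(row["run_accession"])
    return with_fastq, without_fastq
```

You would call `split_runs_by_fastq(fetch_filereport(accession))` with your own study accession, download the links in the first list, and report (or raise an error for) the runs in the second list, which only have submitted files such as bam.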
