Hi,
I am trying to figure out the necessary pre-processing steps before using sequencing data retrieved from online databases. I work with metagenomes from the human gut microbiome. I have figured out that the three main steps for this type of data are:
1) Identify and mask human reads
2) Remove duplicate reads
3) Trim low-quality bases
Here is an example of a study from which I would like to use data.
I can't figure out at what stage those data are. I believe human read masking should have been performed already, since it relates to subject privacy/ethics, but I can't find clear information telling me whether this is the case. Are sequencing data available in online repositories always already cleared of human reads?
Thank you, Camille
Assume that the provided data are raw if there are no notes about them having been processed. If all reads are the same length, they have most likely not even been adapter-scanned or quality-trimmed, since trimming produces reads of variable length.
I suggest that you use the removehuman decontamination protocol from the BBMap suite. Other tools in the suite cover the remaining steps: clumpify.sh will help you remove duplicates, and bbduk.sh will help you trim the data. Rough example commands for all three steps are sketched below.
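For illustration, here is a minimal sketch of the three steps with these tools, assuming paired-end gzipped FASTQ input (reads_1.fq.gz / reads_2.fq.gz) and a local human reference FASTA (human_ref.fa.gz); every file name and the minid / trimq / minlen cutoffs are placeholder values to adapt to your data, not fixed recommendations.

# 1) Human read removal: map against a human reference and keep only
#    the unmapped (non-human) read pairs
bbmap.sh in=reads_1.fq.gz in2=reads_2.fq.gz ref=human_ref.fa.gz minid=0.95 \
    outu=nohuman_1.fq.gz outu2=nohuman_2.fq.gz

# 2) Duplicate removal
clumpify.sh in=nohuman_1.fq.gz in2=nohuman_2.fq.gz \
    out=dedup_1.fq.gz out2=dedup_2.fq.gz dedupe

# 3) Quality trimming: trim both read ends to Q10 and drop reads shorter than 50 bp
bbduk.sh in=dedup_1.fq.gz in2=dedup_2.fq.gz \
    out=trimmed_1.fq.gz out2=trimmed_2.fq.gz \
    qtrim=rl trimq=10 minlen=50

Run each script without arguments to see its full parameter list; as far as I know, the dedicated removehuman.sh wrapper in the suite is preconfigured for JGI's internal reference paths, which is why the generic bbmap.sh call is shown above.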
Great, thanks for the tool recommendations!