Hi all!
Lately I have been processing some public 16S data, and I came across the Human Microbiome Project. I decided to practice analysis on this data. For this purpose I downloaded the SRR files (16S raw sequences) for the elbow body site; you can see them here: https://portal.hmpdacc.org/search/c?filters=%7B%22op%22:%22and%22,%22content%22:%5B%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.sample_body_site%22,%22value%22:%5B%22elbow%22%5D%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.file_format%22,%22value%22:%5B%22Standard%20Flowgram%20File%22%5D%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.file_type%22,%22value%22:%5B%2216s_raw_seq_set%22%5D%7D%7D%5D%7D&pagination=%7B%22cases%22:%7B%22from%22:101,%22size%22:100,%22sort%22:%22case_id.raw:asc%22%7D%7D&facetTab=files
There are 125 samples in 4 files, about 9 GB of data in total. Unfortunately, I am not very familiar with 454 data (SFF files), so I ran into some problems during the analysis.
I noticed that all 4 SFF files also contain body sites other than elbow (leg, knee, scalp, etc.). Since I am interested only in the elbow data, I wanted to split it by sample ID, which looks like e2559e04fcd73935a7d7b917907a1f46, e2559e04fcd73935a7d7b917907a5ced, etc.
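A minimal Python sketch of the elbow-only selection, assuming a per-sample metadata table is available in QIIME's tab-separated mapping-file layout (header line starting with `#SampleID`). The `BodySite` column name, the helper `elbow_sample_ids`, and the sample rows in the usage example are hypothetical, not taken from the HMP files.

```python
import csv
import io


def elbow_sample_ids(mapping_text, site_column="BodySite", site="elbow"):
    """Return the set of sample IDs whose body-site column matches `site`.

    Assumes a QIIME-style tab-separated mapping file whose header starts
    with '#SampleID'. The 'BodySite' column name is a hypothetical default.
    """
    lines = [ln for ln in mapping_text.splitlines() if ln.strip()]
    # Keep the '#SampleID' header line but drop any other '#' comment lines.
    rows = [ln for ln in lines if ln.startswith("#SampleID") or not ln.startswith("#")]
    reader = csv.DictReader(io.StringIO("\n".join(rows)), delimiter="\t")
    return {
        row["#SampleID"]
        for row in reader
        # Missing cells come back as None, so fall back to "" before comparing.
        if (row.get(site_column) or "").strip().lower() == site.lower()
    }


if __name__ == "__main__":
    # Made-up example rows, for illustration only.
    mapping = (
        "#SampleID\tBarcodeSequence\tBodySite\n"
        "S1\tACGT\telbow\n"
        "S2\tTTTT\tscalp\n"
    )
    print(elbow_sample_ids(mapping))
```

The returned set can then be used to keep only elbow reads once the reads have been assigned to samples.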
I converted the SFF files to FASTA and QUAL files using QIIME's process_sff.py. After this step I couldn't find the sample IDs in the FASTA headers, and now I am confused: how can I split this data by sample ID, or by body site?
Any help will be much appreciated.
Best, Agata
I think you need a mapping file. Check the site you downloaded the data from for one.
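A minimal Python sketch of what a mapping file makes possible, assuming the reads were multiplexed with a per-sample barcode at the start of each read and that the mapping file supplies a barcode-to-sample-ID table. `read_fasta` and `split_by_barcode` are hypothetical helpers that use exact prefix matching only. In QIIME 1 this step is normally done by split_libraries.py (given the mapping, FASTA and QUAL files), which also does quality filtering and barcode error correction.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_by_barcode(fasta_lines, barcode_to_sample, keep=None):
    """Group reads by sample via an exact barcode match at the read start.

    barcode_to_sample: dict mapping barcode -> sample ID (hypothetically
    taken from the mapping file). keep: optional set of sample IDs to
    retain, e.g. only the elbow samples. Reads matching no barcode are
    collected under the key None, untrimmed.
    """
    # Try longer barcodes first so a barcode that is a prefix of another
    # one cannot claim its reads.
    barcodes = sorted(barcode_to_sample, key=len, reverse=True)
    by_sample = {}
    for header, seq in read_fasta(fasta_lines):
        match = next((bc for bc in barcodes if seq.startswith(bc)), None)
        sample = barcode_to_sample[match] if match is not None else None
        if keep is not None and sample is not None and sample not in keep:
            continue  # assigned to a sample we do not want (e.g. not elbow)
        # Strip the barcode so downstream steps see only the amplicon.
        trimmed = seq[len(match):] if match is not None else seq
        by_sample.setdefault(sample, []).append((header, trimmed))
    return by_sample


if __name__ == "__main__":
    fasta = [">r1", "ACGTAAAA", ">r2", "TTTTCCCC", ">r3", "GGGGGG"]
    print(split_by_barcode(fasta, {"ACGT": "S1", "TTTT": "S2"}, keep={"S1"}))
```

Combined with a list of elbow sample IDs passed as `keep`, this drops reads from the other body sites while leaving unassigned reads visible for inspection.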