Question

I have questions about importing fastq data from ENA to qiime2

2


4.2 years ago

seok1213neo ▴ 40

Hi, I am new to QIIME 2, and I ran into a couple of problems importing my data into it.

I understand that you first need to create an artifact file for the program (a .qza file), and that the way to do so depends on the format of the fastq files of interest.

My questions are:

I got my fastq files from ENA, and their format does not seem to be covered in the QIIME 2 tutorial. The data consist of 2 fastq files, which I thought were the forward and reverse files from paired-end sequencing. One of the files looks like this:

e.g.

@SRR6664786.1 1/2
CTACTACAGGTTTTTCTTTTCCTCTTCTCTCCCCCTCCTTTCTCTCCTCCTCCTCTTTCTCTTCCCCCCCTTCCCCCTTCTCCTCCCCTTTTCCTCCCCTTCTCTTCTCCTTCCCCCTCTTCACCCCTTTTTCCTCCCTCCCCTCCCTTCCCCCCCCCCCCTCCTTTCCTTCCCCTTCTCTCCTTTTTTCCCCCCCCTTCCTCCCCCTTCCCTCTCTCCCCCCTCCTCTCCCTTTCTCCCCCCTCCCTCCCCCTCCCCCCTCCCCCCTTCCTTTTCCCCTCCTCTTTTTCCTTCCCTTCTC
+
#############################################################################################################################################################################################################################################################################################################
@SRR6664786.2 2/2
TGGGACTTCTGGTGTTTCTTATCCTTTTTTCTCCCCACGCTTTCGCTCCTTTGCGTCTGTTCTTTCCCCATGCCCTGCCTTCCCCTTCTTTTTTCCTCCCCATCTCTACTCTTTTCCCCTCTACACGTGGTTTTCTACCCCTCCCTATAGTCCTCTTGCGTCCCCGTTTGTTTTTCATTTCCCTGTTTTCTCCCGCGTCTTTCCCCCCTCTCTTTCTTCTCCCCCTGCCTGCCCTTTTCCCCCCTTTTCTCCCCTTCCTCCTCTCCCCCCCCCTTTTCCCCCCCTTCTTTGCCCCTCTTTT
+
#############################################################################################################################################################################################################################################################################################################
@SRR6664786.3 3/2
TCGTCTACACGCTTTTCTTTTTCTTTTTTTTTCCCCCCCTTTCTTTCTTCTCCCTCCGTTTCTTTCCTTTCACCCCCCTTCCCTCCTCCCTTTCCTTCTTCTTTCTATCTATTTCTTTCCTCCCCTCCCACTTCTTCTTCCCCCCCCCTCCCTTCCTTACGCCTCTCTCCTTTCCCTTCCCCCTCTTTTTTCCATCCCTTTTTCTCCCTCCTTCCTTCCCCCTCCACTCTCCCTTCCCTCCCCCTTCTCCCCTCTACCTCCTCCCCCCCCCTCTTTCCCTCCCCCTCCTCCATCTCGTTAT
+
#############################################################################################################################################################################################################################################################################################################
@SRR6664786.4 4/2
CGGACTACCATGGTTTCTAATCCTTTTTTTTACCCACACTTTCGATCTTCTCTGTCAGTTGCTTTCCAGTGAGCTGCCTTCTCTATCGGTTTTCTTCCTTTTATCTAAGCATTTCTCCTCTACACCACGAATTCCCCCCACCTCTACTGTCCTCAATACTGACATTATCATCTGCAATTTTACGGTTTTTCCGCAAACTTTCACACCTTACTTCCCTTTCCACCTACGCTCCCTTTAAACCCAATCACTCCGTCTAACCCTCGGATCCTCCGTATTCCCCCGGCTTCTGCCTCTGATTTCT
+
-88ACEDGGFFA9FGGAC6,CCEF,<CC++@,CCFC;7BEFFE,@,6,<6,,<,C@<,CE,,,,<9:@,C,,,:E,@BFF=,:,,996+,,BCFF??,9,,:,:9A??,?,AE?;,94,49944A,9A7++9AA?,9+4+46@?F############################################################################################################################################################

At first I thought they were fastq files with the barcodes in the sequences, so I managed to make an artifact (multiplexed.qza) out of them. When I then tried to demultiplex them, I needed a metadata file (typically in TSV format), and for that I need to know the barcodes. Where can I find the barcodes in such files? Could you help me work out which sequences are the barcodes?

If I am wrong to interpret them as 'multiplexed sequences with barcodes in sequence', what type of fastq files are they? And if the barcodes are not in the sequences, where can I find the barcode .gz files?

Looking forward to seeing your answers! Thank you

qiime fastq • 1.8k views


score 1 · Answer 1 · 2020-10-29

1


4.2 years ago

h.mon 35k

See "What is the relationship between BioSamples, SRA Experiments, SRA Runs, and my data files?"

Sequencing data deposited at ENA / SRA / DDBJ is already demultiplexed: each file (or pair of paired-read files) corresponds to one sample. You have to search the BioProject and BioSample pages for the metadata describing the files that belong to the experiment you are interested in.
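
Since each pair of files is already one sample, one way to get such data into QIIME 2 is a manifest file that lists the forward and reverse file of every run. What follows is only a minimal sketch, not an official recipe. It assumes the downloads are named `<run>_1.fastq.gz` and `<run>_2.fastq.gz` (the usual ENA pattern for paired reads), it uses the run accession as the sample ID, and `build_paired_manifest` is a made-up helper name. The column headers are the ones the QIIME 2 paired-end manifest format (`PairedEndFastqManifestPhred33V2`) expects.

```python
import csv
import re
from pathlib import Path

# Assumption: ENA-style names, <run>_1.fastq.gz (forward) and <run>_2.fastq.gz (reverse).
PAIRED_NAME = re.compile(r"^(?P<run>[A-Z]+\d+)_(?P<mate>[12])\.fastq(\.gz)?$")


def build_paired_manifest(fastq_dir, manifest_path):
    """Write a tab-separated paired-end manifest; return the rows written."""
    mates = {}
    for path in sorted(Path(fastq_dir).iterdir()):
        match = PAIRED_NAME.match(path.name)
        if match:
            # Manifests need absolute paths, so resolve each file.
            mates.setdefault(match["run"], {})[match["mate"]] = path.resolve()

    rows = []
    for run, files in sorted(mates.items()):
        if set(files) == {"1", "2"}:  # keep only runs with both mates present
            rows.append((run, str(files["1"]), str(files["2"])))

    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(
            ["sample-id", "forward-absolute-filepath", "reverse-absolute-filepath"]
        )
        writer.writerows(rows)
    return rows
```

The resulting manifest could then be given to `qiime tools import` with `--type 'SampleData[PairedEndSequencesWithQuality]'` and `--input-format PairedEndFastqManifestPhred33V2`. Please check the importing tutorial for your QIIME 2 version, and replace the run accessions with real sample names from the BioSample metadata if you need them.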


0


And those files are not in Casava 1.8 format, right? And if a run comes with two fastq files, does that mean they are the forward and reverse reads from paired-end sequencing? But I also got runs that have only one fastq file, and other data even had three fastq files. I am so confused. Please help me.

Reply • 4.2 years ago by seok1213neo ▴ 40

0


and those files are not in Casava 1.8 format, right?

No, these files are usually compressed fastq files or SRA files, the latter being a format created by NCBI. As for the number of fastq files, again, you will have to read the available metadata to figure out what is going on. Possibilities include incorrect submissions, barcode files, single-end sequencing, and so on.
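
To get a quick overview of how many fastq files each run has before digging into the metadata, something like the sketch below may help. It is only a rough first pass based on file names, assuming the ENA-style patterns `<run>.fastq.gz`, `<run>_1.fastq.gz` and `<run>_2.fastq.gz`. The function name `describe_runs` and the labels it returns are invented for this example, and the metadata remains the authority on what each file actually contains.

```python
import re
from collections import defaultdict

# Assumption: ENA-style names such as SRR123.fastq.gz, SRR123_1.fastq.gz, SRR123_2.fastq.gz.
RUN_FILE = re.compile(r"^(?P<run>[A-Z]+\d+)(?:_(?P<part>\d+))?\.fastq(?:\.gz)?$")


def describe_runs(filenames):
    """Group fastq file names by run accession and guess each run's layout."""
    parts = defaultdict(set)
    for name in filenames:
        match = RUN_FILE.match(name)
        if match:
            # Files without a _N suffix are recorded with an empty part label.
            parts[match["run"]].add(match["part"] or "")

    layouts = {}
    for run, found in parts.items():
        if found == {"1", "2"}:
            layouts[run] = "paired-end"
        elif len(found) == 1:
            layouts[run] = "single file (likely single-end)"
        else:
            # For example three files: could be orphan reads or barcodes.
            layouts[run] = "check metadata"
    return layouts
```

For a run with exactly `_1` and `_2` files this reports "paired-end"; anything with three files, or any other unexpected combination, is flagged so that you look it up on the run's ENA / SRA page.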

Reply • 4.2 years ago by h.mon 35k