Question

Parsing a FASTQ File

0

6.8 years ago

mrsmith ▴ 50

I am relatively new to the field, and I could desperately use some help.

I am trying to process a FASTQ file using DADA2, but I would really like to separate the forward and reverse reads for each sample out of a very large FASTQ file. The file was initially a large FASTA file; I have already trimmed it to remove the primers and barcodes using qiime1, and I still have the mapping file. I then used qiime1 to convert the file from FASTA to FASTQ, but I'm really at a loss as to what I should do next.

dada2 • 2.3k views

updated 6.8 years ago by Dattatray Mongad ▴ 380 • written 6.8 years ago by mrsmith ▴ 50

1

I do not understand either. How can a file originally have been a fasta file, and then a fastq file? Where do the quality encodings come from? But if you simply have a paired-end fastq file with both reads in the same file (this is called interleaved) and want to deinterleave it into two separate files, here is some inspiration.

6.8 years ago by ATpoint 88k
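
For the deinterleaving case mentioned in the comment above, here is a minimal Python sketch. It assumes a strict four-lines-per-record FASTQ with no blank lines, and reads that alternate R1, R2, R1, R2. The function names and the demo reads are made up for illustration; this is not taken from any particular tool.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a 4-line-per-record FASTQ."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return
        if len(record) != 4 or not record[0].startswith("@"):
            raise ValueError(f"Malformed FASTQ record: {record!r}")
        yield tuple(record)


def deinterleave(handle, out_r1, out_r2):
    """Write alternating records to out_r1 (forward) and out_r2 (reverse).

    Assumes strict R1, R2, R1, R2 ordering. Returns the number of pairs written.
    """
    records = read_fastq(handle)
    pairs = 0
    for r1 in records:
        r2 = next(records, None)
        if r2 is None:
            raise ValueError("Odd number of records: file is not cleanly interleaved")
        out_r1.write("\n".join(r1) + "\n")
        out_r2.write("\n".join(r2) + "\n")
        pairs += 1
    return pairs


# Tiny in-memory demonstration with made-up reads.
demo = io.StringIO("@read1/1\nACGT\n+\nIIII\n@read1/2\nTTGA\n+\nIIII\n")
forward, reverse = io.StringIO(), io.StringIO()
n_pairs = deinterleave(demo, forward, reverse)
print(n_pairs, "pair(s) written")
```

With real data you would pass open file handles instead of the in-memory buffers. The sketch does not check that the two read names in a pair actually match, which is worth adding if the ordering of the file is in any doubt.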

0

I am sorry but the question is not clear to me. What do you want to achieve?

And are you talking about demultiplexing?

6.8 years ago by Nitin Narwade ★ 1.6k

0

Qiime1 has a script, split_sequence_file_on_sample_ids.py, which splits fastq or fasta files that were demultiplexed using split_libraries.py into separate files for each sample. It will not, however, separate forward reads from reverse reads if they are all in one file.

6.8 years ago by mastal511 ★ 2.1k
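
To illustrate what per-sample splitting involves, here is a rough Python sketch of the same idea. It is not the QIIME script itself. It assumes read headers follow the `SampleID_N` labelling that split_libraries.py writes (sample ID, underscore, running number, then optional extra text), and the function names are hypothetical.

```python
import io
import tempfile
from itertools import islice
from pathlib import Path


def sample_id_from_header(header):
    """Return the sample ID from a header such as '@SampleA_12 extra text'."""
    label = header[1:].split()[0]
    sample, sep, _ = label.rpartition("_")
    if not sep or not sample:
        raise ValueError(f"Header does not look like SampleID_N: {header!r}")
    return sample


def split_by_sample(handle, out_dir):
    """Stream FASTQ records into out_dir/<sample>.fastq; return {sample: record count}."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs, counts = {}, {}
    try:
        while True:
            # Normalise line endings so every written line ends with a newline.
            record = [line.rstrip("\n") + "\n" for line in islice(handle, 4)]
            if not record:
                break
            if len(record) != 4:
                raise ValueError("Truncated FASTQ record at end of input")
            sample = sample_id_from_header(record[0])
            if sample not in outputs:
                outputs[sample] = open(out_dir / f"{sample}.fastq", "w")
                counts[sample] = 0
            outputs[sample].writelines(record)
            counts[sample] += 1
    finally:
        for fh in outputs.values():
            fh.close()
    return counts


# Tiny demonstration with made-up reads, written to a temporary directory.
demo = io.StringIO("@S1_0 x\nACGT\n+\nIIII\n@S2_1 y\nGGGG\n+\nIIII\n")
with tempfile.TemporaryDirectory() as tmp:
    print(split_by_sample(demo, tmp))
```

One output file stays open per sample, so with many hundreds of samples you may run into the operating system's open-file limit. Like the QIIME script, this separates samples only and does nothing about forward versus reverse reads.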

score 1 · Answer 1 · 2018-07-10

1

6.8 years ago

swbarnes2 14k

Converting a fastq to a fasta results in a total loss of the quality scores. You are going to need the original quality scores to call variants.

So stop playing around with fastas, and get the original fastqs. The originals will also have read1 and read2 separate.

6.8 years ago by swbarnes2 14k
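
One quick sanity check on a FASTQ that came from a FASTA is to see whether its quality strings carry any information at all. The Python sketch below assumes that a conversion without real quality data fills in a single constant character; that is an assumption about the converter, not something confirmed in this thread. The function names are hypothetical.

```python
import io
from itertools import islice


def distinct_quality_chars(handle, max_records=10000):
    """Return the set of quality characters seen in the first max_records records."""
    seen = set()
    for _ in range(max_records):
        record = list(islice(handle, 4))
        if len(record) < 4:
            break
        # The fourth line of each FASTQ record is the quality string.
        seen.update(record[3].rstrip("\n"))
    return seen


def looks_like_placeholder(handle, max_records=10000):
    """True if every sampled base has the same quality character."""
    return len(distinct_quality_chars(handle, max_records)) == 1


# Made-up example: every base has quality 'I', so the scores look like a placeholder.
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGCA\n+\nIIII\n")
print(looks_like_placeholder(demo))
```

A result of True suggests the scores are placeholders and the original FASTQ files are needed. A result of False does not prove the scores are the originals.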

score 0 · Answer 2 · 2018-07-11

0

6.8 years ago

Dattatray Mongad ▴ 380

Some points to clear up first:

If you have a single FASTQ file, then your data is likely not paired-end (unless the reads are interleaved).
You are talking about separating reads. Do you mean demultiplexing, i.e. separating the reads of each sample? DADA2 assumes that you have demultiplexed FASTQ files.
DADA2 needs the raw FASTQ files to detect variants.