Parsing a FASTQ File
6.5 years ago
mrsmith ▴ 50

I am relatively new to the field, and I could desperately use some help.

I am trying to process a FASTQ file using DADA2, but I would really like to separate the forward and reverse reads for each sample out of a very large FASTQ file. The file was initially a large FASTA file. I have already trimmed it with qiime1 to remove the primers and barcodes, and I still have the mapping file. I then converted the file from FASTA to FASTQ using qiime1, but I'm really at a loss as to what I should do next.

dada2 • 2.0k views

I do not understand either. How can a file originally have been a FASTA file and then a FASTQ file? Where do the quality scores come from? But if you simply have a paired-end FASTQ file with both reads in the same file (this is called interleaved) and you want to deinterleave it into two separate files, here is some inspiration.
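As a rough illustration of deinterleaving, here is a minimal Python sketch. It assumes an uncompressed FASTQ with exactly four lines per record, with each forward read immediately followed by its mate; the function names (`read_fastq`, `base_name`, `deinterleave`) are made up for this example and do not come from any tool mentioned in the thread. Read names are compared after dropping the header comment and a trailing `/1` or `/2`, so both common Illumina naming styles should pass the pairing check.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) from a 4-line-per-record FASTQ."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return
        if len(record) != 4 or not record[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        yield tuple(record)


def base_name(header):
    """Read name without '@', without the comment, without a trailing /1 or /2."""
    parts = header[1:].split()
    name = parts[0] if parts else ""
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def deinterleave(interleaved, out_r1, out_r2):
    """Write alternating records to out_r1 and out_r2; return the number of pairs."""
    records = read_fastq(interleaved)
    pairs = 0
    for r1 in records:
        r2 = next(records, None)  # the mate is assumed to follow directly
        if r2 is None:
            raise ValueError("odd number of records; input is not interleaved")
        if base_name(r1[0]) != base_name(r2[0]):
            raise ValueError(f"read names do not match: {r1[0]} vs {r2[0]}")
        out_r1.write("\n".join(r1) + "\n")
        out_r2.write("\n".join(r2) + "\n")
        pairs += 1
    return pairs


# Tiny in-memory demonstration; real use would pass open file handles instead.
demo = io.StringIO("@read1/1\nACGT\n+\nIIII\n@read1/2\nTTGG\n+\nIIII\n")
forward, reverse = io.StringIO(), io.StringIO()
n_pairs = deinterleave(demo, forward, reverse)
```

The name check is what makes this safe to try on a file of unknown layout: if the file is not actually interleaved, it fails on the first mismatched pair instead of silently writing wrong mates.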


I am sorry but the question is not clear to me. What do you want to achieve?

And are you talking about demultiplexing?


Qiime1 has a script, split_sequence_file_on_sample_ids.py, which will separate fastq or fasta files that were demultiplexed using split_libraries.py into separate files for each sample. But this will not separate forward reads from reverse reads if your forward and reverse reads are all in one file.
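To make the per-sample split concrete, here is a small Python sketch of the same idea. It is not the Qiime1 script. It assumes each record label starts with the sample ID followed by an underscore and a running number (for example `@SampleA_12 ...`), which is how demultiplexed labels are assumed to look here, and that the FASTQ has four lines per record. The names `sample_id` and `split_by_sample` are hypothetical.

```python
import io
import os
import tempfile
from contextlib import ExitStack
from itertools import islice


def sample_id(header):
    """Sample ID from a label such as '@SampleA_12 ...' (assumed label layout)."""
    parts = header[1:].split()
    if not parts:
        raise ValueError("empty read label")
    # Everything before the last underscore is treated as the sample ID.
    return parts[0].rsplit("_", 1)[0]


def split_by_sample(fastq, outdir):
    """Stream 4-line records into <outdir>/<sample>.fastq; return counts per sample."""
    counts = {}
    with ExitStack() as stack:  # closes every per-sample file on exit
        handles = {}
        while True:
            record = list(islice(fastq, 4))
            if not record:
                break
            if len(record) != 4 or not record[0].startswith("@"):
                raise ValueError("truncated or malformed FASTQ record")
            sid = sample_id(record[0])
            if sid not in handles:
                path = os.path.join(outdir, sid + ".fastq")
                handles[sid] = stack.enter_context(open(path, "w"))
            handles[sid].writelines(record)
            counts[sid] = counts.get(sid, 0) + 1
    return counts


# Tiny in-memory demonstration written to a temporary directory.
example = "@S1_0 a\nACGT\n+\nIIII\n@S2_1 b\nGGCC\n+\nIIII\n"
with tempfile.TemporaryDirectory() as tmp:
    per_sample = split_by_sample(io.StringIO(example), tmp)
```

One file handle stays open per sample, which keeps memory use flat on a very large input but could hit the operating system's open-file limit if there are many hundreds of samples.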

6.5 years ago

Converting a fastq to a fasta results in a total loss of the quality scores. You are going to need the original quality scores to call variants.

So stop playing around with fastas, and get the original fastqs. The originals will also have read1 and read2 separate.

6.5 years ago

Some points to clarify first:

  1. If you have single FASTQ files then your data is not paired-end.
  2. You are talking about separating reads. Do you mean demultiplexing, i.e. separating the reads of each sample? DADA2 assumes that you have demultiplexed FASTQ files.
  3. DADA2 needs raw FASTQ files to detect variants.

For more information, please refer to the DADA2 tutorial.


If you have single FASTQ files then your data is not paired-end.

Interleaved FQ files do indeed exist. See my comment above.
