MiSeq Paired-End Library De Novo Assembly - Broken Pairs?
12.6 years ago
konstantinkul ▴ 110

Hi all! I have two FASTQ files from a MiSeq run, one for the forward reads and one for the reverse reads. After filtering, the two files contain different numbers of reads, and the reads are in a different order. I need to sort them back into proper pairs in matching order: paired reads should go into "PE1.fastq" and "PE2.fastq", and reads that have lost their mate should go into a separate "Single.fastq". Then I can shuffle the pairs from "PE1.fastq" and "PE2.fastq" with the script from the Velvet package. A previous post has a script from Benm (http://www.biostars.org/post/show/8724/illumina-pair-end-library-de-novo-assembly-broken-pairs/). Unfortunately, MiSeq reads use a different header format, so I can't use that script. Could you help me modify it to sort MiSeq reads? Thanks in advance!

MiSeq format of paired-end reads:

@M00273:2:000000000-A0B69:1:1:14290:1420 1:N:0:2
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
+1++++++++++2?A).)..))))3.)).2.2)).(((((.(0(0;:).)('--6)))))))(..(())(((.',,',',((&(((((&&+(+((&)

@M00273:2:000000000-A0B69:1:1:14290:1420 2:N:0:2
YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY
+
+1++++++++++)))))))))))))))))))))))))))))))'--6)))))))(..(())(((.',,',',((&(((((&&+(+((&)
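
For reference, the re-pairing asked for above can be sketched in Python. This is a minimal sketch, not the Benm script: it assumes four-line FASTQ records and that mates share the part of the header before the first space (as in the two headers shown above). The function names and the input file names in the usage note are hypothetical.

```python
from itertools import islice


def read_fastq(handle):
    """Yield (header, seq, plus, qual) tuples from a 4-line-per-record FASTQ."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return
        yield tuple(record)


def read_id(header):
    # Assumption: mates share everything before the first space,
    # e.g. "@M00273:...:14290:1420 1:N:0:2" -> "@M00273:...:14290:1420"
    return header.split()[0]


def write_record(handle, record):
    handle.write("\n".join(record) + "\n")


def repair_pairs(fwd_path, rev_path, pe1_path, pe2_path, single_path):
    """Split filtered forward/reverse files into ordered pairs and singletons."""
    # Index all reverse reads by ID (held in memory).
    with open(rev_path) as rev:
        reverse = {read_id(rec[0]): rec for rec in read_fastq(rev)}

    n_pairs = n_single = 0
    with open(fwd_path) as fwd, open(pe1_path, "w") as pe1, \
            open(pe2_path, "w") as pe2, open(single_path, "w") as single:
        for rec in read_fastq(fwd):
            mate = reverse.pop(read_id(rec[0]), None)
            if mate is None:
                write_record(single, rec)      # forward read without a mate
                n_single += 1
            else:
                write_record(pe1, rec)         # same position in both files
                write_record(pe2, mate)
                n_pairs += 1
        for rec in reverse.values():           # reverse reads without a mate
            write_record(single, rec)
            n_single += 1
    return n_pairs, n_single
```

A call such as `repair_pairs("R1.filtered.fastq", "R2.filtered.fastq", "PE1.fastq", "PE2.fastq", "Single.fastq")` returns the number of pairs and singletons written. The pairs follow the order of the forward file, so "PE1.fastq" and "PE2.fastq" stay in step for a later shuffle.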

Please state the question that needs to be solved rather than sending people to other pages. What is the problem that you need to solve with the reads that you list above?

12.6 years ago
Lee Katz ★ 3.2k

I think that you've hurt yourself by filtering and then re-sorting the reads. It complicates things too much. You need a script to trim/clean and then shuffle the reads. Or, you can do it in reverse order: shuffle, then trim/clean.

If you shuffle your original read set first, then you can use a trimming and cleaning script that I made. It assumes that read #1 is paired with #2; read #3 is paired with #4; etc. http://cg-pipeline.svn.sourceforge.net/viewvc/cg-pipeline/cg_pipeline/branches/lkatz/scripts/run_assembly_trimClean.pl?revision=265&view=markup
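
The shuffled layout described here (read #1 paired with #2, read #3 with #4, and so on) can be produced with a short Python sketch like the one below. It is not the Velvet shuffle script or part of run_assembly_trimClean.pl; the function name `interleave` is hypothetical, and the sketch assumes four-line FASTQ records and mates that share the header text before the first space.

```python
from itertools import islice


def next_record(handle):
    """Read one 4-line FASTQ record, normalising line endings."""
    return [line.rstrip("\n") + "\n" for line in islice(handle, 4)]


def interleave(pe1_path, pe2_path, out_path):
    """Write mate 1, mate 2, mate 1, mate 2, ... and return the pair count."""
    n_pairs = 0
    with open(pe1_path) as f1, open(pe2_path) as f2, open(out_path, "w") as out:
        while True:
            r1, r2 = next_record(f1), next_record(f2)
            if not r1 and not r2:
                break                          # both files ended together
            if len(r1) < 4 or len(r2) < 4:
                raise ValueError("files differ in length or a record is truncated")
            # Assumption: mates share the header text before the first space.
            if r1[0].split()[0] != r2[0].split()[0]:
                raise ValueError(f"pair {n_pairs + 1} has mismatched IDs")
            out.writelines(r1 + r2)
            n_pairs += 1
    return n_pairs
```

Raising on the first mismatched ID is a deliberate choice: a shuffled file with broken pairs is silently misread by any downstream tool that assumes alternating mates.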


There is a problem.

In many cases I get wrong pairs. The IDs differ in the penultimate number:

@M00279:2:000000000-A0B68:1:8:8771:21046 1:N:0:1

@M00279:2:000000000-A0B68:1:8:8160:21046 2:N:0:1

@M00279:2:000000000-A0B68:1:8:18629:21047 1:N:0:1

@M00279:2:000000000-A0B68:1:8:22486:21047 2:N:0:1
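
Mismatches like the ones listed above can be found automatically. The following Python sketch scans a shuffled file and reports every position where two consecutive records do not share an ID. It assumes four-line records and that the ID is the header text before the first space; `mismatched_pairs` is a hypothetical name.

```python
from itertools import islice


def mismatched_pairs(interleaved_path):
    """Return (pair_index, id1, id2) for each pair whose two IDs differ."""
    bad = []
    pair_index = 0
    with open(interleaved_path) as handle:
        while True:
            lines = list(islice(handle, 8))    # two 4-line records
            if len(lines) < 8:
                break                          # a trailing unpaired record is ignored
            id1 = lines[0].split()[0]          # header of the first mate
            id2 = lines[4].split()[0]          # header of the second mate
            if id1 != id2:
                bad.append((pair_index, id1, id2))
            pair_index += 1
    return bad
```

An empty list means that every read #1/#2, #3/#4, ... pair in the file carries matching IDs.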


I think it would help me if you showed me the first 4 IDs of both the output file and the input file. I don't believe that there is a bug in the script, but I am willing to be shown otherwise!

grep -m 4 '^@' in.fastq out.fastq
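
One caveat with the grep approach: a FASTQ quality line can itself start with '@', so matching '^@' may occasionally return a quality string instead of a header. A small Python sketch that takes every fourth line avoids this. It assumes strict four-line records, and `first_ids` is a hypothetical helper name.

```python
from itertools import islice


def first_ids(path, n=4):
    """Return the IDs of the first n records of a 4-line-per-record FASTQ."""
    with open(path) as handle:
        headers = islice(handle, 0, None, 4)   # lines 0, 4, 8, ... are headers
        return [header.split()[0] for header in islice(headers, n)]
```

Calling it on both the input and the output file gives the same kind of side-by-side listing as the grep command.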

Lee, I jumped to a hasty conclusion. You are right, it was my mistake. Strangely, the source files from the instrument already contained broken paired-end reads, which I think caused the errors when I analyzed them with your script. After demultiplexing the source files manually I got proper pairs, then I used your script and everything turned out well. Sorry again. Lee, is it possible to use your script to delete every read that contains at least one low-quality nucleotide? Which settings should I use?


Ok, good news.

Unfortunately, it filters by average quality, not by single low-quality nucleotides. You can trim low-quality bases from the 5' or 3' end with --min_quality, or remove reads with a low average quality with --min_avg_quality. Both options take an integer Phred score.
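
Since the script does not filter on single bases, the stricter filter asked about above would need a separate step. Here is a minimal Python sketch of one, not a feature of run_assembly_trimClean.pl. It assumes a shuffled (interleaved) input, four-line records, and Phred+33 quality encoding; the function names and the threshold of 20 are illustrative.

```python
from itertools import islice


def min_phred(quality, offset=33):
    """Lowest Phred score in a quality string (assumes Phred+33 by default)."""
    return min((ord(char) - offset for char in quality), default=0)


def filter_pairs(in_path, out_path, threshold=20, offset=33):
    """Keep a pair only if every base in both mates has Phred >= threshold."""
    kept = dropped = 0
    with open(in_path) as handle, open(out_path, "w") as out:
        while True:
            lines = [line.rstrip("\n") for line in islice(handle, 8)]
            if len(lines) < 8:
                break                          # end of file (or unpaired tail)
            worst = min(min_phred(lines[3], offset), min_phred(lines[7], offset))
            if worst >= threshold:
                out.write("\n".join(lines) + "\n")
                kept += 1
            else:
                dropped += 1                   # drop both mates to keep pairing intact
    return kept, dropped
```

Dropping both mates keeps the output correctly paired. Note that a per-base filter like this is much stricter than an average-quality filter and may discard a large share of the reads.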

To see the full options, run the command with no options. Also please cite our paper if you use this in a publication!

  run_assembly_trimClean.pl: PipelineRunner::main: Error: need an infile
  trim and clean a set of raw reads
  Usage: run_assembly_trimClean.pl -i reads.fastq -o reads.filteredCleaned.fastq [-p 2]
    -i input file in fastq format
    -o output file in fastq format

  Additional options

  -p 1 or 2 (p for poly)
    1 for SE, 2 for paired end (PE)
  -q for somewhat quiet mode (use 1>/dev/null for totally quiet)
  --notrim to skip trimming of the reads. Useful for assemblers that require equal read lengths.

  Use phred scores (e.g. 20 or 30) or length in base pairs if it says P or L, respectively
  --min_quality P             # trimming
    default: 35
  --bases_to_trim L           # trimming
    default: 20
  --min_avg_quality P         # cleaning
    default: 30
  --min_length L              # cleaning
    default: 62

Thanks. Of course I'll cite your paper!
