Mate Pairs Contaminated With Single And Paired-End Reads?
10.8 years ago
Adrian Pelin ★ 2.6k

Hello,

I got my mate pair run done, and I asked for a jump size of 3kb.

When I map my mate pairs against assembled contigs, I see that there are indeed mate pairs with insert sizes close to 3 kb, but also a lot of reads with no mapped partner, even in the middle of a 100 kb contig.

Is it common for a mate pair library to contain paired-end reads with an insert size of about 300 bp, plus a lot of single-ended reads? Is there any way to deal with this? It is confusing my assemblers, and I am not getting bigger contigs by adding my mate pairs.
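To put numbers on the mix described above, the mapped pairs can be tallied by category. This is only a minimal sketch: the function name `tally_pairs` and the 1000 bp cutoff `short_max` are assumptions chosen for illustration, and it reads a plain-text SAM file using the standard FLAG (column 2), RNEXT (column 7) and TLEN (column 9) fields.

```python
from collections import Counter

def tally_pairs(sam_lines, short_max=1000):
    """Count pairs by category; short_max is an assumed cutoff in bp."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        f = line.rstrip("\n").split("\t")
        flag, rname, rnext, tlen = int(f[1]), f[2], f[6], int(f[8])
        if flag & 0x900:                  # skip secondary/supplementary records
            continue
        if not flag & 0x1:                # read was not sequenced as part of a pair
            counts["single"] += 1
            continue
        if not flag & 0x40:               # count each pair once, via its first mate
            continue
        if flag & 0x4 or flag & 0x8:      # this read or its mate is unmapped
            counts["mate_unmapped"] += 1
        elif rnext not in ("=", rname):   # mates landed on different contigs
            counts["different_contig"] += 1
        elif abs(tlen) < short_max:
            counts["short_insert"] += 1
        else:
            counts["long_insert"] += 1
    return counts
```

Calling `tally_pairs(open("mapped.sam"))` (the file name is hypothetical) returns a `Counter` whose keys are the category names used above.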

Adrian

10.8 years ago

Yes, this is a common finding. How you deal with mate-pair preps is going to be driven a bit by your goals and your toolset, but having a mix of reads is expected.


Well, my goal is simply a better assembly. Are there any tutorials or other information on how to deal with this?

10.8 years ago
cts ★ 1.7k

I'm going to assume that you're using Illumina mate pairs, in which case there are known issues with the library preparation that result in a mix of long-insert (3 kbp) and short-insert (300 bp) read pairs. The data can be processed to separate out the types with tools like NextClip. This will produce reads that are definitely long insert and ones that may be long insert. For the ones that may be long insert, you can map them onto your current contigs using something like bwa to get a SAM file. You can then parse the SAM file to remove any pairs that have a short insert (look at column 9, the template length, for the insert size). What you should have left over are the true long-insert reads, which you can then feed into a de novo assembler or a scaffolder.
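The SAM parsing step could look roughly like the sketch below. The function name `drop_short_insert` and the 1000 bp threshold `min_insert` are assumptions for illustration, so pick a cutoff that suits your own insert-size distribution. It drops a record only when both mates map to the same contig and the absolute value of column 9 is below the threshold.

```python
def drop_short_insert(sam_lines, min_insert=1000):
    """Yield SAM lines, skipping same-contig pairs with |TLEN| < min_insert.

    min_insert is an assumed cutoff in bp, not a value from any tool.
    """
    for line in sam_lines:
        if line.startswith("@"):              # pass header lines through unchanged
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        rname, rnext = fields[2], fields[6]
        tlen = int(fields[8])                 # column 9: signed template length
        both_mapped = not (flag & 0x4 or flag & 0x8)
        same_contig = rnext in ("=", rname)
        if both_mapped and same_contig and abs(tlen) < min_insert:
            continue                          # short-insert pair: drop this record
        yield line
```

Because both mates of a pair carry the same absolute value in column 9, they are kept or dropped together. Pairs that span two contigs or have an unmapped mate cannot be judged by column 9, so this sketch keeps them; whether to keep them is a choice that depends on your scaffolder.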


This is a great tool. I read the paper and tried the software. Unfortunately, all my reads got dumped into category D, which means no adapter was found.

I did a little digging: the sequencing was done on a HiSeq 2000. The flow cell version was HiSeq Flow Cell v3, and the kit was TruSeq SBS Kit v3. The library type is mate pairs, but there is no mention of Nextera, so maybe I need a different adapter sequence? Also, would the pipeline work if I just changed the adapter?


I would assume that it would work if you changed the adaptor, but I don't think there is a command-line option to do that. You could hack into the source code and change the sequence manually. Alternatively, I've written a similar program called prepmate that allows you to specify the adaptor sequence on the command line. The results are a bit different, though, and I've not tested it extensively, but it's an option.


Well you can indeed change the adapter:

  • the adaptor_sequence option allows you to specify the junction adaptor sequence to search for - by default this is the Nextera sequence CTGTCTCTTATACACATCT

The problem is that I have no idea what to change it to. I see this sequence in some of my reads: "ACAAGCCCCTGCAAGGCCC". I have no idea what it is.
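One way to check whether a candidate sequence is plausibly the junction adapter is to count how many reads contain it in either orientation and compare that with the default Nextera sequence. The sketch below is only an illustration: `count_adapter_hits` is a made-up helper name, it assumes plain four-line FASTQ records, and it only finds exact matches, so it will undercount adapters that contain sequencing errors.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def count_adapter_hits(fastq_lines, adapter):
    """Return (reads containing adapter or its reverse complement, total reads).

    Assumes four-line FASTQ records; exact substring matching only.
    """
    adapter = adapter.upper()
    rc = reverse_complement(adapter)
    hits = total = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 != 1:                # the sequence is the 2nd line of each record
            continue
        seq = line.strip().upper()
        total += 1
        if adapter in seq or rc in seq:
            hits += 1
    return hits, total
```

For example, `count_adapter_hits(open("reads_R1.fastq"), "CTGTCTCTTATACACATCT")` (the file name is hypothetical) gives the hit count for the Nextera junction sequence, which can then be compared with the count for the candidate. A sequence that turns up in a large fraction of reads is a better candidate than one seen only occasionally.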


In the Nextera kit they refer to it as the junction adaptor. Do they use the same terminology in the Illumina documentation for the TruSeq kit? Or, if the sequencing was done externally, ask the sequencing centre for the info; hopefully they'll know.

10.1 years ago

The software NxTrim was recently released here
