Aligner for 50 bp paired-end short reads
bipin, 6.7 years ago

I am aligning 50 bp paired-end ATAC-seq data.

I have tried aligning it to the reference genome with bowtie using a larger maximum insert size (-X 1000), but the results are not optimal.

Is there another aligner that is expected to perform better on this dataset?

Tags: ChIP-Seq, Assembly, alignment

Comment:

Any short-read aligner will do; the insert-size range is defined by your data! You have to estimate it, for example by mapping a subset of 10,000 or 100,000 read pairs to your reference and plotting the TLEN field of the output BAM file. That gives you the estimated insert-size distribution, which you then pass to bowtie with -I (minimum) and -X (maximum).
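
A minimal sketch of the TLEN approach, assuming the pilot BAM has been converted to SAM text (for example with `samtools view`). It keeps one value per properly paired first mate, skips secondary and supplementary records, and reports quantile-based bounds. The function names and the 1st/99th percentile cut-offs are my own illustrative choices, not something prescribed in this thread. One caveat: the pilot mapping itself should use a generous -X, otherwise pairs longer than that limit are never reported as proper pairs and the estimate is truncated.

```python
from typing import Iterable, List, Tuple


def insert_sizes(sam_lines: Iterable[str]) -> List[int]:
    """Collect one |TLEN| per properly paired read pair from SAM text lines."""
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM alignment record
        flag = int(fields[1])
        tlen = int(fields[8])  # TLEN is the 9th mandatory SAM column
        proper_pair = flag & 0x2
        first_mate = flag & 0x40  # count each pair only once
        secondary_or_supp = flag & 0x900  # 0x100 secondary, 0x800 supplementary
        if proper_pair and first_mate and not secondary_or_supp and tlen != 0:
            sizes.append(abs(tlen))  # TLEN is negative for the rightmost segment
    return sizes


def suggest_bounds(sizes: List[int], low_q: float = 0.01,
                   high_q: float = 0.99) -> Tuple[int, int]:
    """Return (min, max) insert sizes at simple index-based quantiles."""
    if not sizes:
        raise ValueError("no insert sizes collected")
    ordered = sorted(sizes)
    last = len(ordered) - 1
    return ordered[int(low_q * last)], ordered[int(high_q * last)]
```

For example, pipe `samtools view subset.bam` (hypothetical file name) into a script that calls `insert_sizes(sys.stdin)`, plot a histogram of the returned list to see the distribution, and use `suggest_bounds(...)` as a starting point for -I and -X.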

With 50 nt reads, you have to set the bowtie2 parameters --score-min, --mp, --rdg and --rfg carefully: the reads are very short, and you might lose many of them because of too many mismatches.
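
To see why read length matters, here is a back-of-the-envelope sketch of the bowtie2 end-to-end scoring budget. It assumes the bowtie2 manual's end-to-end defaults (--score-min L,-0.6,-0.6, maximum mismatch penalty 6 from --mp 6,2, and --rdg/--rfg 5,3, where a gap of length N costs open + N * extend). It only models the worst case (every mismatch at a high-quality base, every gap 1 bp) and ignores N penalties. Note that bowtie (v1) uses a different, mismatch-count-based model (-v / -n), so this arithmetic applies to bowtie2 only.

```python
import math
from typing import Tuple

# Transforms for bowtie2 --score-min strings of the form "F,B,A": f(x) = B + A * g(x)
TRANSFORMS = {
    "L": lambda x: x,  # linear
    "S": math.sqrt,    # square root
    "G": math.log,     # natural log
}


def min_score(func: str, read_len: int) -> float:
    """Evaluate a --score-min function such as 'L,-0.6,-0.6' at a read length."""
    kind, const, coef = func.split(",")
    if kind not in TRANSFORMS:
        raise ValueError(f"unsupported function type: {kind}")
    return float(const) + float(coef) * TRANSFORMS[kind](read_len)


def penalty_budget(read_len: int, score_min: str = "L,-0.6,-0.6",
                   mp_max: int = 6, gap_open: int = 5,
                   gap_extend: int = 3) -> Tuple[float, int, int]:
    """End-to-end mode: return (threshold, max mismatches, max 1-bp gaps).

    Worst case only: every mismatch costs the maximum --mp penalty and every
    gap is a separate 1-bp gap costing gap_open + 1 * gap_extend.
    """
    threshold = min_score(score_min, read_len)
    allowed = -threshold  # a perfect end-to-end score is 0; penalties subtract
    max_mismatches = math.floor(allowed / mp_max)
    max_gaps = math.floor(allowed / (gap_open + gap_extend))
    return threshold, max_mismatches, max_gaps
```

For a 50 nt read the defaults give a threshold of -30.6, i.e. at most five high-quality mismatches or three 1-bp gaps. Making the slope more negative, say L,-0.6,-0.9 (an illustrative value, not a recommendation), lowers the threshold to -45.6 and allows seven.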

Also, are you accepting 0 or 1 mismatch in the seed?

Are the reads from the same species as the reference?

Reply:

Thanks for your reply.

Apologies, I forgot to specify that I am using bowtie and not bowtie2, since I found it is recommended for reads shorter than 50 bp.

The options I am giving to bowtie are -k 2 -m 2 --best --strata.

The maximum insert size in my case is 600 bp, so I set -X slightly higher, to 1000.

The reads are from the same species as the reference, i.e. mouse.

Reply:

That is not how you should be doing it. Even if the people who made the library told you that the library size was 600 bp, that may not actually be the case. In reality, fragments tend to be smaller than you estimate them to be.

You can get the actual insert sizes using the method described in the post "Target fragment size versus final insert size".

Edit: While you are there, you could also try bbmap.sh, the mapper from the BBMap suite, for your mapping needs.

Reply:

So you should try an -I/-X interval around 600. If that is still not satisfying, change the scoring function and the gap and mismatch penalties to allow more gaps/mismatches.

You'll find how to do this in the manuals of bowtie / bowtie2 / hisat2 / tophat2 (it works the same way).
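
A hedged sketch of how the asker's bowtie (v1) options combine with an explicit insert-size window. It only builds the argument list (pass it to `subprocess.run(cmd, check=True)` to execute). The index basename, read and output file names, thread count and window values are hypothetical placeholders; the window should come from your own TLEN estimate rather than from the nominal library size.

```python
from typing import List


def bowtie1_paired_cmd(index: str, r1: str, r2: str, out_sam: str,
                       min_ins: int, max_ins: int, threads: int = 4) -> List[str]:
    """Build a bowtie (v1) paired-end command line as an argument list."""
    if not 0 <= min_ins < max_ins:
        raise ValueError("expected 0 <= min_ins < max_ins")
    return [
        "bowtie",
        "-k", "2", "-m", "2", "--best", "--strata",  # options reported in the thread
        "-I", str(min_ins),   # minimum insert size (--minins)
        "-X", str(max_ins),   # maximum insert size (--maxins)
        "-p", str(threads),   # worker threads
        "-S",                 # write SAM output
        index,                # index basename (hypothetical, e.g. "mm10")
        "-1", r1, "-2", r2,   # mate files
        out_sam,              # output path
    ]
```

Keeping the command as a list rather than a shell string avoids quoting problems with file names and makes it easy to swap in different -I/-X values between test runs.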

Answer (6.7 years ago):

I would use bwa aln, not bwa mem, and certainly not bowtie1 for this. Bowtie1 does not perform gapped alignment, so it cannot align reads that span indels, which is going to cause major problems.
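
A minimal sketch of the classic paired-end bwa aln/sampe workflow, assuming the reference has already been indexed with `bwa index`. File names, thread count and the -a value are hypothetical placeholders. Per the bwa manual, sampe's -a (maximum insert size) is only used when there are not enough good alignments for sampe to infer the insert-size distribution itself.

```python
import subprocess
from typing import List, Tuple

Step = Tuple[List[str], str]  # (command, file that receives its stdout)


def bwa_aln_sampe_steps(ref: str, r1: str, r2: str, out_sam: str,
                        max_insert: int = 1000, threads: int = 4) -> List[Step]:
    """Build the bwa aln (one per mate) + bwa sampe commands.

    Assumes `ref` has already been indexed with `bwa index`.
    """
    sai1, sai2 = r1 + ".sai", r2 + ".sai"
    return [
        (["bwa", "aln", "-t", str(threads), ref, r1], sai1),
        (["bwa", "aln", "-t", str(threads), ref, r2], sai2),
        # sampe pairs the two .sai files; -a is the fallback maximum insert size
        (["bwa", "sampe", "-a", str(max_insert), ref, sai1, sai2, r1, r2], out_sam),
    ]


def run_steps(steps: List[Step]) -> None:
    """Run each command in order, redirecting its stdout to the output file."""
    for cmd, out_path in steps:
        with open(out_path, "wb") as fh:
            subprocess.run(cmd, stdout=fh, check=True)
```

Usage would be `run_steps(bwa_aln_sampe_steps("mm10.fa", "r1.fq", "r2.fq", "out.sam"))`; building the steps separately from running them lets you inspect or log the exact commands first.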

Other good short-read aligners, in my view, include Subread, but I am not certain how well it handles very short reads.
