Question

Removing shorter reads in PE mapping

0

Entering edit mode

7.6 years ago

biolab ★ 1.4k

Dear all,

I have pair-end (PE) sequencing data (R1.fq and R2.fq). First, I removed the adapters, which resulted in unequal sequence length for some PE reads. Then I removed the reads with length < 18-nt, which resulted in different number of reads for the R1.fq and R2.fq data. Do you think it OK to continue tophat2 mapping? (I am trying this, and it seems OK)

I appreciate any of your comments. Thanks!

mapping • 1.9k views

ADD COMMENT • link updated 7.6 years ago by h.mon 35k • written 7.6 years ago by biolab ★ 1.4k

1

Entering edit mode

For normal fragment libraries, adapter-trimming read pairs should leave them the same length. It's important to process paired reads together with a pair-aware program to prevent these issues of differential length after adapter-trimming and mismatched reads.

ADD REPLY • link 7.6 years ago by Brian Bushnell 20k

0

Entering edit mode

Thank you very much, Brian. So, do you think one way is to do separate mapping for R1 and R2? Thanks!

ADD REPLY • link 7.6 years ago by biolab ★ 1.4k

1

Entering edit mode

Yes, that is one way to do it, but the reason to run paired reads is because the mapping is more accurate, so you'll end up with inferior results. Also, you'll end up with untrimmed adapters by not doing adapter-trimming together. I recommend deleting your trimmed files and following this approach, starting from the raw fastqs. Specifically:

bbduk.sh -Xmx1g in1=read1.fq in2=read2.fq out1=clean1.fq out2=clean2.fq ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo minlen=18

...where adapters.fa is in bbmap/ref/

ADD REPLY • link 7.6 years ago by Brian Bushnell 20k

0

Entering edit mode

Thanks a lot Brian. It's very helpful. I will try that.

ADD REPLY • link 7.6 years ago by biolab ★ 1.4k

score 1 · Answer 1 · 2017-05-07

1

Entering edit mode

7.6 years ago

h.mon 35k

1) please do tell us how (the commands) you used to trim adapters and remove short reads.

2) No, if R1 and R2 have different number of reads, probably the pairing information is lost and your mapping will be wrong, as tophat will assume a non-existing pairing between R1 and R2.

ADD COMMENT • link 7.6 years ago by h.mon 35k

0

Entering edit mode

Thank you very much for your answer. The command I used is: cutadapt -a SEQ --trim-n -o out.fq in.fq, I used a perl script to remove reads < 18-nt. Can I ask a further question? Although the pairing information is lost as you mentioned, are the mapping positions of reads correct? To me, the mapped positions of reads are important in following analysis. Thanks!

ADD REPLY • link 7.6 years ago by biolab ★ 1.4k

0

Entering edit mode

are the mapping positions of reads correct

Most probably not. If you are taking out any reads from one file, its mate should be taken out of the corresponding (R1/R2) file to keep the order of reads consistent in files. Aligners do not check to ensure that the read pairs are in order.