Partially map reads to a reference genome
5.7 years ago
kspata ▴ 90

Hi,

I have genomic DNA samples sequenced on a MiSeq (PE 150). I trimmed the raw reads with Trim_galore, and the trimmed reads now range from 50 bp to 150 bp in length. Is there a way to partially map the reads to a reference sequence? For example, break each read into 50 bp fragments and then map these fragments to the reference. After this, I want to bin the 50 bp fragments that map, along with their mates. Is there any tool available for this?

I tried using BBMap but it does not work.

bbmap.sh in=trimmed.R1.fq.gz in2=trimmed.R2.fq.gz maxlen=50 out=output.sam

The BBMap help text describes the related option as follows:

fastareadlen=500 Break up FASTA reads longer than this. Max is 500 for BBMap and 6000 for BBMapPacBio. Only works for FASTA input (use 'maxlen' for FASTQ input). The default for bbmap.sh is 500, and for mapPacBio.sh is 6000.

It gives me the following error:

Read of length 149 outside of range 0-50. Paired input is incompatible with 'breaklength'

I expected it to break reads that are longer than 50 bp. Am I interpreting this incorrectly?

Thanks!!

alignment sequencing binning • 1.3k views

"can partially map the reads to a reference sequence."

What are you trying to do in the end? bwa mem can produce supplementary alignments.


I am looking to obtain the sequence of the gene into which the vector DNA is inserted. I do not have the genome reference; I only have the plasmid reference.


Then look for reads that align only partially to your plasmid reference. The part that does not align will be soft-clipped (and is presumably the part you are interested in).
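As a rough illustration of that idea, the soft-clipped part of a read can be recovered from a SAM record by parsing its CIGAR string. This is only a sketch under stated assumptions: the function name `soft_clipped_parts` and the example record are made up for illustration, and the input is assumed to be a plain tab-separated SAM alignment line.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clipped_parts(sam_line):
    """Return (left_clip, right_clip) sequences for one SAM alignment line.

    Hypothetical helper for illustration; not part of any aligner or library.
    """
    fields = sam_line.rstrip("\n").split("\t")
    cigar, seq = fields[5], fields[9]
    if cigar == "*":  # unmapped read: no CIGAR to inspect
        return "", ""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard-clipped bases (H) are absent from SEQ, so ignore them here.
    ops = [(n, op) for n, op in ops if op != "H"]
    left = seq[:ops[0][0]] if ops and ops[0][1] == "S" else ""
    right = seq[len(seq) - ops[-1][0]:] if len(ops) > 1 and ops[-1][1] == "S" else ""
    return left, right

# Made-up record: the first 5 bases do not match the plasmid reference.
example = "read1\t0\tplasmid\t1\t42\t5S10M\t*\t0\t0\tGGGGGACGTACGTAC\tIIIIIIIIIIIIIII"
print(soft_clipped_parts(example))
```

The clipped sequences returned here would be the candidate flanking (non-plasmid) sequence at each end of the read.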


How can I obtain the reads that are partially aligned? Will this information be available in the SAM file? I am using bowtie2 as the aligner.


"Will this information be available in the SAM"

Yes. Look for 'H' (hard clip) and 'S' (soft clip) operations in the CIGAR string, the SA:Z:* tag, and the SAM flag for supplementary alignments (0x800).
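A minimal sketch of checking those signals on a single SAM line follows. The function name `partial_alignment_signals` and the example record are hypothetical. Note that bowtie2 only soft-clips reads when run with `--local` (its default end-to-end mode aligns the whole read), and as far as I know it does not emit supplementary records or SA tags, so for bowtie2 output the CIGAR check is the relevant one.

```python
import re

SUPPLEMENTARY = 0x800  # SAM flag bit marking a supplementary alignment

def partial_alignment_signals(sam_line):
    """Report which partial-alignment signals are present in one SAM line.

    Hypothetical helper for illustration; assumes a tab-separated SAM record.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag, cigar = int(fields[1]), fields[5]
    tags = fields[11:]  # optional TAG:TYPE:VALUE fields
    ops = set(re.findall(r"\d+([MIDNSHP=X])", cigar))
    return {
        "soft_clipped": "S" in ops,
        "hard_clipped": "H" in ops,
        "has_sa_tag": any(t.startswith("SA:Z:") for t in tags),
        "supplementary": bool(flag & SUPPLEMENTARY),
    }

# Made-up supplementary record of the kind bwa mem can produce.
example = ("read1\t2048\tplasmid\t1\t60\t60H40M\t*\t0\t0\t" + "A" * 40 + "\t" + "I" * 40
           + "\tSA:Z:plasmid,500,+,60M40S,60,0;")
print(partial_alignment_signals(example))
```

A read is worth a closer look if any of these values is true.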


Thank you!!! I will try this out.


You are not interpreting this correctly. That option is meant to be used for very long reads (e.g. PacBio/ONT) so they can be broken into smaller pieces for alignment.

"the length distribution of the trimmed reads is now from 50bp - 150bp. Is there a way that I can partially map the reads to a reference sequence. For example, break a read into fragments of 50bp and then map these fragments to the reference."

I don't understand why you want to do that. Aligners (bbmap.sh certainly) can align reads of variable length to a reference.

Note: You could hard-trim all your reads to 50 bp, if that is what you want. You would then be throwing good data away.
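For completeness, hard-trimming itself is simple to sketch. The helper below is hypothetical (the name `hard_trim_fastq` is made up) and assumes unwrapped four-line FASTQ records. For paired data, the same trimming would have to be applied to both files so that mates stay in the same order.

```python
def hard_trim_fastq(lines, max_len=50):
    """Yield FASTQ lines with sequence and quality cut to at most max_len bases.

    Hypothetical helper for illustration; assumes 4 lines per record.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        # Within each 4-line record, line 2 is the sequence and line 4 the qualities.
        if i % 4 in (1, 3):
            line = line[:max_len]
        yield line

# Made-up record, trimmed to 5 bases to keep the example short.
record = ["@r1\n", "ACGTACGT\n", "+\n", "IIIIIIII\n"]
for out_line in hard_trim_fastq(record, max_len=5):
    print(out_line)
```

Everything past `max_len` is discarded, which is exactly the data loss mentioned above.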
