How do I find insertions with BEDTools?
1
0
Entering edit mode
4.7 years ago
AXR ▴ 10

I have a reference genome and a bunch of sequenced and assembled genomes of the same species (human). The sample genomes are lifted over to allow comparison with the reference genome. Some sites of known insertions have already been masked by repeatmasker.

I would like to detect novel insertions using BEDtools.

How do I do it?

Thank you.

genome alignment sequence insertion bedtools • 2.1k views
ADD COMMENT
1
Entering edit mode

I would like to detect novel insertions using BEDtools.

Just curious - is there a reason you're looking for solutions using bedtools?

ADD REPLY
0
Entering edit mode

My advisor told me it's what I should use.

ADD REPLY
0
Entering edit mode

Please do not add answers unless you're answering the top level question. Instead, use Add Comment or Add Reply as appropriate.

"My advisor told me" is not a good enough reason to stick to a tool. Have you done your research to figure out if the tool can do what you need it to do, and how to do it as well as the limitations the tool poses?

ADD REPLY
0
Entering edit mode

No I have not. He said it's all I need. Would bedtools not be a good choice then? Or would it work better under certain conditions? If I were to choose bedtools how would I do it?(I'm not sure if he would accept something else)

ADD REPLY
1
Entering edit mode

It is unclear what data you even have especially the format. Do you have BED files, BAM files, Fastq? In general BEDtools is not among the standard tools for insertion search.

ADD REPLY
0
Entering edit mode

I have a BED file for the reference, FASTA for reference genome, BAM for other genomes. I'm not sure if there are BED files available for the other genomes, I think so but I'll have to check since I haven't received all the files yet. Assuming I had BED files for all, how would I proceed? What if I didn't have BED files?

ADD REPLY
0
Entering edit mode

In case, you have annotation (BED format) for each genomes. you can use bedtools to compare between genomes. like: intersect,

ADD REPLY
0
Entering edit mode

I am not an expert in comparing genomes but I am reasonably sure that comparing intervals is (probably?) not accurate. Shouldn't you do like global alignments or local realignments of interesting regions? Intervals can be influenced by a lot of factors such as stretches of repetitive nucleotides or Ns. If variants indeed exist should be determine by comparing sequences, not intervals.

ADD REPLY
0
Entering edit mode
4.7 years ago
liorglic ★ 1.4k

Hi,
I'm not 100% sure that I undertand what you're trying to do, but in any case bedtools would not be useful here because a bed file wouldn't contain the information you are looking for.
I am assuming that when you say "genomes are lifted over" you mean that they were mapped/aligned to the reference. If that's the case, then you should have the mapping result which could be of various formats, depending on the alignment software used (e.g. sam/bam or one of the Blast outputs). What you are looking for are alignment gaps, or in other words cases where the edges of a sample contig were mapped to the reference, but some "bulges" occur in the middle. You want the sequences of those bulges.
I had a similar problem lately and created this script. You are welcome to use it, but this will only work if you have sam/bam file as input. Also, it prints out fasta sequences - not sure if that's what you want.
Does this make sense?

ADD COMMENT
0
Entering edit mode

I think so yes.

I use a script like yours to find previously aligned sequence data to find the start and end positions of the alignments made.

Then afterwards maybe using another script/next step of last one, get the start:end positions + sequences of the areas not shown, the bulges.

And I have the extracted bulge sequences I can then closely examine them through other means like a prediction program followed by experimental validation.

Is that what you mean? Is there something I'm missing?

ADD REPLY
0
Entering edit mode

Can you please send the link again? The link is not working right now

ADD REPLY
0
Entering edit mode

Well, it's been a while and I'm not sure where the script I linked to is... However, you can look at this one instead. It expects a Minimap2 PAF file as input. Hope this helps.

ADD REPLY

Login before adding your answer.

Traffic: 2910 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6