Extract Alu Repeats
12.0 years ago
GPR ▴ 390

Hello. I have SNP data (VCF files) and would like to separate the events that fall within Alu repeats from those that don't. Reading the threads here on BioStar, I gather that potential options for this are the tools RepeatMasker, TANTAN and TRF. I also read that each of these has its drawbacks. Can somebody advise on the tool that works best for this purpose? Thanks, G

repeats • 6.8k views
12.0 years ago
Ryan Dale 5.0k

You can download RepeatMasker tracks in BED format from UCSC's Table Browser. Choose your genome of interest, and get the data with the settings:

  • group: Variation and Repeats
  • track: RepeatMasker
  • table: rmsk
  • output format: BED

Assuming you save the results as output.bed, you can then grep out the Alu regions of interest and intersect with BEDTools:

grep "Alu" output.bed > alu.bed
bedtools intersect -header -a mydata.vcf -b alu.bed > snps-in-alu.vcf
bedtools intersect -header -a mydata.vcf -b alu.bed -v > snps-not-in-alu.vcf
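
Note that grep "Alu" matches the string anywhere on a line. A slightly stricter variant is sketched below: it tests only the name column. This assumes the Table Browser BED output is tab-separated and carries the repeat name (AluY, AluSx, and so on) in column 4, which is worth checking against your own file first. The function name filter_alu is made up for illustration.

```shell
# Keep only BED records whose name column starts with "Alu".
# Assumption: the file is tab-separated and column 4 holds the repeat name.
# Usage: filter_alu output.bed > alu.bed
filter_alu() {
    awk -F'\t' '$4 ~ /^Alu/' "$1"
}
```

The resulting alu.bed can be passed to bedtools intersect exactly as above.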

Thanks so much! I will try this.

I am trying to get the same results, that is, the Alu coordinates for human chromosome 1. I tried to follow your suggestion, but 1) there is no way to choose "Variation and Repeats", since only one option can be selected, and 2) once I selected "Repeat" I get "No results for that query". Can somebody help me, please?

10.8 years ago

I used the bedtools intersect approach with a VCF file, but it gave an error: "Error: malformed BED entry at line 2. Start was greater than end. Exiting."

So I converted the VCF file to BED format, and it worked fine for me.

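# skip VCF header lines; BED starts are 0-based, so a SNP at position P becomes (P-1, P)
awk 'BEGIN{OFS="\t"} !/^#/ {print $1, $2-1, $2}' 1233_variant_pos_final.vcf > 1233_variant_pos_final.bed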

It is also better to sort all the input BED files:

sort -k 1,1 -k2,2n hg19_Alu.bed > hg19_Alu_sorted.bed
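
If the "Start was greater than end" error shows up, the offending records can be located before running bedtools. Here is a minimal sketch, assuming a tab-separated BED file with the start in column 2 and the end in column 3; the function name check_bed is made up for illustration.

```shell
# Print the line number and content of every BED record whose start exceeds its end.
# Assumption: tab-separated input, start in column 2, end in column 3.
# Usage: check_bed hg19_Alu_sorted.bed
check_bed() {
    # Adding 0 forces a numeric comparison rather than a string comparison.
    awk -F'\t' '($2 + 0) > ($3 + 0) { print NR ": " $0 }' "$1"
}
```

No output means no record has its start after its end.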