9.8 years ago
M K
I am looking for R code that matches the positions of repetitive elements in the rmsk file for mouse (mm10) to the mouse genome. I downloaded the RepeatMasker file for mouse mm10 from the UCSC website.
Do you have to do this with R? I ask because I think it would be easier to download the GFF or a BED file that has the positions, and from there it would be easy to extract the regions.
I would prefer to do this in R. Also, how would I use the GFF file to extract these regions?
What exactly do you mean by "match positions"? The rmsk file is just a text file, so you can read it into R easily enough after fixing it with awk (just to standardize the lines, since the rmsk file is otherwise poorly formatted for machine processing). GenomicRanges will likely make whatever else you need convenient enough.
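For reference, reading the file could look roughly like the sketch below. It assumes the headerless, tab-separated rmsk.txt.gz table dump from UCSC, with column names taken from the UCSC rmsk table schema; read_rmsk() is a hypothetical helper name, and the file name in the usage comment is also an assumption.

```r
# Column names follow the UCSC rmsk table schema. Assumption: the input is the
# headerless, tab-separated table dump, not the RepeatMasker .out format.
rmsk_cols <- c("bin", "swScore", "milliDiv", "milliDel", "milliIns",
               "genoName", "genoStart", "genoEnd", "genoLeft", "strand",
               "repName", "repClass", "repFamily",
               "repStart", "repEnd", "repLeft", "id")

# read_rmsk() is a hypothetical helper, not part of any package.
read_rmsk <- function(path) {
  read.table(path, sep = "\t", header = FALSE, col.names = rmsk_cols,
             quote = "", comment.char = "", stringsAsFactors = FALSE)
}

# Usage (file name is an assumption; gzipped files are read transparently):
# rmsk <- read_rmsk("rmsk.txt.gz")
```

If the file is in that table format, no awk step should be needed, since every line has the same 17 fields.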
I have already read the rmsk file into R. What I want is the positions in the mouse genome of abundant repeat families such as L1 and Alu, and of repeat classes such as LINE and SINE, using the coordinates in the rmsk file to match them to positions in the genome. Also, what do you mean by using awk to standardize the lines of the rmsk file?
If you already read the file in then you can ignore the mentions of awk. L1 repeats are labeled "LINE/L1" and Alu are "SINE/Alu", so just subset the dataframe accordingly.
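A minimal sketch of that subsetting, assuming the data frame uses the UCSC table column names, where class and family are stored separately in repClass and repFamily rather than as one "LINE/L1" label. The rows are made up for illustration.

```r
# Toy stand-in for an rmsk table already loaded into a data frame.
# Column names follow the UCSC schema; the coordinates are invented.
rmsk <- data.frame(
  genoName  = c("chr1", "chr1", "chr2", "chr2"),
  genoStart = c(3000000L, 3005000L, 4000000L, 4010000L),
  genoEnd   = c(3000500L, 3005150L, 4006000L, 4010300L),
  strand    = c("+", "-", "+", "-"),
  repName   = c("L1Md_A", "B1_Mm", "L1Md_T", "MTC"),
  repClass  = c("LINE", "SINE", "LINE", "LTR"),
  repFamily = c("L1", "Alu", "L1", "ERVL-MaLR"),
  stringsAsFactors = FALSE
)

# Keep only L1 elements (class LINE, family L1).
l1 <- rmsk[rmsk$repClass == "LINE" & rmsk$repFamily == "L1", ]

# Keep only Alu-family SINEs.
alu <- rmsk[rmsk$repClass == "SINE" & rmsk$repFamily == "Alu", ]

# Whole classes work the same way, e.g. every LINE regardless of family.
all_lines <- rmsk[rmsk$repClass == "LINE", ]
```

If the table instead has a single combined class/family column, as in RepeatMasker .out files, the comparison would be against the "LINE/L1" or "SINE/Alu" string in that column.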
So how can I use GenomicRanges functions in R to do that?
If that's all you want to do then you don't need GenomicRanges at all.
I mean, how can I use it to match the positions of those elements with gene positions in the mouse genome (mm10)? I want to do some statistical analysis using their locations.
See help(findOverlaps) after loading GenomicRanges.
Thanks a lot, Devon.
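For anyone landing here later, the overlap step might be sketched as follows. It assumes the Bioconductor package GenomicRanges is installed; both tables are made-up examples, and the column names (chrom, start, end, family, gene) are illustrative choices rather than anything from the rmsk file. UCSC table coordinates are 0-based with an exclusive end, so the conversion to GRanges is done with starts.in.df.are.0based = TRUE.

```r
library(GenomicRanges)

# Made-up repeat coordinates in UCSC style (0-based start, exclusive end).
repeats <- data.frame(
  chrom  = c("chr1", "chr1", "chr2"),
  start  = c(1000, 5000, 200),
  end    = c(1500, 5200, 900),
  family = c("L1", "Alu", "L1"),
  stringsAsFactors = FALSE
)

# Made-up gene coordinates, also 0-based; gene names are placeholders.
genes <- data.frame(
  chrom = c("chr1", "chr2"),
  start = c(1200, 5000),
  end   = c(3000, 9000),
  gene  = c("geneA", "geneB"),
  stringsAsFactors = FALSE
)

# Convert both tables to GRanges, shifting 0-based starts to 1-based.
rep_gr  <- makeGRangesFromDataFrame(repeats, keep.extra.columns = TRUE,
                                    starts.in.df.are.0based = TRUE)
gene_gr <- makeGRangesFromDataFrame(genes, keep.extra.columns = TRUE,
                                    starts.in.df.are.0based = TRUE)

# Find every (repeat, gene) pair that shares at least one base.
hits <- findOverlaps(rep_gr, gene_gr, ignore.strand = TRUE)

# One row per overlapping pair.
overlap_table <- data.frame(
  family = rep_gr$family[queryHits(hits)],
  gene   = gene_gr$gene[subjectHits(hits)],
  stringsAsFactors = FALSE
)

# Number of repeats overlapping each gene, in the order of gene_gr.
n_per_gene <- countOverlaps(gene_gr, rep_gr, ignore.strand = TRUE)
```

The per-gene counts are one possible starting point for the statistical analysis; whether strand should be ignored depends on the question being asked.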