How to remove similar and repeat sequences from a miRNA data set?
10.0 years ago
dinesh ▴ 50

Hi everyone, can anyone tell me how to remove similar and repeat sequences from a miRNA data set?

rna-seq alignment • 2.0k views
10.0 years ago

Similar sequences can be clustered together (e.g., with cd-hit). Depending on your source and goals, this can be quite useful (I've never found it useful for miRNAs, but some of the piRNA databases need to be processed this way before they're usable). Of course, if you're talking about similar sequences at the read level, then many pipelines (e.g., miRDeep2) will collapse them together for you. There's no need to reinvent the wheel there.
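To make "collapsing at the read level" concrete, here is a minimal Python sketch that counts exactly identical reads in a FASTA file. The function names `read_fasta` and `collapse_reads` are made up for illustration and are not part of any existing pipeline; the sketch also assumes you want case-insensitive matching with U treated as T. It only merges exact duplicates, so near-identical sequences would still need a clustering tool such as cd-hit.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def collapse_reads(records):
    """Count exactly identical sequences.

    Assumption for this sketch: sequences are compared case-insensitively
    and RNA 'U' is treated as DNA 'T'.
    Returns (sequence, count) pairs, most abundant first.
    """
    counts = Counter()
    for _, seq in records:
        counts[seq.upper().replace("U", "T")] += 1
    return counts.most_common()
```

Something like `collapse_reads(read_fasta(open("reads.fa")))` (the file name is a placeholder) would then give one entry per unique sequence along with how often it was seen.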

For repeat sequences, it depends on exactly what you're talking about. Most people would align against either a mature or hairpin miRNA FASTA file, which would obviously not contain known repeat regions. If you have repeats either being expressed or showing up due to DNA contamination, then the simplest route would be to filter them out by MAPQ or alignment score/edit distance, depending on the particular aligner you're using (and assuming that your reference sequence matches that of your samples well enough).
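As a rough illustration of the MAPQ route, the sketch below filters plain-text SAM lines in Python. The function name `filter_sam_by_mapq` and the default cutoff of 10 are arbitrary choices for this example, not recommendations; how MAPQ is computed differs between aligners, so check what your aligner reports for multi-mapping reads before picking a threshold.

```python
def filter_sam_by_mapq(lines, min_mapq=10):
    """Yield SAM header lines plus mapped alignments with MAPQ >= min_mapq.

    `min_mapq=10` is an arbitrary example cutoff (an assumption of this
    sketch); the right value depends on the aligner.
    """
    for line in lines:
        if line.startswith("@"):
            yield line  # keep header lines untouched
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])   # column 2: bitwise FLAG
        mapq = int(fields[4])   # column 5: mapping quality
        if flag & 0x4:
            continue  # read is unmapped
        if mapq >= min_mapq:
            yield line
```

In practice `samtools view -q` does the same job on BAM files; the Python version is only meant to show what the filter amounts to.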

Of course, if you're interested in finding novel miRNAs and therefore need to align against the genome, then life becomes more "interesting". If you're concerned that your sequencing data might have repeat/DNA contamination, then you have two choices. (1) Align against a soft-masked genome and "simply" filter out reads where a majority of the aligned bases are soft-masked. "Simply" is in quotes because I expect you'll have to write a program to do this. (2) What I expect would be the easier route is to align against the genome and then either (2a) filter by MAPQ to get rid of probable repeat regions or (2b) ignore reads that align to known repeat regions, using bedtools and the GTF file of RepeatMasker annotations. There are already packages written with all of this in mind, so give one of them a try before trying to roll your own solution.
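For option (1), the program in question could look roughly like the Python sketch below. It assumes the soft-masked genome is already loaded as a dict mapping chromosome names to sequence strings (lowercase = masked) and that each alignment is a single ungapped interval in 0-based, half-open coordinates. The names `softmasked_fraction` and `keep_alignment` and the 0.5 cutoff are illustrative assumptions, not taken from any existing package.

```python
def softmasked_fraction(genome, chrom, start, end):
    """Fraction of lowercase (soft-masked) bases in genome[chrom][start:end].

    Assumptions of this sketch: `genome` is a dict of chromosome name to
    sequence string, coordinates are 0-based and half-open, and the
    alignment has no gaps or splicing.
    """
    region = genome[chrom][start:end]
    if not region:
        return 0.0
    masked = sum(1 for base in region if base.islower())
    return masked / len(region)


def keep_alignment(genome, chrom, start, end, max_masked=0.5):
    """Keep an alignment unless more than `max_masked` of it is soft-masked."""
    return softmasked_fraction(genome, chrom, start, end) <= max_masked
```

A real implementation would also need to read the genome from FASTA and take the alignment coordinates from the SAM/BAM records, which is a large part of why route (2) tends to be less work.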

Thank you very much sir

