Is there a program that can be used to remove sequences with Poly A tail?
Emboss Trimest: http://emboss.sourceforge.net/apps/cvs/emboss/apps/trimest.html
trimest reads one or more nucleotide sequences and writes them out again with any 3' poly-A tail (or, optionally, 5' poly-T tail) removed. It detects poly-A and poly-T tails in the input sequences that are at least the specified minimum length. The tails may contain a defined number of non-A or non-T bases. If both a 5' poly-T tail and a 3' poly-A tail are identified, it removes the longer of the two. The output is a set of sequences with the poly-A (or poly-T) tails removed. If a sequence had a 5' poly-T tail, the resulting sequence is reverse-complemented by default. A comment about the changes made to the sequence is appended to the description line.
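To make the behaviour described above concrete, here is a minimal Python sketch of the same idea. It is not trimest's actual algorithm: it does not tolerate non-A/non-T bases inside a tail, and the names `trim_tail`, `min_length` and `five_prime` are made up for this illustration.

```python
# Simplified illustration of poly-A / poly-T tail trimming.
# Assumption: tails are exact runs of A (3' end) or T (5' end);
# the mismatch tolerance that trimest offers is NOT modelled here.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide string."""
    return seq.translate(COMPLEMENT)[::-1]


def trim_tail(seq, min_length=4, five_prime=True):
    """Remove a 3' poly-A or 5' poly-T run of at least min_length bases.

    If both tails qualify, the longer one is removed. A sequence that
    lost a 5' poly-T tail is returned reverse-complemented.
    """
    a_len = len(seq) - len(seq.rstrip("Aa"))
    t_len = len(seq) - len(seq.lstrip("Tt")) if five_prime else 0

    # Ignore runs shorter than the minimum tail length.
    if a_len < min_length:
        a_len = 0
    if t_len < min_length:
        t_len = 0

    if a_len == 0 and t_len == 0:
        return seq
    if a_len >= t_len:
        return seq[:len(seq) - a_len]
    return reverse_complement(seq[t_len:])


if __name__ == "__main__":
    for s in ["ACGTACGTAAAAAA", "ACGTAA", "TTTTTTGGCA"]:
        print(s, "->", trim_tail(s))
```

For real data trimest itself is the better choice, since it also handles tails interrupted by a few other bases.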
I am not sure I understand your question. Do you want to remove the simple repeats from your dataset? You could try fastx_artifacts_filter. It is a command that removes simple artifact sequences from the raw FASTQ output of deep sequencing.
My limited understanding of the biology suggests that mature miRNA will not have poly-A tails.
However, what I often see when dealing with mature miRNA is an artificial poly-A. In these cases the poly-A is downstream of the small RNA adapter:
(mature mirna)ATCTCGTATGCCGTCTTCTGCTTG(AAAAAAA)
I originally thought this fake poly-A was ligated on by the Illumina small RNA kit so it could piggyback on their mRNA extraction kit, but it is actually Bustard's way of saying "no call" (see the thread "Illumina RNASeq: adaptor sequence followed by polyA?").
So while trimming the adapter should also remove the poly-A, the presence of poly-A does not imply mRNA contamination. I would focus on using the presence of the adapter for filtering.
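As a rough illustration of filtering on the adapter rather than on the poly-A, here is a small Python sketch. It uses the adapter sequence from the example above; the function name `clip_adapter`, the `min_overlap` parameter and the exact-match strategy are assumptions made for this sketch, and a dedicated adapter trimmer would also tolerate sequencing errors.

```python
# Sketch: clip a read at the small RNA adapter, which also drops any
# artificial poly-A that follows it. Exact matching only (assumption).

ADAPTER = "ATCTCGTATGCCGTCTTCTGCTTG"  # adapter from the example above


def clip_adapter(read, adapter=ADAPTER, min_overlap=8):
    """Return (clipped_read, adapter_found).

    First look for the full adapter anywhere in the read. If it is not
    there, look for a prefix of the adapter (at least min_overlap bases)
    at the 3' end of the read.
    """
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx], True

    # Partial adapter running off the 3' end of the read.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:len(read) - k], True

    return read, False


if __name__ == "__main__":
    insert = "TGAGGTAGTAGGTTGTATAGTT"  # made-up insert for illustration
    read = insert + ADAPTER + "AAAAAAA"
    clipped, found = clip_adapter(read)
    print(clipped, found)
```

Reads for which `adapter_found` is False can then be set aside or inspected separately, instead of judging them by whether they contain poly-A.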
FYI just tried fastx_artifacts_filter, and it did not remove poly-A tails from my reads.