Hi guys,
I am working with 454 reads at 1x coverage. To find centromeric repeats, I was thinking about assembling the reads and then trying to recognize repetitive regions. However, I am not sure how much sense it makes to assemble reads at such low coverage.
Any help will be appreciated. Thanks.
EDIT: Thank you very much for the answers. I decided to reformulate the problem. I have 454 reads at 1x coverage from Cardamine rivularis, and my aim is to find the centromeric repeats. There are two basic approaches: looking for repeats in the raw reads, or looking for repeats in an assembly (whose usefulness is doubtful at this coverage).
I have also had the idea of using centromeric repeats from other species as reference sequences and trying to map the reads onto them. If the repeat from Cardamine were similar (which is not likely :), this could work.
I will postpone accepting an answer until I have tried these :) Comments and discussion are still more than welcome, and thanks a lot for the answers and comments so far.
At 1X coverage, working with raw reads is probably better than assembly. At least in human, centromeres are mostly imperfect satellite repeats. I do not think assembly will give you anything more at 1X.
De novo assembly is not the right method for detecting highly repetitive regions. Try clustering the sequence reads instead, e.g. with cd-hit or cd-hit-454; see the link below.
http://biostar.stackexchange.com/questions/1968/how-to-cluster-454-reads/1969#1969
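To make the clustering idea concrete, here is a minimal Python sketch of greedy, representative-based clustering. It is only an illustration of the general principle and not the actual cd-hit algorithm: the function names, the shared k-mer criterion, and the `k` and `min_shared` defaults are all assumptions chosen for the toy example, and it ignores the reverse strand.

```python
def kmer_set(seq, k):
    """Return the set of all k-mers in a sequence (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_cluster(reads, k=8, min_shared=0.5):
    """Greedy clustering sketch (hypothetical helper, not cd-hit itself).

    Reads are processed longest first. A read joins the first cluster whose
    representative contains at least `min_shared` of the read's k-mers;
    otherwise it becomes the representative of a new cluster.
    Returns lists of read indices, largest cluster first.
    """
    order = sorted(range(len(reads)), key=lambda i: len(reads[i]), reverse=True)
    clusters = []  # each entry: (representative k-mer set, member indices)
    for idx in order:
        kmers = kmer_set(reads[idx], k)
        if not kmers:
            continue  # read shorter than k, nothing to compare
        for rep_kmers, members in clusters:
            if len(kmers & rep_kmers) / len(kmers) >= min_shared:
                members.append(idx)
                break
        else:
            clusters.append((kmers, [idx]))
    return sorted((members for _, members in clusters), key=len, reverse=True)


# Toy data: two reads drawn from the same made-up 30 bp tandem repeat,
# plus one unrelated read.
unit = "ACGTACGGTTCAGCTAGGCATCCGATTGAC"
reads = [
    unit * 4,
    (unit * 3)[7:],
    "TTTTTTTTTTGGGGGGGGGGAAAAAAAAAACCCCCCCCCC",
]
clusters = greedy_cluster(reads)
```

On real data the largest clusters would be the candidates for abundant repeat families; whether one of them is centromeric still has to be checked separately.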
Here is a relevant paper where k-mer frequency spectra from 454 reads were used to characterize centromeric regions in rice. http://bioinformatics.oxfordjournals.org/content/26/17/2101.full
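As a rough illustration of the k-mer idea (not the method from the paper), the following Python sketch counts k-mers directly in the raw reads and lists the most abundant ones. The function names, the value of `k`, and the toy reads are assumptions for the example; it counts the forward strand only, so a real analysis would probably also want to merge reverse-complement k-mers.

```python
from collections import Counter


def count_kmers(reads, k=12):
    """Count every k-mer across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:  # skip k-mers containing ambiguous bases
                counts[kmer] += 1
    return counts


def top_kmers(reads, k=12, n=10):
    """Return the n most frequent k-mers with their counts."""
    return count_kmers(reads, k).most_common(n)


# Toy data: a made-up 8 bp unit repeated in two reads, plus a unique read.
unit = "ACGTTGCA"
reads = [
    unit * 5,
    "GGGGCCCCAAAATTTT" + unit * 3,
    "TTGACCATGGCATCGATCAGG",
]
counts = count_kmers(reads, k=8)
most_common = top_kmers(reads, k=8, n=1)
```

Even at 1x coverage, k-mers from a high-copy satellite should occur far more often than k-mers from single-copy sequence, which is what makes this usable without an assembly.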
Which species do you have? Did you consider getting the centromeric repeats directly from the reads?
It is from Cardamine rivularis. The thing is that centromeric repeats tend to be quite long (180bp in Arabidopsis) and are therefore hard to find in 454 data.
Have you tried running RepeatMasker on your reads?
454 read lengths have reached 300bp for several years now. Your (alpha?) satellite unit should fit within a single read.
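If a read holds more than one copy of the unit, the unit length can in principle be estimated from that read alone by comparing the read against shifted copies of itself. The sketch below is one simple way to do that, under assumed parameters: `best_period`, `min_period`, and `min_identity` are hypothetical names and thresholds, and it only handles substitutions, not the indels that are common in 454 homopolymer runs.

```python
def best_period(seq, min_period=20, max_period=None, min_identity=0.8):
    """Estimate the tandem repeat unit length within a single read.

    For each shift d, compute the fraction of positions i where
    seq[i] == seq[i + d]. Returns (d, identity) for the shift with the
    highest identity at or above min_identity, or None if there is none.
    Ties go to the smallest shift, so multiples of the unit are not preferred.
    """
    seq = seq.upper()
    n = len(seq)
    if max_period is None:
        max_period = n // 2
    best = None
    for d in range(min_period, max_period + 1):
        overlap = n - d
        if overlap <= 0:
            break
        matches = sum(1 for i in range(overlap) if seq[i] == seq[i + d])
        identity = matches / overlap
        if identity >= min_identity and (best is None or identity > best[1]):
            best = (d, identity)
    return best


# Toy read: a made-up 30 bp unit repeated four times.
unit = "ACGTACGGTTCAGCTAGGCATCCGATTGAC"
result = best_period(unit * 4)
```

Note that with the default `max_period` of half the read length, a 300bp read can only reveal units up to 150bp; for a unit around 180bp you would have to pass a larger `max_period` and accept a shorter overlap.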