Hello,
I've got a bunch of deep sequencing data to sort through: 5 data sets with half a million reads each. Some of the reads are repeated, so I have grouped the unique sequences together, and each unique sequence now has an associated value in a separate data set describing how many times it shows up in the library. For example:
Starting with
ATCGATCG
ATCGATCG
ATCGATCG
CCCTGAG
CCCTGAG
TTTGGGG
I now have a cell array in MATLAB that reads:
ATCGATCG
CCCTGAG
TTTGGGG
And an associated data set which reads:
3
2
1
I have 5 such groups of cell arrays and data sets. What I need to do now is track how the strings change across those 5 groups. This is an in vitro selection experiment, so I should see certain sequences increase in abundance as the rounds continue. Basically, I need to take the top strings from the 5th group and track their associated values in the previous 4 groups. For example, the first-place string in group 5 will probably be in 7th place in group 4, in 14th place in group 3, and so on and so forth.
Any advice on how to go about doing such a thing would be very much appreciated. If it's not already obvious, I'm just getting to know MATLAB now and I'm finding this sort of operation difficult to figure out on my own.
Thanks!
If you're just starting to get familiar with MATLAB, I think the best suggestion would be to use something else :-) It's only a half tongue-in-cheek suggestion, but I think you'd be doing yourself (and the rest of the scientific community who might want to re-use your work) a favor by working in another language that is free to use and already has broad support for bioinformatics work. R + Bioconductor or Python + Biopython + a host of other libs would be good choices.
This suggestion is specific to bioinformatics work. MATLAB has traditionally had advantages in other fields like machine learning or computer vision as many academics have used it for quite some time in these fields, but the tide is turning there as well.
This is not intended to start a language war, just offering some sincere advice to a (presumably) early stage student of bioinformatics.
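To make that concrete, here is a rough sketch of how the tracking could look in plain Python (standard library only). It assumes each round is available as a list of read strings; the function names (`count_reads`, `rank_table`, `track_top`) and the toy "earlier round" reads are made up for illustration, and only the "final round" reads come from the example in the question. Treat it as a starting point, not a finished pipeline.

```python
from collections import Counter


def count_reads(reads):
    """Collapse a list of reads into {sequence: number of occurrences}."""
    return Counter(reads)


def rank_table(counts):
    """Map each sequence to its 1-based rank by descending count.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    ordered = sorted(counts, key=lambda seq: (-counts[seq], seq))
    return {seq: rank for rank, seq in enumerate(ordered, start=1)}


def track_top(rounds, top_n=10):
    """Follow the top_n sequences of the last round back through all rounds.

    rounds: list of {sequence: count} mappings, earliest round first.
    Returns a list of (sequence, [(count, rank), ...]) with one
    (count, rank) pair per round; a sequence absent from a round
    gets count 0 and rank None.
    """
    ranks = [rank_table(counts) for counts in rounds]
    final = rounds[-1]
    top = sorted(final, key=lambda seq: (-final[seq], seq))[:top_n]
    return [
        (seq, [(counts.get(seq, 0), rank.get(seq))
               for counts, rank in zip(rounds, ranks)])
        for seq in top
    ]


# Toy data: the earlier round is invented; the final round is the
# example from the question.
earlier_round = ["CCCTGAG"] * 3 + ["TTTGGGG"] * 2 + ["ATCGATCG"]
final_round = ["ATCGATCG"] * 3 + ["CCCTGAG"] * 2 + ["TTTGGGG"]

rounds = [count_reads(earlier_round), count_reads(final_round)]
result = track_top(rounds, top_n=2)

for seq, history in result:
    print(seq, history)
```

With five real rounds you would build `rounds` from five read lists in order and raise `top_n`. Dictionary lookups keep this fast even with half a million reads per round. If the libraries differ in size, dividing each count by that round's total reads before comparing would probably be more meaningful than raw counts.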