21 months ago
smrutimayipanda
I have a dataset that consists only of 6-mers. I want to assign labels to it and then split it into train, dev, and test sets for fine-tuning in the DNABERT pipeline. Can anyone tell me how to do this? An outline of the logic would also be enough. Thanks!
The k-mers file looks like this:
TTTTCT TGTTTT ATTGCC ACTAGT CTCTAG TCAGTG TGTTAA TCTTAT AACCAG AACTCA ATCATA CACTAA TTCTTT CACACG TGGTGT TTATTA CCCTGA CAAAGT TTTCAG ATCCTC AGTTTT ACATTC AACTCA GGACTT GTTCTT ACCTTT CTTTTC CAATGT TACTTG
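One possible way to approach the split, as a minimal sketch rather than a definitive DNABERT recipe: pair each 6-mer with a label, shuffle with a fixed seed, and cut the list into train, dev, and test parts. The label values below are placeholders, since the post does not say where the labels come from. The function names `split_dataset` and `write_tsv`, the 80/10/10 ratio, and the tab-separated `sequence`/`label` layout are also assumptions that should be checked against the DNABERT version in use.

```python
import csv
import random


def split_dataset(records, train_frac=0.8, dev_frac=0.1, seed=42):
    """Shuffle (sequence, label) pairs and split them into train/dev/test.

    The test set receives whatever remains after train and dev are taken.
    """
    if train_frac <= 0 or dev_frac < 0 or train_frac + dev_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for the test set")
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_dev = int(len(shuffled) * dev_frac)
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]
    return train, dev, test


def write_tsv(path, records):
    """Write records as a tab-separated file with a header row.

    The 'sequence' and 'label' column names are an assumed layout.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["sequence", "label"])
        writer.writerows(records)


# The first ten 6-mers from the question.
kmers = "TTTTCT TGTTTT ATTGCC ACTAGT CTCTAG TCAGTG TGTTAA TCTTAT AACCAG AACTCA".split()

# Placeholder labels for illustration only; real labels must come from
# whatever biological annotation defines the classes.
labels = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

records = list(zip(kmers, labels))
train, dev, test = split_dataset(records)
```

Calling `write_tsv("train.tsv", train)`, and likewise for `dev` and `test`, would then write the three files. Note that the sample in the question contains `AACTCA` twice, so removing duplicates before splitting may be worth considering to keep the same 6-mer from landing in both train and test. If the classes are imbalanced, a stratified split (for example scikit-learn's `train_test_split` with its `stratify` argument) could be a better fit than this plain shuffle.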