Hi, I am trying to extract every 21-nt window from each sequence: positions 1-21, 2-22, 3-23, and so on to the end of the sequence. For example, a 51-nt sequence should give 31 fragments of 21 nt each. I tried the script below, which seems to work, but only for the first 2 lines of the input.
fasta <- read.csv("trial.csv", header = TRUE, sep = ",")
dd1 <- as.character(fasta$Seq)
IDS <- as.character(fasta$Ids)
b <- numeric()
final <- NULL
for (i in 1:length(dd1))
{
  for (j in 1:nchar(dd1[i]))
  {
    a <- dd1[i]
    b[j] <- substr(a, j, j + 20)
    #print (nchar(b[j]))
    #print (nchar(dd1[i]))
    #print (length(dd1[i]))
    ifelse ((nchar(b[j])) == 21 && (j <= (nchar(dd1[i]) - 21)), {
      print(b[j])
      next}, break())
  }
  final <- paste(IDS[i], b, sep = ",")
  write.table(final, paste(i, "final.csv"))
}
Example: trial.csv looks like this:
Ids,Seq
hsa_circ_0000013,GGCTCCAGGGAGCTTGGCTTCTGTAGAAGTTCTAAGGAAGCGGTACGAACTCCACGGCGGTGGGGCGCTAACTAGCAGGGACCCCTGCAAGTGTTGGTCGGGGGCCTCGAGCTGCCTGAGCTGACACGAGGGGAGGGGTCTGTGTAGCCAACAG
hsa_circ_0000026,CAATCCCACAGAGTATTGATGAGGAAACTGAAGTTTGGAGCGATCACATCATTTTCCCAAGAATTCCTAGAGGACCTGTGCAACAACCTCTTGAGGATCGAATCTTCACTCCCGCTGTCTCAGCAGTCTACAGCACGGTAACACAAGTGGCAAGACAGCCGGGAACCCCTACCCCATCCCCTTATTCAGCACATGAAATAAACAAGGGGCATCCAAATCTTGCGGCAACGCCCCCGGGACATGCATCGTCCCCTGGACTCTCTCAA
hsa_circ_0000037,ATACATTTGGGCCTGTCTACCTGCCTTTGGGGCAATTTGCAGGTTTTGAGAAGTAGAAATGAGGGTCTGGAGAGGGCATCTGTGAGCCTCTTCTGGGAACCCCTCCCTTGTAGGT
hsa_circ_0000050,GATGAATTCAAAAGACTATTTGCTAAATATGGAGAACCAGGAGAAGTTTTTATCAACAAAGGCAAAGGATTCGGATTTATTAAGCTTGTGAGTGTAATTGTTTGATTTTACGTAGAATTAAAAAGGGTGGGGGATTTTTTTGTCACTACAAACGCTGAAGGCTTGGTTTTTAAACTGGGGAGGATAAATTGATCTTTTAGATTTTTCACCATTCTTACAGGAAAAATGCTTGCGGTATAATGCATAATTGTTGCTACCTAAGAGAAAAGAGGGGTGGGGTGGGTAAACTAAGTGGTGTTAGTGGGTGCTGCCTAAAGGTAATGGTCGAACTGAGCTGAAGGAAGAAAGGGA
Any help is much appreciated.
Thanks
If you are doing an assignment then by all means continue with your code.
If you just need to print all 21-mers from a file, use kmercountexact.sh from the BBMap suite like this.
Thanks. I am going to try this as well.
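For anyone who wants to stay in R and keep the IDs next to each fragment, here is a minimal sketch of the sliding-window idea from the question. It is one possible approach, not the only one. A sequence of length n has n - 21 + 1 windows (31 for a 51-nt sequence), and substring() is vectorised over start and end positions, so no inner loop is needed. One likely problem in the original loop is that b is never reset between sequences, so fragments from a longer earlier sequence can leak into the output of a shorter later one; building a fresh vector per sequence avoids that. The helper name sliding_kmers, the toy sequences, and the Start column are assumptions for illustration; only the Ids and Seq column names come from trial.csv.

```r
# Hypothetical helper: return every window of length k from one sequence.
sliding_kmers <- function(s, k = 21) {
  n <- nchar(s)
  if (n < k) return(character(0))      # too short for even one window
  starts <- seq_len(n - k + 1)         # 1, 2, ..., n - k + 1
  substring(s, starts, starts + k - 1) # vectorised over start/end positions
}

# Toy input standing in for trial.csv (same column names, made-up sequences).
fasta <- data.frame(
  Ids = c("seq_A", "seq_B"),
  Seq = c("ACGTACGTACGTACGTACGTACGTA", "ACGTACGTAC"),
  stringsAsFactors = FALSE
)

# Build one small data frame per sequence, then stack them.
pieces <- lapply(seq_len(nrow(fasta)), function(i) {
  frags <- sliding_kmers(fasta$Seq[i])
  data.frame(
    Ids = rep(fasta$Ids[i], length(frags)),
    Start = seq_along(frags),          # 1-based start position of the window
    Fragment = frags,
    stringsAsFactors = FALSE
  )
})
result <- do.call(rbind, pieces)
print(result)
```

With the real file, the toy data frame would be replaced by read.csv("trial.csv", stringsAsFactors = FALSE), and the result could be saved with write.csv(result, "fragments.csv", row.names = FALSE). Sequences shorter than 21 nt simply contribute no rows.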