4.4 years ago
Aishwarya Kulkarni
I am working with a file called GABPA_GM12878_GABP_HudsonAlpha_AC.seq.gz, which is a CLIP-seq dataset. The format is as follows:
FoldID EventID seq Bound
A seq_00001_peak AACCAAGAACACAACTGAAATGGTGCGTCCCGCTGCCAAACACGTCCCCGCCCTCTCTTCCGCTTCCGGCCTGGCGCCTTCCTCCCCCTTTGCGCTCCGGT 1
A seq_00003_peak GGAACCTCCGCTTCCGGCTCCACGTCCGCCCGGAAGAAGATCTGCTGCACACTTCCGTTTCCGGTCCGTGCCCTTGGGGCTCCGTGTCCTGCTGTCTTTCC 1
A seq_00005_peak GAGGGCGACCGGAAGTGCTCACGTCTTCACCTTCCCCGCCACGCCACCGTCCTTTCAGGCCCAGCGTGCAGCAGGAAGGAGGACTCTTTTGCCGCGGACTC 1
A seq_00007_peak TTACGGCGCCGGAAGCAGCGGTCCTCCCCCGTCCTTCACTTCCGGCCCCCGGTCCGTCACCGACGCCGTTCCCAGCGCCAGCGCGGTGTGGTGGATCGTGA 1
A seq_00009_peak CCGCGCCGGAAGCCGCTGTCTTTCCCGTCCCTCGCCGGAAGTGGTCCTCTTCTTACCCATCCCTCTCAGGAAGTGGGCACAAACTCTCGCCCGACACCACG 1
A seq_00011_peak TTTCCCCCTCCGCTCCGCCCACTTCCGCCCAGCAGAAGGCACCGGAAGTAGAGCTGCCCCTAGTTGCGGAAGTGCCTCGCCTGGCTTGCTCAAGTCCTCTC 1
A seq_00013_peak CAGTGGGGCCGGAAGTTGTGTTCACTCGGGTCCACTCCGCAGAGTCCCGACGGAAGCGAAAGAAACTCGAGCGACGGGGGTTGAGTTCCGGGGGAGGTTGA 1
A seq_00015_peak CTCTTCCCCGAAACCGCGGTGCTCCCACTTCCTGCAGTCCGAGGTTCCGACTCGACGCCAGCTCCGCGAGCAATGGTTCCGCCCTGCGCCCTGCCCCTTCC 1
A seq_00017_peak CCAACAGCAGGAACCTGTACGGAAGACGGGAAGGGCCCGGTACGCGCCGTTTGCAAACCCCGCAGAAACCAGCGGCGCCACCAGAAGGTTCCGTCTGTGGA 1
I was wondering if there is more information about this format, and whether there is a way to convert .fasta files to .seq files. Thanks!
If you convert your FASTA file to tab-delimited text (use the answers here: fasta file to tab delimited file), it should look similar to the format above. You will need to figure out what to put in the FoldID and Bound columns.
Thank you so much, this is exactly what I needed.
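For anyone landing here later, the conversion suggested in the answer could be sketched in Python roughly as follows. This is only a minimal sketch under several assumptions: the columns are taken to be tab-separated, the first word of each FASTA header is used as the EventID, and the FoldID ("A") and Bound (1) values are placeholders copied from the example rows, since their meaning is not documented in this thread. The function names `read_fasta` and `fasta_to_seq` are made up for illustration.

```python
import gzip


def read_fasta(handle):
    """Yield (record_id, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            # Keep only the first word of the header as the record ID.
            parts = line[1:].split()
            header, chunks = (parts[0] if parts else ""), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fasta_to_seq(fasta_path, seq_path, fold_id="A", bound=1):
    """Write a gzipped, tab-delimited file with the four columns shown above.

    fold_id and bound are placeholder values (assumptions); choose them
    according to whatever the downstream tool expects.
    """
    with open(fasta_path) as fin, gzip.open(seq_path, "wt") as fout:
        fout.write("FoldID\tEventID\tseq\tBound\n")
        for event_id, seq in read_fasta(fin):
            fout.write(f"{fold_id}\t{event_id}\t{seq}\t{bound}\n")
```

Calling `fasta_to_seq("peaks.fasta", "peaks.seq.gz")` (both file names hypothetical) would write one row per FASTA record. If the tool reading the .seq file expects a different delimiter or different FoldID/Bound values, those parts would need adjusting.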