Is the file produced using an algorithm like this?
sam = '''
chr1_22009_22554_0:0:0_0:0:0_94d2 99 chr1 22009 60 100M = 22455 546 CCTCTCAAAATCTGGGGATTGGAGGCCTAGTAGTAATGGCCTCATTTTGAAGGAGTTGGGAGAAGGAGTGGCCAGCAACCTGGAAGTGATGTTCTCTGAG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII RG:Z:delPN01_deb_read1 XT:A:U NM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:100
chr1_22009_22554_0:0:0_0:0:0_94d2 147 chr1 22455 60 100M = 22009 -546 TTCTGAACGCCGTTCTTATTGCTAACGAAACCCTTGATTCTAGATTGAAAGACAACAAACCGGGTCTCCTTCTCAAGATGGACATTGAGAAAGCTTTTAA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII RG:Z:delPN01_deb_read1 XT:A:U NM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:100
'''
lines = sam.split('\n')
main_dic = {}
for line in lines:
    splitted = line.split()
    # skip blank lines and SAM header lines (which start with '@')
    if splitted and not line.startswith('@'):
        chr_name = splitted[2]        # RNAME field
        length = len(splitted[9])     # SEQ field, i.e. the read length
        # give each chromosome its own counter for lengths 50..219; reusing a
        # single values_dic would make every chromosome share one count table
        if chr_name not in main_dic:
            main_dic[chr_name] = {k: 0 for k in range(50, 220)}
        main_dic[chr_name][length] += 1
print(main_dic)
This README page indicates that the said .tsv file should be in the examples directory (e.g. example/test/t21.tsv).
Thanks. Yes, I know. I'm asking about creating such files from raw NGS fastq/bam/sam files.
Ah, I see. Then you may want to post an example snippet in your original question so people know what format you need.
I added an example.
It looks like this is the TLEN value from your aligned BAM files, added to a specific length bin for each chromosome (SAM spec, page 8).
Would you please check the answer I added and tell me your opinion? Thanks.
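If the bins are indeed built from TLEN, the counting might look roughly like the sketch below. It reads SAM text lines (for a BAM file, e.g. the output of `samtools view file.bam`) and takes TLEN from column 9 instead of the read length. The function name `tlen_histogram`, the `bin_width`/`max_len` parameters, and the rule of counting each pair once via the mate with positive TLEN are assumptions for illustration, not taken from the README or from the actual .tsv format.

```python
from collections import Counter, defaultdict


def tlen_histogram(sam_lines, bin_width=1, max_len=None):
    """Hypothetical helper: count template lengths (TLEN) per chromosome.

    Assumptions: each pair is counted once using the mate whose TLEN is
    positive; bins start at multiples of bin_width; TLEN above max_len
    (if given) is ignored.
    """
    counts = defaultdict(Counter)
    for line in sam_lines:
        # skip blank lines and header lines
        if not line.strip() or line.startswith('@'):
            continue
        # mandatory SAM fields contain no whitespace, so split() works for
        # both tab- and space-separated records
        fields = line.split()
        rname = fields[2]        # RNAME: reference (chromosome) name
        tlen = int(fields[8])    # TLEN: observed template length
        if tlen <= 0:
            continue  # negative for the rightmost mate, 0 if unavailable
        if max_len is not None and tlen > max_len:
            continue
        bin_start = (tlen // bin_width) * bin_width
        counts[rname][bin_start] += 1
    # convert to plain dicts for printing or writing out
    return {chrom: dict(c) for chrom, c in counts.items()}


example = [
    "@HD\tVN:1.6",
    "r1\t99\tchr1\t22009\t60\t100M\t=\t22455\t546\tACGT\tIIII",
    "r1\t147\tchr1\t22455\t60\t100M\t=\t22009\t-546\tACGT\tIIII",
]
print(tlen_histogram(example))
```

Counting only positive TLEN avoids counting each fragment twice, since the SAM spec gives the leftmost and rightmost mates opposite signs. The bin width and length range used for the real file would still need to be checked against the README's examples.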