I have a large file with three tab-separated data columns (and some repeated header lines), like this:
Sequence ../Output/yy\Programs\NP_416485.4 alignment. Using default output format...
# ../Output/Split_Seq/NP_416485.4.fasta - gap penalty: 1 - normalized: False
# align_column_number score column
0 0.66627 ------MMMMM
1 -1000.00000 -----S-GGGG
2 0.66627 --MMMF-FFFC
3 0.71962 MMAAAF-CYYY
4 0.43673 SSTTTN-TAAT
5 -1000.00000 HRKKKT-GRRR
6 0.61010 YFKKKL-TTTT
7 0.75691 K-RRRT-RRRR
8 0.63134 T-SSSV-HHHH
Sequence ../Output/yy\Programs\YP_026226.4 alignment. Using default output format....
# ../Output/Split_Seq/YP_026226.4.fasta - gap penalty: 1 - normalized: False
# align_column_number score column
0 0.91889 MMMMMM
1 0.85379 RRRRRR
2 0.55095 -YTTTH
3 -1000.00000 -L---A
4 -1000.00000 -A---F
5 -1000.00000 AG---L
6 -1000.00000 IM---P
7 -1000.00000 -----A
From the second column (i.e., score), I want to find the values greater than 0.5 and extract the corresponding first-column numbers, collapsing consecutive numbers into ranges.
For the input above, the output would be:
NP_416485.4: 1, 3-4, 7-9
YP_026226.4: 1-3
Here, "NP_416485.4" and "YP_026226.4" come from the header line (the part after \Programs\). (Note that the actual value for "NP_416485.4", for example, should be "NP_416485.4: 0, 2-3, 6-8", but I increased all of them by 1 because I don't want to start from 0.)
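A minimal sketch of one way to do this in Python 3, assuming that every "Sequence" header contains the ID immediately after \Programs\ (as in the sample), that comment headers start with "#", and that data rows have exactly three whitespace-separated fields. The names summarize and to_ranges are hypothetical helpers, not part of any library. Consecutive numbers are grouped with itertools.groupby keyed on "value minus position", a common idiom for detecting runs.

```python
import re
from itertools import groupby

# Assumption: the ID is the token right after "\Programs\" on the "Sequence" line.
HEADER_RE = re.compile(r"\\Programs\\(\S+)")


def to_ranges(numbers):
    """Collapse sorted ints into strings, e.g. [1, 3, 4] -> ['1', '3-4']."""
    parts = []
    # Consecutive integers share the same (value - position) key.
    for _, grp in groupby(enumerate(numbers), key=lambda p: p[1] - p[0]):
        run = [n for _, n in grp]
        parts.append(str(run[0]) if len(run) == 1 else f"{run[0]}-{run[-1]}")
    return parts


def summarize(lines, threshold=0.5, offset=1):
    """Yield 'ID: ranges' per block for columns whose score exceeds threshold."""
    current_id, hits = None, []
    for line in lines:
        match = HEADER_RE.search(line)
        if match:
            # A new block starts: emit the previous one first.
            if current_id is not None:
                yield f"{current_id}: {', '.join(to_ranges(hits))}"
            current_id, hits = match.group(1), []
            continue
        fields = line.split()
        if line.startswith("#") or len(fields) != 3:
            continue  # skip '#' header lines and blank/malformed lines
        column, score = int(fields[0]), float(fields[1])
        if score > threshold:
            hits.append(column + offset)  # shift so numbering starts at 1
    if current_id is not None:
        yield f"{current_id}: {', '.join(to_ranges(hits))}"
```

Because summarize accepts any iterable of lines, an open file handle can be passed straight in (for row in summarize(fh): print(row)), so the large file is streamed rather than loaded into memory. Setting offset=0 gives the original 0-based numbering.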
Please help me. How can I generate the desired output? Thanks.
Thanks, khericlim. To start with, I have used the Python csv module as follows:
but it gives:
Please help. Thanks.
xrange, like range, takes integers, but you're giving floats. See here.
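The point of the comment can be illustrated with a small sketch. It assumes Python 3 (where xrange was removed and range covers its role) and rows already split into string fields, as csv.reader or str.split would return them; parse_row and rejects_floats are hypothetical helpers. The fix is to convert the first column with int() and only the score with float().

```python
def parse_row(fields):
    """Convert one split data row, e.g. ['3', '0.71962', 'MMAAAF-CYYY']."""
    column = int(fields[0])    # column index: must be an int to use with range()
    score = float(fields[1])   # score: float() also handles '-1000.00000'
    return column, score, fields[2]


def rejects_floats():
    """Return True if range() refuses a float argument (it does in Python 3)."""
    try:
        range(3.0)
    except TypeError:
        return True
    return False


column, score, residues = parse_row(["3", "0.71962", "MMAAAF-CYYY"])
indices = list(range(column))  # works because column is an int, not a float
```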