6.1 years ago
rbronste
I am trying to find a quick and easy way to parse an AME-generated true-positive sequences.tsv file and pull out just a 3-column BED. The format looks as follows. Any ideas would be awesome, thanks!
motif_DB motif_ID seq_ID FASTA_score PWM_score class
Jaspar MA0004.1 chr5:144788829-144789179_shuf_2 2183 12.7135 fp
Jaspar MA0004.1 chr5:112339537-112339887_shuf_1 1713 12.7131 tp
Jaspar MA0004.1 chr16:94739915-94740265_shuf_1 1668 12.712 tp
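A minimal sketch of one way to do this with awk. It assumes the real file is tab-separated (the sample above is shown with spaces) and that every seq_ID has the form chrom:start-end followed by an optional suffix such as _shuf_1. The function name ame_to_bed is only illustrative, and the coordinates are copied as they appear in seq_ID without any 0-based/1-based adjustment.

```shell
# Hypothetical helper (the name is illustrative): AME sequences.tsv -> 3-column BED.
# Assumes tab-separated columns and seq_ID values like chr5:112339537-112339887_shuf_1.
ame_to_bed() {
  awk -F'\t' -v OFS='\t' '
    # Only rows whose seq_ID looks like chrom:start-end; this also skips the header.
    $3 ~ /^[^:]+:[0-9]+-[0-9]+/ {
      split($3, loc, ":")          # loc[1] = chrom, loc[2] = start-end plus any suffix
      split(loc[2], pos, "[-_]")   # pos[1] = start, pos[2] = end
      print loc[1], pos[1], pos[2]
    }
  ' "$1"
}
```

It could be called as `ame_to_bed sequences.tsv > sequences.bed`. Splitting on the colon first keeps chromosome names that themselves contain underscores intact.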
Thanks, very helpful! Is there also a way to include only the true-positive sequences (tp in the final column) in the output BED?
That will be another grep or awk in the command :)
I think you can figure out how to do that?
Maybe a hint? :) Not as familiar with awk, though trying to learn.
I would add another grep to get lines with tp, prior to cut.
Ok, figured it out. It seems to work like this for true-positive intervals with specific motif IDs:
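The exact command is not shown here. As a rough sketch of the grep-before-cut hint only, and not necessarily what was actually run, something like the following could work. It assumes a tab-separated file, that the word tp appears only in the class column, and that chromosome names contain no underscore; the function name tp_to_bed is hypothetical.

```shell
# Hypothetical helper illustrating "grep for tp, then cut".
# Assumes tab-separated input and seq_ID like chr5:112339537-112339887_shuf_1.
tp_to_bed() {
  grep -w 'tp' "$1" |   # keep only rows containing the whole word tp
    cut -f3 |           # take the seq_ID column
    cut -d_ -f1 |       # drop the _shuf_N suffix (breaks on chrom names with "_")
    tr ':-' '\t\t'      # chrom:start-end -> chrom<TAB>start<TAB>end
}
```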
Thanks for your help.
Thanks very helpful!
Used this to get the following:
However, I can't quite figure out how to select only specific motif_IDs in the .tsv file, and only the tp (true positive) rows for those motif IDs.
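For this combined filter, one option is a single awk pass that checks both the motif ID and the class before printing. This is a sketch under assumptions: the file is tab-separated, motif_ID is column 2 and class is column 6 as in the header shown in the question, and the function name ame_tp_bed_for_motifs is made up for illustration.

```shell
# Hypothetical helper: BED of true-positive rows for the given motif IDs.
# Usage: ame_tp_bed_for_motifs sequences.tsv MA0004.1 [MA0006.1 ...]
ame_tp_bed_for_motifs() {
  tsv=$1
  shift
  ids=$(printf '%s,' "$@")   # comma-joined list of wanted motif IDs
  awk -F'\t' -v OFS='\t' -v ids="$ids" '
    BEGIN {
      n = split(ids, arr, ",")
      for (i = 1; i <= n; i++) if (arr[i] != "") want[arr[i]] = 1
    }
    # Keep rows whose motif_ID was requested and whose class is exactly tp.
    ($2 in want) && $6 == "tp" {
      split($3, loc, ":")          # loc[1] = chrom, loc[2] = start-end plus suffix
      split(loc[2], pos, "[-_]")   # pos[1] = start, pos[2] = end
      print loc[1], pos[1], pos[2]
    }
  ' "$tsv"
}
```

For example, `ame_tp_bed_for_motifs sequences.tsv MA0004.1 > MA0004.1_tp.bed`. Comparing `$6 == "tp"` exactly, rather than grepping for tp anywhere on the line, avoids accidental matches in other columns.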