Hello everyone,
I have files from NCBI GEO of ChIP-seq data with the extensions *_R1.fastq.tagAlign.gz and *_R2.fastq.tagAlign.gz.
I am not able to work out which format this is, or how to convert it into a bedGraph file.
Format details: *_R1.fastq.tagAlign
First line from file:
HWI-ST1052:49:C0ARLACXX:1:1101:1238:1955 1:N:0: + chr17 43607603 TGCACCACTGCATCTGGCCACAAACATTTTGTTTTTTTACTGTTCATTTT CCCFFFFFHHHHHJJJJJJJJJJJJJJJJJJGIJJJJJIJJJHIJIJJJJ 1
*_R2.fastq.tagAlign
First line from file:
HWI-ST1052:49:C0ARLACXX:1:1101:1238:1955 2:N:0: - chr17 43607776 AGCTGGGATTACAGGTGCCTGCCACCACCCCCTGCTAATTTTTGTACTNT FFHHHHHHJJJJIJJJIJJJJJIHJHFJJJJJJJJJJHHHHHFFDBA1#C 0 1:T>N
This looks like a hybrid of FASTQ, SAM and tagAlign, so you will probably need to do some processing of your own to get it into a standard format.
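Off the top of my head, I would pull columns 3, 4, (col4 +/- length of col5), 1 and 2, in that order, to make a BED file with the chromosome, coordinates, read name and strand. For a valid BED6 file, put a placeholder score such as 0 before the strand, since strand belongs in column 6. Converting BED to bedGraph is then straightforward, e.g. with bedtools genomecov -bg.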
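For plus-strand reads the end should be col4 + length(col5). For minus-strand reads the other coordinate is col4 - length(col5), and because BED requires start <= end, that value becomes the start and col4 becomes the end. Please double-check this, as I occasionally mix up the strand arithmetic.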
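Here is a minimal Python sketch of that recipe. It rests on a few assumptions, not confirmed facts about these files. First, fields are whitespace-separated, and the read name (including the 1:N:0: comment) comes before the strand. Second, the minus-strand position is the read's right-hand end, matching the arithmetic above. Third, positions are used as-is, with no 0-based/1-based adjustment. The helper names record_to_bed and bed_to_bedgraph are hypothetical. Coverage is counted over the reads themselves, without extending them to fragment length.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def record_to_bed(line: str) -> str:
    """Turn one hybrid tagAlign-like record into a BED6 line.

    Assumed layout (from the example lines): read name, which may contain a
    space such as '... 1:N:0:', then strand, chromosome, position, sequence,
    quality, and extra fields that are ignored here.
    """
    tokens = line.split()
    # The first '+' or '-' token is taken as the strand; tokens before it form the read name.
    strand_idx = next((i for i, t in enumerate(tokens) if t in ("+", "-")), None)
    if strand_idx is None or len(tokens) < strand_idx + 4:
        raise ValueError(f"unexpected record layout: {line!r}")
    name = "_".join(tokens[:strand_idx])  # join with '_' so the BED name has no spaces
    strand = tokens[strand_idx]
    chrom = tokens[strand_idx + 1]
    pos = int(tokens[strand_idx + 2])
    read_len = len(tokens[strand_idx + 3])
    if strand == "+":
        start, end = pos, pos + read_len
    else:
        # Assumption: the minus-strand position is the read's right-hand end,
        # so the read spans [pos - read_len, pos). Verify against a genome browser.
        start, end = max(0, pos - read_len), pos
    # BED6 column order: chrom, start, end, name, score (placeholder 0), strand.
    return "\t".join([chrom, str(start), str(end), name, "0", strand])


def bed_to_bedgraph(bed_lines: Iterable[str]) -> List[str]:
    """Compute read depth as bedGraph lines (chrom, start, end, depth),
    with contiguous runs of equal depth merged into one line."""
    # Per chromosome: net change in depth at each coordinate.
    deltas: Dict[str, Dict[int, int]] = defaultdict(lambda: defaultdict(int))
    for bed_line in bed_lines:
        chrom, start, end = bed_line.split("\t")[:3]
        deltas[chrom][int(start)] += 1
        deltas[chrom][int(end)] -= 1
    out: List[str] = []
    for chrom in sorted(deltas):  # lexicographic chromosome order
        depth, prev = 0, 0
        for pos in sorted(deltas[chrom]):
            change = deltas[chrom][pos]
            if change == 0:
                continue  # one read ends exactly where another starts: depth unchanged
            if depth > 0:
                out.append(f"{chrom}\t{prev}\t{pos}\t{depth}")
            depth += change
            prev = pos
    return out
```

To use it on one of the files, open it with gzip.open(path, "rt"), pass each non-empty line through record_to_bed, and hand the resulting list to bed_to_bedgraph. Everything is held in memory, which is fine for a quick check but not for large whole-genome files.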
Good luck!