Question

FASTQ tagAlign file format from NCBI GEO

8.2 years ago

Sudhir Jadhao ▴ 70

Hello everyone,

I have ChIP-seq data files from NCBI GEO with the extensions *_R1.fastq.tagAlign.gz and *_R2.fastq.tagAlign.gz.

I cannot work out which format this is, or how to convert it into a bedGraph file.

Format details : *_R1.fastq.tagAlign

First line from file :

HWI-ST1052:49:C0ARLACXX:1:1101:1238:1955 1:N:0: + chr17 43607603 TGCACCACTGCATCTGGCCACAAACATTTTGTTTTTTTACTGTTCATTTT CCCFFFFFHHHHHJJJJJJJJJJJJJJJJJJGIJJJJJIJJJHIJIJJJJ 1

*_R2.fastq.tagAlign

First line from file :

HWI-ST1052:49:C0ARLACXX:1:1101:1238:1955 2:N:0: - chr17 43607776 AGCTGGGATTACAGGTGCCTGCCACCACCCCCTGCTAATTTTTGTACTNT FFHHHHHHJJJJIJJJIJJJJJIHJHFJJJJJJJJJJHHHHHFFDBA1#C 0 1:T>N

ChIP-Seq • 2.3k views


This looks like a hybrid of FASTQ, SAM, and tagAlign. You will probably need to do some of your own processing to get it into a standard format.

Off the top of my head, I would pull columns 3, 4, (col4 +/- length of col5), 1, and 2, in that order, to make a BED file. Going from BED to bedGraph is then straightforward.

The computed coordinate depends on strand: for minus-strand reads it should be col4 - length(col5), and for plus-strand reads col4 + length(col5). Please double-check this, as I tend to switch strand operations on occasion.
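A minimal Python sketch of the column shuffle described above, offered as a starting point rather than a tested pipeline. It rests on assumptions the thread does not confirm: the real files are tab-separated, so the read name together with its `1:N:0:` suffix is column 1 and the sequence is column 5. The names `tagalign_line_to_bed`, `convert`, and `minus_pos_is_right_end` are hypothetical, chosen for illustration. By default the code follows the strand rule given above. If col4 turns out to be the leftmost coordinate for both strands (Bowtie's default output, for instance, reports the leftmost offset regardless of strand), pass `minus_pos_is_right_end=False`.

```python
import gzip


def tagalign_line_to_bed(line, minus_pos_is_right_end=True):
    """Convert one tab-separated record into a BED6 tuple.

    Assumed 1-based columns: 1 read name, 2 strand, 3 chromosome,
    4 position, 5 read sequence. Columns after 5 are ignored.
    """
    fields = line.rstrip("\n").split("\t")
    name, strand, chrom, pos, seq = fields[:5]
    pos = int(pos)
    length = len(seq)

    if strand == "-" and minus_pos_is_right_end:
        # Rule from the answer: minus-strand reads extend to the left of col4.
        # BED needs start <= end, so the computed value becomes the start.
        start, end = max(pos - length, 0), pos
    else:
        # Plus strand, or col4 treated as the leftmost coordinate.
        start, end = pos, pos + length

    # Keep only the first token of the read name so that tools which
    # split on any whitespace do not see an extra column.
    short_name = name.split()[0]
    # The score column is a placeholder; the input has no obvious equivalent.
    return (chrom, start, end, short_name, 0, strand)


def convert(in_path, out_handle, minus_pos_is_right_end=True):
    """Stream a gzipped input file and write BED6 lines to out_handle."""
    with gzip.open(in_path, "rt") as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            bed = tagalign_line_to_bed(line, minus_pos_is_right_end)
            out_handle.write("\t".join(map(str, bed)) + "\n")
```

For example, `convert("sample_R1.fastq.tagAlign.gz", sys.stdout)` (a hypothetical file name, and `sys` would need importing) prints BED6 lines. One common route from there to bedGraph is to sort the BED file and run `bedtools genomecov -bg` with a chromosome sizes file.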

Good luck!

8.2 years ago by ejm32 ▴ 450