FASTQ tagAlign file format from NCBI GEO
8.2 years ago

Hello everyone,

I have ChIP-seq files from NCBI GEO with the extensions *_R1.fastq.tagAlign.gz and *_R2.fastq.tagAlign.gz.

I can't work out which format this is, or how to convert it into a bedGraph file.

Format details : *_R1.fastq.tagAlign

First line from file :

HWI-ST1052:49:C0ARLACXX:1:1101:1238:1955 1:N:0: + chr17 43607603 TGCACCACTGCATCTGGCCACAAACATTTTGTTTTTTTACTGTTCATTTT CCCFFFFFHHHHHJJJJJJJJJJJJJJJJJJGIJJJJJIJJJHIJIJJJJ 1

*_R2.fastq.tagAlign

First line from file :

HWI-ST1052:49:C0ARLACXX:1:1101:1238:1955 2:N:0: - chr17 43607776 AGCTGGGATTACAGGTGCCTGCCACCACCCCCTGCTAATTTTTGTACTNT FFHHHHHHJJJJIJJJIJJJJJIHJHFJJJJJJJJJJHHHHHFFDBA1#C 0 1:T>N


This looks more like a hybrid of FASTQ, SAM, and tagAlign than any one standard format. You will probably need to do some of your own processing to get it into a standard format.

Off the top of my head, I would pull columns 3, 4, (col4 +/- length of col5), 1, and 2, in that order, to make a BED file. Going from BED to bedGraph is then straightforward.

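For minus-strand reads the other end should be col4 - length(col5), and for plus-strand reads it should be col4 + length(col5). Please double-check this, as I tend to switch strand operations on occasion. In particular, if column 4 already holds the leftmost position of the alignment on the forward strand (as in Bowtie 1's default output, which these lines resemble), then the interval runs from col4 to col4 + length(col5) for both strands.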
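As a rough sketch of the conversion described above, the Python below turns each line into a BED interval and then collapses the intervals into bedGraph rows. It rests on several assumptions that should be checked against the real files: the columns are tab-separated in the order name, strand, chromosome, start, sequence; column 4 is a 0-based leftmost coordinate on both strands (if it is the 5' end instead, minus-strand reads need `start - len(seq)`); and each read counts as one unit of coverage. The function names `alignment_line_to_bed` and `bed_to_bedgraph` are made up for this example.

```python
from collections import defaultdict


def alignment_line_to_bed(line):
    """Turn one alignment line into a BED6 tuple.

    Assumed layout (tab-separated): name, strand, chrom, start, sequence, ...
    Assumes start is the 0-based leftmost base of the alignment on both strands.
    """
    fields = line.rstrip("\n").split("\t")
    name, strand, chrom, start, seq = fields[:5]
    start = int(start)
    end = start + len(seq)
    # Keep only the first token of the read name so the BED line has no spaces.
    return (chrom, start, end, name.split()[0], 0, strand)


def bed_to_bedgraph(intervals):
    """Collapse BED intervals into (chrom, start, end, depth) rows."""
    # Record +1 where a read starts and -1 where it ends.
    deltas = defaultdict(lambda: defaultdict(int))
    for chrom, start, end, *_ in intervals:
        deltas[chrom][start] += 1
        deltas[chrom][end] -= 1

    rows = []
    for chrom in sorted(deltas):
        depth = 0
        prev = None
        for pos in sorted(deltas[chrom]):
            change = deltas[chrom][pos]
            if change == 0:
                continue  # depth does not change here, so no row boundary
            if prev is not None and depth > 0:
                rows.append((chrom, prev, pos, depth))
            depth += change
            prev = pos
    return rows


def convert(lines):
    """Convert an iterable of alignment lines into bedGraph text lines."""
    bed = [alignment_line_to_bed(l) for l in lines if l.strip()]
    return ["\t".join(map(str, row)) for row in bed_to_bedgraph(bed)]
```

To run it on a real file, something like `convert(gzip.open(path, "rt"))` should work after `import gzip`. For large files, a dedicated tool such as bedtools is likely a better fit for the BED to bedGraph step than this in-memory approach.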

Good luck!
