Hi everyone
I am trying to analyse some old data on STAT3 binding locations in macrophages upon IL-10 treatment. I found a dataset that matches what I want perfectly; however, it was only available in bowtie format. My initial aim is to view it in the UCSC Genome Browser and quickly check for peaks at my genes of interest, then move on to a more detailed analysis.
I managed (with some difficulty, as I'm very new to all this) to convert it into a SAM file; however, when I try to upload it to UCSC I get an error. After spending a while trying to figure out what was wrong, I discovered that it's missing the @SQ header lines.
Now, I know that it has been mapped to mm9 as the reference genome, so I was hoping someone could help me generate a basic header.
I have read around and attempted to convert it to a BAM file, using a FASTA file of mm9 chr1; however, I get the following error:
samtools view -bT Documents/chr2.fa Documents/ChIP/STAT3.txt > STAT3.bam
[samfaipath] build FASTA index...
[W::sam_parse1] urecognized reference name; treated as unmapped
[W::sam_read1] Parse error at line 1
[main_samview] truncated file.
I would appreciate any help people can provide, or alternative methods of generating the @SQ header. Please explain in detail; I'm new to all this and it takes me a while to understand exactly what I have to do.
Thank you!
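For reference, a minimal sketch of what a "basic header" looks like in SAM: one `@SQ` line per reference sequence, with the sequence name (`SN`) and its length (`LN`). The function names and the two toy chromosomes below are made up for illustration; the real mm9 names and lengths would have to come from the full mm9 genome, for example from the first two columns of the `.fai` index that `samtools faidx` writes for a FASTA containing all chromosomes.

```python
def parse_sizes(text):
    """Parse tab-separated 'name<TAB>length' rows (extra columns are ignored)."""
    pairs = []
    for row in text.splitlines():
        if not row.strip():
            continue  # skip blank lines
        cols = row.split("\t")
        pairs.append((cols[0], int(cols[1])))
    return pairs


def build_sq_header(chrom_sizes):
    """Return a SAM header: one @HD line followed by one @SQ line per sequence."""
    lines = ["@HD\tVN:1.0\tSO:unsorted"]
    for name, length in chrom_sizes:
        lines.append(f"@SQ\tSN:{name}\tLN:{length}")
    return "\n".join(lines) + "\n"


# Placeholder names and lengths, NOT real mm9 values.
example_sizes = "chrA\t1000\nchrB\t2500\n"
header = build_sq_header(parse_sizes(example_sizes))
print(header, end="")
```

The resulting text would go at the very top of the SAM file, above the alignment records. Every chromosome name used in the records needs a matching `@SQ` line, so a header built from a single chromosome would not cover reads mapped elsewhere.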
What is the output of
A204RKABXX:3:26:4008:112894#TGACCAAT 0 chr15 14328859 25 47M * TACCTTGCTTTGGGGATTACAGTTAAGTGACTGAATGAACCTCAGGA GGGGGGGGGGGGGGGGGGGGGGGGGGGFGEGGGGGGGGGGGGGDGGD NM:i:0 X0:i:1 MD:Z:47
A204RKABXX:3:26:6009:112894#TGACCAAT 0 chr18 43952303 25 47M * CAGCCCAGTGTTCTTTATGTGGCGCCAAAATGCCCCTCCCCTTTAGT GFGGGGGGGGGGGGGGGGFGGGGGGGGGDFGGGGGGGGGGGGGGEGE NM:i:0 X0:i:1 MD:Z:47
A204RKABXX:3:26:5766:112899#TGACCAAT 0 chr2 17589600 25 47M * AGGAAGACACTGGACTTTTTATGGCTGGTACTAGGCATATCTCCCTG GGGFFGGGFGGGGGGGGGGGEGEGGGG#################### NM:i:1 X1:i:1 MD:Z:27G16C2
A204RKABXX:3:26:8370:112900#TGACCAAT 16 chr8 64320362 25 47M * TCGCCTATTTTGTTAGTTTGAAACAACTATGCAGCCCTGAATGACTT GFGGGGFFFGGGGGGGDGGGGGGGGGGGGGGGGGGGGGGGGFGGGGG NM:i:0 X0:i:1 MD:Z:47
A204RKABXX:3:26:8496:112900#TGACCAAT 0 chr2 132770552 25 47M * TGGTTTTCCCACATTCCTTTCCTATCTCTCTGCGCCTTCAGTTTGGC EEEEGGGGGGGGGGGGGGGGGGEGGFGGGGGGGEEGFGGGGFGFDEB NM:i:0 X0:i:1 MD:Z:47
A204RKABXX:3:26:8144:112893#TGACCAAT 0 chr3 27838344 25 47M * TTTGTTTGAAACAGTCTTCTGTAGCTCAGGCTGCACTCAAAGGCTAT GGGGGGGFGGGFGGGFGGDDDEFGGDFEFDEE?EEBEBDEEAFEEDA NM:i:0 X0:i:1 MD:Z:47
A204RKABXX:3:26:12358:112894#TGACCAAT 0 chr4 134222144 25 47M * TGTTAGCCTTGGTTTCTGTTCCCGGCCATTCACACACAGCCCACCTC GGGFGGGGGGDFFGFGGFGFGDGGGFGGGEGEG:CCCCEEGG?EE?E NM:i:0 X0:i:1 MD:Z:47
A204RKABXX:3:26:16734:112899#TGACCAAT 16 chr2 29721733 25 47M * GGTCGAGATCCAGAAGATCTGCTGTCTGGTGAGGACCTGTTCCTCAC ECBCGDGEGDFCFFCEBAEAGGGGEEGGGGFGGGEEGGGGEFGGGGG NM:i:0 X0:i:1 MD:Z:47
A204RKABXX:3:26:16019:112892#TGACCAAT 16 chr16 85400695 25 47M * AGCAAACACCAGGAAAATAGCAGGATACTGTTGCTAAGGAAATGGGA GDFGFFFBFFFGFGGFGAGEGGGGFFFFEEGGGGEGGFGGEGFGGGG NM:i:0 X0:i:1 MD:Z:47
A204RKABXX:3:26:14189:112898#TGACCAAT 0 chr9 54525873 25 47M * CATTGGTATTTTGACTGCATGTCTGTCTGTGTTAGATCCCCTGGAAC EBFFFEFDEDEEAEDEE?DDCDCFFFBFEEAEECEAEEFFEEEEFEE NM:i:0 X0:i:1 MD:Z:47
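Two things stand out in the records as pasted above, and either could plausibly be related to the errors in the question. First, the reads map to many chromosomes (chr15, chr18, chr2, ...), while the `-T` reference in the command was a single-chromosome FASTA. Second, each record shows only `*` between the CIGAR string and the read sequence, where the SAM specification expects three columns (RNEXT, PNEXT, TLEN). Whether that is how the file really looks or just an effect of pasting isn't clear from the post. Below is a rough sketch of a per-record sanity check for both points; `check_record` and `INT_COLUMNS` are hypothetical names, and it assumes the real file is tab-separated.

```python
# Zero-based positions of the mandatory SAM columns that must hold integers.
INT_COLUMNS = {1: "FLAG", 3: "POS", 4: "MAPQ", 7: "PNEXT", 8: "TLEN"}


def is_int(value):
    """Return True if the text parses as an integer."""
    try:
        int(value)
        return True
    except ValueError:
        return False


def check_record(line, known_refs):
    """Return a list of problems found in one tab-separated SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    problems = []
    # SAM has 11 mandatory columns; optional tags come after them.
    if len(fields) < 11:
        problems.append(f"only {len(fields)} columns, expected at least 11")
    for index, name in INT_COLUMNS.items():
        if index < len(fields) and not is_int(fields[index]):
            problems.append(f"{name} is not an integer: {fields[index][:10]!r}")
    # RNAME must be '*' or a name declared in the header / reference.
    if len(fields) > 2 and fields[2] != "*" and fields[2] not in known_refs:
        problems.append(f"reference {fields[2]!r} is not among the known references")
    return problems


# Toy records: the first is well formed, the second lacks PNEXT and TLEN
# and names a chromosome that is not in the known set.
good = "r1\t0\tchr2\t100\t25\t4M\t*\t0\t0\tACGT\tGGGG\tNM:i:0"
bad = "r2\t0\tchr15\t100\t25\t4M\t*\tACGT\tGGGG\tNM:i:0"
for record in (good, bad):
    print(record.split("\t")[0], check_record(record, {"chr2"}))
```

Running something like this over the first few lines of the converted file, with `known_refs` set to the chromosome names in the reference, would show whether the columns are actually missing in the file or were only lost when pasting.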