I am trying to align my .fastq file to a transcriptome using tophat. The header of the resulting .bam file looks like this:
@SQ SN:89720 LN:1312
@SQ SN:89721 LN:735
@SQ SN:89722 LN:5191
@SQ SN:89723 LN:361
@SQ SN:89724 LN:9056
@SQ SN:89725 LN:83
@SQ SN:89726 LN:2603
@SQ SN:89727 LN:954
@SQ SN:89728 LN:468
But when I try to make a .bg or .wig file for UCSC, I need to have something like this:
@SQ SN:chr1 LN:249250621
@SQ SN:chr10 LN:135534747
@SQ SN:chr11 LN:135006516
@SQ SN:chr12 LN:133851895
@SQ SN:chr13 LN:115169878
@SQ SN:chr14 LN:107349540
@SQ SN:chr15 LN:102531392
@SQ SN:chr16 LN:90354753
@SQ SN:chr17 LN:81195210
Otherwise UCSC gives an error (because the chromosomes are not clear in the first one). Do you know what the problem is?
No, because I had a look at the reference file I used and the chr numbers are there.
It is not just numbers: 17 is not equivalent to chr17. In your case the chromosome names seem to be some odd numbers (e.g. 89720). Are these coming from contig headers/FASTA files? Run:
grep '>' genome.fa
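If you prefer to do the same check in a script, here is a minimal Python sketch of what the grep does: it collects the sequence name (the first word after '>') from every FASTA header line. The function name fasta_sequence_names and the example lines are made up for illustration; genome.fa is just the file name from the command above.

```python
def fasta_sequence_names(lines):
    """Return the first word after '>' for every FASTA header line."""
    names = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            # An empty header (a bare '>') yields an empty name.
            names.append(fields[0] if fields else "")
    return names


# Made-up example lines; with a real file you would pass the open handle:
#     with open("genome.fa") as handle:
#         names = fasta_sequence_names(handle)
example = [">89720 some transcript", "ACGT", ">89721", "GGCC"]
print(fasta_sequence_names(example))  # ['89720', '89721']
```

If the names printed here are numbers like 89720 rather than chr1, chr2, ..., the reference you aligned against is not the chromosome-level genome.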
Also, the headers you have now look like they come from transcripts. The second column is the length of the contig. If you have additional chromosomes, UCSC does not care; it only cares if it can't find the one it wants to display in your BAM file. Please add the UCSC error message to the question. Maybe you even have a different problem.
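To check this yourself, a rough Python sketch is below. It assumes you have dumped the BAM header to text first (for example with samtools view -H, if samtools is installed) and that you know which chromosome names UCSC expects. The function names parse_sq_lines and missing_from_header are hypothetical, and the header text is copied from the question.

```python
def parse_sq_lines(header_text):
    """Map each @SQ sequence name (SN) to its length (LN)."""
    sizes = {}
    for line in header_text.splitlines():
        fields = line.split()  # handles tab- or space-separated fields
        if not fields or fields[0] != "@SQ":
            continue
        tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
        if "SN" in tags and "LN" in tags:
            sizes[tags["SN"]] = int(tags["LN"])
    return sizes


def missing_from_header(sizes, wanted):
    """Return the wanted sequence names that are absent from the header."""
    return [name for name in wanted if name not in sizes]


# Header lines taken from the question; the wanted list is an assumption.
header = """@SQ SN:89720 LN:1312
@SQ SN:89721 LN:735"""
sizes = parse_sq_lines(header)
print(sizes)                                  # {'89720': 1312, '89721': 735}
print(missing_from_header(sizes, ["chr17"]))  # ['chr17']
```

A non-empty result from missing_from_header means the BAM file has no sequence with the name UCSC is looking for, which would match the symptom described in the question.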