Question

wrong bam file format

8.1 years ago

alirezamomeni707 • 0

I am trying to align my .fastq file to the transcriptome using tophat. The header of the resulting .bam file looks like this:

@SQ SN:89720    LN:1312
@SQ SN:89721    LN:735
@SQ SN:89722    LN:5191
@SQ SN:89723    LN:361
@SQ SN:89724    LN:9056
@SQ SN:89725    LN:83
@SQ SN:89726    LN:2603
@SQ SN:89727    LN:954
@SQ SN:89728    LN:468

But to make a .bg or .wig file for the UCSC browser, I need the header to look something like this:

@SQ SN:chr1    LN:249250621
@SQ SN:chr10    LN:135534747
@SQ SN:chr11    LN:135006516
@SQ SN:chr12    LN:133851895
@SQ SN:chr13    LN:115169878
@SQ SN:chr14    LN:107349540
@SQ SN:chr15    LN:102531392
@SQ SN:chr16    LN:90354753
@SQ SN:chr17    LN:81195210

Otherwise UCSC gives an error (because the chromosomes cannot be identified from the names in the first header). Do you know what the problem is?

RNA-Seq • 2.3k views

updated 8.1 years ago by Devon Ryan 105k • written 8.1 years ago by alirezamomeni707 • 0

score 1 · Answer 1 · 2017-07-28

8.1 years ago

Devon Ryan 105k

Delete the BAM file, do not use tophat, and realign the data. Common aligners for RNAseq data are STAR, hisat2, and BBMap.

score 0 · Answer 2 · 2017-07-28

8.1 years ago

Ido Tamir 5.2k

Because you aligned to the transcriptome. With tophat you align to the genome, plus a transcriptome annotation in GTF format if you have one at hand. Also check the names of the chromosomes: Ensembl does not use the chr prefix, so you might have to change the names in your files.
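If the reference does turn out to use Ensembl-style names, the renaming mentioned above could be sketched roughly as follows. The file names and the two-sequence FASTA are made up for illustration, and the script assumes that only the mitochondrial name (MT versus chrM) needs special handling. Unplaced scaffolds are named differently between Ensembl and UCSC and would need a proper name mapping instead.

```shell
# Hypothetical Ensembl-style FASTA, created here only so the sketch is self-contained.
printf '>1 dna:chromosome\nACGT\n>MT dna:chromosome\nACGT\n' > ensembl_style.fa

# Rewrite header lines to UCSC-style names; sequence lines pass through unchanged.
awk '/^>/ {
    name = substr($1, 2)            # first word of the header, without ">"
    if (name == "MT") name = "M"    # UCSC names the mitochondrion chrM
    print ">chr" name               # note: the rest of the header line is dropped
    next
}
{ print }' ensembl_style.fa > ucsc_style.fa

# Show the renamed headers.
grep '^>' ucsc_style.fa
```

Any annotation file used with the alignment (for example the GTF) would need the same names, and the aligner index would have to be rebuilt from the renamed FASTA.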

No, because I had a look at the reference file I used and the chr numbers are there.

Reply • 8.1 years ago by alirezamomeni707 • 0

It is not just numbers. 17 is not equivalent to chr17. In your case the chromosome names seem to be some odd numbers (e.g. 89720). Are these coming from contig headers/fasta files?
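One way to see which names the BAM header actually contains is to pull the SN fields out of the @SQ lines. The sketch below runs on a small sample header (two lines copied from the question plus one chr1 line added for contrast). With a real file the header would instead come from something like samtools view -H on the BAM; the file name header.sam here is only a placeholder.

```shell
# Sample header: two @SQ lines from the question plus one UCSC-style line for contrast.
# Real SAM headers are tab-separated, so the sample is written with tabs.
printf '@SQ\tSN:89720\tLN:1312\n@SQ\tSN:89721\tLN:735\n@SQ\tSN:chr1\tLN:249250621\n' > header.sam

# Collect every reference name that does not start with "chr".
non_ucsc=$(awk -F'\t' '$1 == "@SQ" {
    for (i = 2; i <= NF; i++) {
        if ($i ~ /^SN:/) {
            name = substr($i, 4)          # strip the "SN:" tag
            if (name !~ /^chr/) print name
        }
    }
}' header.sam)

echo "$non_ucsc"
```

Any name printed here is one that UCSC would not recognize as a chromosome.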

Reply • 8.1 years ago by GenoMax 153k

Run grep '>' genome.fa to list the sequence names in your reference. Also: the headers you have now look like they come from transcripts. The second column is the length of the contig. If you have additional chromosomes, UCSC does not care. It only cares if it can't find the one it wants to display in your BAM file. Please add the UCSC error message to the question. Maybe you even have a different problem.
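A slightly extended version of that grep check might look like the sketch below. It builds a tiny stand-in reference (reference.fa, with made-up sequences under two of the names from the question), lists the sequence names, and tests whether a chromosome the browser would ask for is present. The choice of chr1 as the name to look for is only an example.

```shell
# Hypothetical stand-in for the real reference, so the sketch runs on its own.
printf '>89720\nACGTACGT\n>89721\nACG\n' > reference.fa

# List sequence names: header lines, without ">", first word only.
names=$(grep '^>' reference.fa | sed 's/^>//' | cut -d' ' -f1)
echo "$names"

# Check whether an exact name such as chr1 is among them.
if echo "$names" | grep -qx 'chr1'; then
    status="chr1 found"
else
    status="chr1 missing"
fi
echo "$status"
```

If the expected chromosome is missing from the reference, it will also be missing from the header of any BAM file aligned against it.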

Reply • 8.1 years ago by Ido Tamir 5.2k