Question

454 Sequences: How To Get A BAM + Coverage?

3

14.4 years ago

Pierre Lindenbaum 164k

Hi all,

I've been given a set of 454 sequences/results and I'm very new to this kind of data.

1.XXX.454Reads.fna (I guess these are the FASTA sequences for the reads...)
1.XX.454Reads.qual (... and the qualities...)
a tgz file containing some binary *.clc files (?)

454AllStructVars.txt.gz, 454HCStructVars.txt.gz, 454AllDiffs.txt.gz, 454HCDiffs.txt.gz: these should be the allele calls. I guess those files were generated by the 'Genome Sequencer FLX System', weren't they? If not, which tool produced them? I understand the *Diffs files; however, the content of *StructVars.txt is not clear to me. For example, how should I interpret the following output:

>chr19    1988212    <--        ?    ?    ?        2    100.00    -    Point
Reads with Difference:
chr19                   1988172+ TTGTATTTTTGGTAGAGGCGGGATTTCATCATGTTGGCCAGACCTCGAGTGATC--CACCTGCCT-TGGCCTCCCAAAGT 1988248
                                                                 *
GKF3EFN01B3QKI              237-                                         GACCTCG--TGATCTGC-CC-GCCTCTG-CCTCCCAAAGT 203
GKF3EFN01CM6BB              183+                                         GACCTCG--TGATCTGC-CC-GCCTCTG-CCTCCCAAAGT 217
                                                                 *
Other Reads:

Does that mean that only the tails of the two reads were mapped to the reference (= deletion)? What is the '*' under the reference?

Is there a way to convert those data to SAM/BAM?
How can I get the coverage of the genome from those data?

Many thanks,

Pierre

format bam coverage • 3.5k views

updated 13.9 years ago by Gmoney ▴ 220 • written 14.4 years ago by Pierre Lindenbaum 164k

0

Please update the question title to be a question...

13.9 years ago by Egon Willighagen 5.4k

score 5 · Answer 1 · 2010-08-04

5

14.4 years ago

Casbon ★ 3.3k

StructVars.txt: see the manual here, p. 125 (btw, you should use the -fd flag for full descriptions).

Transform to SAM/BAM: see my question and answer.

Or use sff2fastq and then BWA-SW (bwa bwasw), although bwa's alignments are not as good for homopolymers.
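If only the .fna and .qual files are available rather than the original SFF, a FASTQ file for the aligner can be assembled from the two. Below is a minimal Python sketch, assuming both files list the reads in the same order and that the .qual file holds whitespace-separated Phred scores. The function names are made up for illustration and are not part of any 454 tool.

```python
import io


def read_records(handle):
    """Yield (read_id, body_lines) from a FASTA-like stream (.fna or .qual)."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, chunks
            # Keep only the first word of the header as the read id.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, chunks


def fna_qual_to_fastq(fna, qual, out):
    """Write Phred+33 FASTQ from paired .fna/.qual streams (same read order assumed)."""
    for (sid, seq_lines), (qid, qual_lines) in zip(read_records(fna), read_records(qual)):
        if sid != qid:
            raise ValueError(f"read order differs: {sid} vs {qid}")
        seq = "".join(seq_lines)
        scores = [int(s) for s in " ".join(qual_lines).split()]
        if len(seq) != len(scores):
            raise ValueError(f"sequence/quality length mismatch for {sid}")
        # Cap at 93 so every score maps to a printable ASCII character.
        quals = "".join(chr(min(q, 93) + 33) for q in scores)
        out.write(f"@{sid}\n{seq}\n+\n{quals}\n")


if __name__ == "__main__":
    # Tiny in-memory example; real use would pass open file handles instead.
    demo_fna = io.StringIO(">r1 length=4\nACGT\n")
    demo_qual = io.StringIO(">r1 length=4\n40 30 20 10\n")
    demo_out = io.StringIO()
    fna_qual_to_fastq(demo_fna, demo_qual, demo_out)
    print(demo_out.getvalue(), end="")  # @r1 / ACGT / + / I?5+
```

The resulting FASTQ can then be handed to whichever aligner you prefer.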

Coverage: 454AllContigs.fna will give you quick and dirty coverage by looking at the headers; otherwise you are probably best off converting to SAM and using another tool.
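Once the reads are in SAM, per-base coverage follows from each record's POS and CIGAR fields. In practice a tool such as samtools depth does this on BAM files; the Python sketch below only illustrates the idea on plain SAM text. It assumes the input fits in memory and does not count deleted reference bases as covered; the function name is hypothetical.

```python
import re
from collections import Counter

# One CIGAR element: a length followed by an operation letter (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def sam_coverage(lines):
    """Return a Counter mapping (reference name, 1-based position) to read depth."""
    depth = Counter()
    for line in lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag, rname, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 0x4 or cigar == "*":  # unmapped read or no alignment
            continue
        for length, op in CIGAR_OP.findall(cigar):
            length = int(length)
            if op in "M=X":  # aligned bases: count them and advance on the reference
                for p in range(pos, pos + length):
                    depth[(rname, p)] += 1
                pos += length
            elif op in "DN":  # deletion or skip: advance on the reference, no count
                pos += length
            # I, S, H and P do not consume reference positions.
    return depth


if __name__ == "__main__":
    demo = [
        "r1\t0\tchr19\t5\t60\t3M1D2M\t*\t0\t0\tACGTA\tIIIII\n",
        "r2\t16\tchr19\t7\t60\t2S4M\t*\t0\t0\tACGTAC\tIIIIII\n",
    ]
    for (ref, p), d in sorted(sam_coverage(demo).items()):
        print(ref, p, d)
```

For a whole genome, a dictionary keyed by position gets large; a per-chromosome array or an existing tool is the more practical route.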

updated 5.3 years ago by Ram 44k • written 14.4 years ago by Casbon ★ 3.3k

0

The 454AlignmentInfo.tsv file contains per-base coverage stats. If you don't have it, just add the -info flag to the runMapping command.

13.9 years ago by lexnederbragt ★ 1.3k

0

The only trouble I've found with 454AlignmentInfo is that it leaves read direction out. For certain applications it can be useful to know whether there is a read direction bias.

12.9 years ago by Gmoney ▴ 220

0

The only trouble I've noticed with that particular file is that it has no read orientation information. This may or may not matter, but for certain applications (e.g. exon capture) read direction bias can be useful to note. Otherwise, it's great.
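If the reads end up in SAM, the orientation is available from the FLAG field: bit 0x10 marks a read aligned to the reverse strand. A small Python sketch for tallying strands, assuming plain SAM text as input; the function name is only illustrative.

```python
from collections import Counter


def strand_counts(lines):
    """Count mapped reads per strand ('+' or '-') from SAM text lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        flag = int(fields[1])
        if flag & 0x4:  # unmapped reads have no meaningful strand
            continue
        counts["-" if flag & 0x10 else "+"] += 1
    return counts
```

Restricting the input to the reads overlapping one target region gives a per-region view of the bias.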

12.9 years ago by Gmoney ▴ 220

score 2 · Answer 2 · 2010-08-04

2

14.4 years ago

Istvan Albert 102k

The *.clc extension may indicate that some of your files were created by CLC Genomics Workbench.

14.4 years ago by Istvan Albert 102k

score 0 · Answer 3 · 2012-01-31

0

12.9 years ago

Gmoney ▴ 220

The newest version of gsMapper supposedly supports SAM/BAM output. I haven't upgraded yet, but the manual on their site has a screenshot of it.

12.9 years ago by Gmoney ▴ 220