I'm trying to obtain some published ChIP-seq data from another lab that is stored in the SRA. I have downloaded and installed the SRA Toolkit, but I am having trouble obtaining a SAM file that I can convert to BAM and, ultimately, to BED. I was hoping Biostars could clarify some things, since I found the SRA Toolkit documentation difficult to understand.
First of all, do all SRA files contain information that can be output as a SAM file using sam-dump? Is there a standard set of data types included in an SRA file, or can it vary from dataset to dataset? If it varies, how does one know what type of data a given SRA file contains?
I may well be misunderstanding many things, but I will include the steps I have tried, which ultimately produce a .bam file that cannot be converted to a .bed file. Any help in learning to access SRA files and convert them to SAM and then BAM would be much appreciated.
Here are the first few lines of the SAM output from sam-dump:
HWUSI-EAS1694_0008:5:1:1061:20521 4 * 0 0 * * 0 0 TGTGATCTGACCTTACCAATCTTTNCNNNNNNNNNN DFFFBFDDFFFFFFFFFFFB@=B@############
HWUSI-EAS1694_0008:5:1:1061:19640 4 * 0 0 * * 0 0 AATTCTACAACATCTCCAACAAATNTNNNNNNNNNN DDDFFFFFFFFFEFEFFFFFCDC#############
HWUSI-EAS1694_0008:5:1:1061:6084 4 * 0 0 * * 0 0 TATCCCAAGCTACTCCGGGCCTGCNTNNNNNNNNNN GGGGGEFFBFDGGFGFF?EECAAB############
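All three records have FLAG 4 in the second column and `*`, 0 and `*` as the reference name, position and CIGAR. In the SAM format, FLAG bit 0x4 means the read is unmapped, so these records at least look like raw reads with no alignments. A minimal Python sketch (the helper `count_mapped` is hypothetical, not part of the SRA Toolkit or samtools) that tallies mapped and unmapped records across a whole sam-dump file:

```python
import sys
from typing import Iterable, Tuple

UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def count_mapped(sam_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (mapped, unmapped) record counts, skipping @ header lines."""
    mapped = unmapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or header line
        # QNAME contains no whitespace, so the second field is always FLAG,
        # whether the file is tab-separated or pasted with spaces.
        flag = int(line.split(None, 2)[1])
        if flag & UNMAPPED:
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_mapped.py SRR490207.sam
    with open(sys.argv[1]) as handle:
        m, u = count_mapped(handle)
    print(f"mapped: {m}  unmapped: {u}")
```

If the unmapped count equals the total, the file holds reads only, and they have to be aligned before any BAM-to-BED step can produce intervals.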
Next I try to convert the .sam file to a .bam file using samtools, using the code:
$ samtools view -S -b SRR490207.sam > SRR490207.bam
I get the following message:
[samopen] no @SQ lines in the header.
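samtools prints this message when the SAM header declares no reference sequences, that is, no `@SQ` lines carrying `SN:` (name) and `LN:` (length) tags. The records shown above start straight away with read data and have no header, which fits reads that were never aligned to a reference. A small Python sketch (hypothetical helper `reference_dictionary`) that collects the `@SQ` entries from a SAM header, so an empty reference list can be spotted before running samtools:

```python
import sys
from typing import Dict, Iterable


def reference_dictionary(sam_lines: Iterable[str]) -> Dict[str, int]:
    """Map reference name (SN) to length (LN) for every @SQ header line."""
    refs: Dict[str, int] = {}
    for line in sam_lines:
        if not line.startswith("@"):
            break  # the header ends at the first alignment record
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "@SQ":
            continue  # @HD, @RG, @PG and @CO lines carry no reference info
        tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
        refs[tags["SN"]] = int(tags["LN"])
    return refs


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        refs = reference_dictionary(handle)
    if not refs:
        print("no @SQ lines: the file declares no reference sequences")
    for name, length in refs.items():
        print(name, length)
```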
I do get a .bam file, which I try to sort using samtools, but again I get an error regarding the header:
samtools sort SRR490207.bam sortedSRR490207
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_sort_core] truncated file. Continue anyway.
[bam_sort_core] merging from 3 files...
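The "EOF marker is absent" message refers to the fixed 28-byte empty BGZF block that a complete BAM file ends with, as described in the SAM/BAM format specification. Its absence usually means the BAM was not fully written, which here suggests the earlier `samtools view` step stopped early rather than a problem with sorting itself. A hedged Python sketch (the function name `has_bgzf_eof` is hypothetical) that checks a BAM file for the marker:

```python
import os
import sys

# The 28-byte empty BGZF block that terminates a complete BAM file
# (as given in the SAM/BAM format specification).
BGZF_EOF = bytes.fromhex(
    "1f8b08040000000000ff0600424302001b0003000000000000000000"
)


def has_bgzf_eof(path: str) -> bool:
    """Return True if the file at `path` ends with the BGZF EOF block."""
    size = os.path.getsize(path)
    if size < len(BGZF_EOF):
        return False  # too short to hold even the EOF block
    with open(path, "rb") as handle:
        handle.seek(size - len(BGZF_EOF))
        return handle.read() == BGZF_EOF


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_eof.py SRR490207.bam
    if has_bgzf_eof(sys.argv[1]):
        print("EOF marker present")
    else:
        print("EOF marker missing: the file is probably truncated")
```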
Finally, when I attempt to convert the .bam file to a .bed file using bamToBed, I get an empty .bed file.
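bamToBed (part of bedtools) writes one interval per aligned read and skips unmapped reads, so a BAM containing only unmapped records, like the ones above with FLAG 4 throughout, yields an empty BED file. To show what the conversion involves, here is a minimal Python sketch (hypothetical `sam_to_bed`, not bedtools' own code) that turns SAM records into roughly the same six columns bamToBed reports by default, using the reference span implied by each CIGAR string:

```python
import re
import sys
from typing import Iterable, Iterator

UNMAPPED = 0x4  # FLAG bit: segment unmapped
REVERSE = 0x10  # FLAG bit: SEQ is reverse complemented
REF_CONSUMING = set("MDN=X")  # CIGAR ops that consume reference bases
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(cigar: str) -> int:
    """Number of reference bases covered by an alignment's CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)


def sam_to_bed(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield BED6 lines (chrom, start, end, name, MAPQ, strand) for mapped reads."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or header line
        qname, flag_str, rname, pos, mapq, cigar = line.rstrip("\n").split("\t")[:6]
        flag = int(flag_str)
        if flag & UNMAPPED or rname == "*" or cigar == "*":
            continue  # unmapped: nothing to place on the genome
        start = int(pos) - 1  # SAM POS is 1-based; BED start is 0-based
        end = start + reference_span(cigar)
        strand = "-" if flag & REVERSE else "+"
        yield "\t".join([rname, str(start), str(end), qname, mapq, strand])


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for bed_line in sam_to_bed(handle):
            print(bed_line)
```

Run on the sam-dump output above, this prints nothing, for the same reason the bamToBed output is empty.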
Can anyone help me get from an SRA file to a .bam file?
Thanks
After posting, I began to suspect something along the lines of what you described. fastq-dump does produce a FASTQ file for another dataset from the same set of experiments, so I may just have to do the mapping myself, as you suggest. Thanks.
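Mapping the reads yourself can be sketched as a short Python driver around the command-line tools. This is a hedged sketch rather than a prescribed workflow: bowtie2 is an assumed choice of aligner (the thread does not name one), the index basename is a placeholder for an index you would build or download for the relevant genome, and the samtools calls use the 1.x syntax (`-o` for the output file) instead of the older `samtools sort in.bam prefix` form used above. The helpers `chip_pipeline` and `run` are hypothetical.

```python
import subprocess
import sys
from typing import List, Optional, Tuple

# Each step: (argv, file to capture stdout into, or None)
Step = Tuple[List[str], Optional[str]]


def chip_pipeline(accession: str, bowtie2_index: str) -> List[Step]:
    """Build commands for SRA accession -> FASTQ -> SAM -> sorted BAM -> BED.

    Assumes fastq-dump (SRA Toolkit), bowtie2, samtools >= 1.0 and bedtools
    are on PATH; `bowtie2_index` is the basename of a prebuilt index.
    """
    fastq = f"{accession}.fastq"
    sam = f"{accession}.sam"
    bam = f"{accession}.bam"
    sorted_bam = f"{accession}.sorted.bam"
    bed = f"{accession}.bed"
    return [
        (["fastq-dump", accession], None),  # writes <accession>.fastq
        (["bowtie2", "-x", bowtie2_index, "-U", fastq, "-S", sam], None),
        (["samtools", "view", "-b", "-o", bam, sam], None),
        (["samtools", "sort", "-o", sorted_bam, bam], None),
        (["bedtools", "bamtobed", "-i", sorted_bam], bed),  # BED goes to stdout
    ]


def run(steps: List[Step]) -> None:
    """Run each step in order, stopping at the first failure."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, check=True, stdout=out)


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python pipeline.py SRR490207 /path/to/bowtie2/index/basename
    run(chip_pipeline(sys.argv[1], sys.argv[2]))
```

Keeping command construction separate from execution makes it easy to print or inspect the commands before running anything; `check=True` stops the pipeline at the first failing step, which avoids silently sorting a truncated BAM.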