3.9 years ago
daewowo
Error:
f = pysam.AlignmentFile("SRA_sorted.bam","rb")
File "pysam/libcalignmentfile.pyx", line 991, in pysam.libcalignmentfile.AlignmentFile._open
ValueError: file has no sequences defined (mode='rb') - is it SAM/BAM format? Consider opening with check_sq=False
Process I followed to get to the error:
I downloaded an SRA dataset from NCBI and used sam-dump from the SRA Toolkit to convert it to a SAM file.
sam-dump --output-file SRA.sam SRA
I checked the file:
samtools quickcheck SRA.sam
>SRA.sam had no targets in header.
I then checked with Picard's ValidateSamFile (run from the GATK jar):
java -jar gatk-package-4.1.9.0-local.jar ValidateSamFile I=SRA.sam MODE=SUMMARY
Error Type Count
ERROR:MISSING_READ_GROUP 1
ERROR:READ_GROUP_NOT_FOUND 23209332
WARNING:RECORD_MISSING_READ_GROUP 23209332
Looking at the SAM file with head, it looks OK:
1 77 * 0 0 * * 0 0 TACAGAA...
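The fields of that record can be decoded to see why it does not look like an aligned read. The sketch below is a minimal illustration, not part of the original post: it decodes the FLAG column using the bit meanings from the SAM specification and treats a record as unaligned when the 0x4 bit is set or the reference name is `*`. The example record is hypothetical (the sequence and qualities are made up), and real SAM records are tab-separated.

```python
# SAM FLAG bits as defined in the SAM specification.
FLAG_BITS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag):
    """Return the names of all FLAG bits that are set, lowest bit first."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def is_unaligned(sam_line):
    """True if a tab-separated SAM record is flagged unmapped or has RNAME '*'."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2: FLAG
    rname = fields[2]       # column 3: reference sequence name
    return bool(flag & 0x4) or rname == "*"


# Hypothetical record with the same FLAG and RNAME as the line shown above.
record = "1\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII"
print(decode_flag(77))
print(is_unaligned(record))
```

A FLAG of 77 is 64 + 8 + 4 + 1, so both the read and its mate are marked unmapped. Together with `*` in the reference column, this suggests the dump contains raw reads rather than alignments.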
I used the following to convert to bam file:
samtools sort SRA.sam -o SRA_sorted.bam
I confirmed that the file is binary format
I then used the .bam file in a third party program which uses pysam. The pysam command which threw the error:
f = pysam.AlignmentFile("SRA_sorted.bam","rb")
File "pysam/libcalignmentfile.pyx", line 991, in pysam.libcalignmentfile.AlignmentFile._open
ValueError: file has no sequences defined (mode='rb') - is it SAM/BAM format? Consider opening with check_sq=False
I ran Picard on the BAM file, which gave the same errors as for the SAM file shown above.
How can I work out exactly what the error is with pysam opening the file and fix this?
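One way to narrow this down is to check whether the file declares any reference sequences (`@SQ` header lines), since that is what the pysam message is complaining about. The following is a rough sketch under stated assumptions: the helper names are hypothetical, `count_header_records` only works on text SAM files, and `bam_reference_count` assumes pysam is installed and uses the `check_sq=False` option that the error message itself suggests.

```python
def count_header_records(sam_path):
    """Count header lines by record type (e.g. @HD, @SQ, @RG) in a text SAM file."""
    counts = {}
    with open(sam_path) as handle:
        for line in handle:
            if not line.startswith("@"):
                break  # header lines come first; stop at the first alignment record
            tag = line.split("\t", 1)[0].strip()
            counts[tag] = counts.get(tag, 0) + 1
    return counts


def has_reference_sequences(sam_path):
    """True if the SAM header declares at least one @SQ (reference sequence) line."""
    return count_header_records(sam_path).get("@SQ", 0) > 0


def bam_reference_count(bam_path):
    """Number of reference sequences declared in a BAM header (requires pysam)."""
    import pysam  # imported here so the text-only helpers work without pysam

    # check_sq=False lets pysam open a file whose header has no @SQ lines.
    with pysam.AlignmentFile(bam_path, "rb", check_sq=False) as bam:
        return bam.nreferences
```

If `has_reference_sequences("SRA.sam")` is false, or `bam_reference_count("SRA_sorted.bam")` returns 0, the file holds no alignment targets. Opening it with `check_sq=False` only suppresses the error; it does not turn the reads into alignments.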
It is not OK. The header is missing.
Are you sure this is a valid SAM file that you dumped? There seem to be no headers, and the single line you posted as an example looks like an unaligned read. The read ID is also 1, which is odd. If the original data submitted was FASTQ, you should align the reads yourself to get SAM/BAM files.
Thanks. I ran bwa to index a reference genome and then aligned the reads.
Now the .sam file has a header (and now I know it needs one) :-)
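For anyone landing here with the same error, the fix amounts to indexing a reference, aligning the reads, and sorting the result. The sketch below shows one way to script that, with several assumptions: the thread does not say which bwa algorithm was used, so `bwa mem` is a guess, and all file names and both function names are hypothetical. The `samtools sort` call mirrors the one in the question.

```python
import subprocess


def build_alignment_commands(reference, reads, sam_out, bam_out):
    """Build the three command lines: index the reference, align, sort to BAM."""
    index_cmd = ["bwa", "index", reference]
    # Assumption: 'bwa mem' as the aligner; it writes SAM to stdout by default.
    align_cmd = ["bwa", "mem", reference, *reads]
    sort_cmd = ["samtools", "sort", "-o", bam_out, sam_out]
    return index_cmd, align_cmd, sort_cmd


def run_alignment(reference, reads, sam_out, bam_out):
    """Run the commands in order, stopping at the first failure."""
    index_cmd, align_cmd, sort_cmd = build_alignment_commands(
        reference, reads, sam_out, bam_out
    )
    subprocess.run(index_cmd, check=True)
    with open(sam_out, "w") as sam:
        # Capture the aligner's stdout as the SAM file (header included).
        subprocess.run(align_cmd, stdout=sam, check=True)
    subprocess.run(sort_cmd, check=True)
```

Calling `run_alignment("reference.fa", ["reads_1.fastq", "reads_2.fastq"], "aligned.sam", "aligned_sorted.bam")` requires bwa and samtools on the PATH. The aligner writes the `@SQ` lines from the reference into the SAM header, which is what pysam needs in order to open the sorted BAM.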
So things are working now?
Yes, thank you