Hi! I am working on a single-cell ATAC-seq project and am having trouble using samtools to split a BAM file into wild-type and knockout reads. The data came from 10X sequencing and were analysed with the Cell Ranger pipeline. One of the analyses Cell Ranger ran was t-SNE, which groups cells into clusters based on similarity. Because different cell types went into the ATAC pipeline, similar cell types clustered together regardless of whether they were wild-type or knockout, and one BAM file was produced per cluster. I would like to split each of these BAM files so I can look at the variation within a cluster. I used samtools to convert the BAM file to SAM, and then split the records into knockout and wild-type files based on a tag in the file. Now, when I try to convert the split SAM files back to BAM, I keep getting this error:
samtools view -bS marrow_Cluster1_KO.sam > marrow_Cluster1_KO.bam
[W::sam_read1] Parse error at line 1
[main_samview] truncated file.
When I look through the entire marrow_Cluster1_KO.sam file, it looks how it should. The head and tail of the file look like this:
head -10 marrow_Cluster1_KO.sam
1112:@RG ID:A2.07,P2.24,A1.03,P1.03 SM:Barcode00086
1113:@RG ID:A2.08,P2.24,A1.10,P1.14 SM:Barcode00152
1114:@RG ID:A2.08,P2.16,A1.03,P1.08 SM:Barcode00191
1115:@RG ID:A2.08,P2.15,A1.09,P1.06 SM:Barcode00199
1116:@RG ID:A2.08,P2.09,A1.03,P1.24 SM:Barcode00248
tail -10 marrow_Cluster1_KO.sam
678439:NB551608:11:HVFMVBGX7:4:23502:4860:495 83 chr9 56881073 42 47M = 56881041 -79 AATCGCTTCCTTCGCGCTTCCGGGTTCCGCCTCGCTCAGAAACGGAC EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA MD:Z:47 XG:i:0 NM:i:0 XM:i:0 XN:i:0 XO:i:0 AS:i:0 YS:i:0 YT:Z:CP RG:Z:A2.07,P2.15,A1.10,P1.06 PG:Z:MarkDuplicates-6D71E14F
678440:NB551608:11:HVFMVBGX7:1:13105:7658:1885 99 chr9 56881081 42 47M = 56881248 214 CCTTCGCGCTTCCGGGTTCCGCCTCGCTCAGAAACGGACCGACAGAT
What can I do to fix this error?
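One thing that stands out in the head and tail output above is that every line starts with a number and a colon (1112:, 678439:, ...) in front of the @RG tag or the read name. That looks like the kind of prefix a tool such as grep -n adds, though that is an assumption, since the splitting command is not shown. A SAM line has to begin with @ (header) or the read name, so a prefix like this would explain the parse error at line 1. Below is a minimal sketch of stripping such a prefix; the function name strip_line_numbers is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: remove one leading "<digits>:" from every line on stdin.
# Only the prefix at the very start of the line is removed; colons later in
# the line (for example in read names such as NB551608:11:...) are untouched.
# Caveat: do not run this on a file that has no such prefix if its read names
# could themselves start with digits followed by a colon.
strip_line_numbers() {
    sed -E 's/^[0-9]+://'
}

# Small demonstration on two shortened lines modelled on the output above.
printf '1112:@RG\tID:A2.07,P2.24,A1.03,P1.03\tSM:Barcode00086\n678439:NB551608:11:HVFMVBGX7:4:23502:4860:495\t83\tchr9\n' \
    | strip_line_numbers
```

On the real file this would be used as strip_line_numbers < marrow_Cluster1_KO.sam > fixed.sam, where fixed.sam is just a placeholder name. Stripping the prefix fixes the line format only; it does not bring back any header lines that the split dropped.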
Hi John,
Thank you so much for the reply! I am new to samtools so I am not sure what the numbers before the @RG are, but I tried what you had suggested and got this error:
[E::sam_parse1] missing SAM header
[W::sam_read1] Parse error at line 191
[main_samview] truncated file.
What do you recommend I do to proceed?
Well, what does line 191 look like?
1300:@RG ID:A2.08,P2.14,A1.04,P1.05 SM:Barcode05838
4099:@PG ID:bowtie2-E6859AC-EAA2107 PN:bowtie2 VN:2.2.5 CL:"/mnt/users/sai/miniconda2/bin/bowtie2-align-s --wrapper basic-0 -X2000 -p 18 --rg-id kidney_marrow_KO_gata2B -x /mnt/users/sai/Script/genomes/bowtie2/GRCz10/GRCz10 -1 /mnt/AlignedData/181118-jeff-zebrafish//fastqs/kidney_marrow_KO_gata2B.Sub.0009.All.1.R1.trim.fastq -2 /mnt/AlignedData/181118-jeff-zebrafish//fastqs/kidney_marrow_KO_gata2B.Sub.0009.All.1.R2.trim.fastq"
4401:NB551608:11:HVFMVBGX7:1:23302:16011:1070 99 chr1 6671 42 47M = 6693 69 CATCAGAGTTTAGCGTTTGCCACCGACGCGAGGAGCGCTGACCTTCA AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE MD:Z:47 XG:i:0 NM:i:0 XM:i:0 XN:i:0 XO:i:0 AS:i:0 YS:i:0 YT:Z:CP RG:Z:A2.07,P2.22,A1.03,P1.20 PG:Z:MarkDuplicates-62387EC5
4402:NB551608:11:HVFMVBGX7:1:23302:16011:1070 147 chr1 6693 42 47M = 6671 -69 CCGACGCGAGGAGCGCTGACCTTCATGGGCTTGGCAATCTTCTGTTT EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA MD:Z:47 XG:i:0 NM:i:0 XM:i:0 XN:i:0 XO:i:0 AS:i:0 YS:i:0 YT:Z:CP RG:Z:A2.07,P2.22,A1.03,P1.20 PG:Z:MarkDuplicates-62387EC5
Line 191 begins with 4401. I think part of my problem is also that I need a SAM @HD header, which I do not have and do not know how to generate, because the files are not being mapped to anything that would create a header. Do you know how to add a header to a BAM file?
Thanks, Meera
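On the missing-header question, one way to avoid rebuilding a header by hand is to carry the original header through the split. samtools view -h prints the header together with the alignments, so a filter that passes every @ line through and keeps only the wanted reads leaves a SAM stream that samtools view -b can convert back. The sketch below assumes the wild-type/knockout split can be expressed as a list of read-group IDs (the RG:Z: values seen above), one per line. The function name keep_read_groups and the file names ko_ids.txt and marrow_Cluster1.bam are placeholders, not names from this thread.

```shell
#!/bin/sh
# Hypothetical helper: read SAM text on stdin, pass every header line through,
# and keep only alignments whose RG:Z: tag is listed in the file given as $1
# (one read-group ID per line, e.g. A2.07,P2.15,A1.10,P1.06).
keep_read_groups() {
    awk -F '\t' '
        FILENAME == ARGV[1] { keep[$0] = 1; next }   # ID list: remember each ID
        /^@/                { print; next }          # header lines always pass
        {
            # optional tags start at column 12 of an alignment line
            for (i = 12; i <= NF; i++)
                if (substr($i, 1, 5) == "RG:Z:" && (substr($i, 6) in keep)) {
                    print
                    break
                }
        }
    ' "$1" -
}

# Intended use (assumed file names, not run here):
#   samtools view -h marrow_Cluster1.bam \
#       | keep_read_groups ko_ids.txt \
#       | samtools view -b -o marrow_Cluster1_KO.bam -
```

samtools view also has a -R FILE option that selects alignments by read groups listed in a file, which may make the SAM round trip unnecessary. It is worth checking the manual for your samtools version to see how it treats the header.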