samtools view not removing unwanted headers
0
0
Entering edit mode
3.1 years ago
bpf24 • 0

Hi,

I assume I am doing something really thick but cannot for the life of me restrict my bam files to chromosome only fields and remove scaffolds. I have run samtools as suggested in other posts but every time I check the file, all the headers are still present with their reads. Below are my commands leading up to it from bowtie2 alignment

samtools view -b in.sam > out.bam

samtools sort in.bam -o out.bam

samtools index in.bam

samtools view -b in.bam 1 > out.bam

or

samtools view -b -o out.bam in.bam `seq 1 22`

or even creation of index.bed file from fa

awk '/^[0-9]*\t/ {prin^C("%s\t0\t%s\n",$1,$2);}' genome.fa.fai  > out.bed

samtools view -L out.bed -o out.bam in.bam

Every time I view any of these files (samtools view -H out.bam) all the headers and reads are outputted.

Any help would be very appreciated!

Cheers,

Ben

samtools bam • 3.0k views
ADD COMMENT
0
Entering edit mode
samtools view -b in.sam > out.bam

samtools sort in.bam -o out.bam

samtools index in.bam

what are you doing here ? whay are your producing out.bam twice and indexing in.bam at the end ?

ADD REPLY
0
Entering edit mode

Just haven't piped one into another; first is to convert sam to bam, second is to sort by co-ordinate as well as create an index. Without these steps, samtools view outputs an error when I try to grab chromosome related entries only

ADD REPLY
0
Entering edit mode

You don't need to do this. Firstly, it's pointless producing a compressed BAM when you're instantly decoding it again and outputting a new BAM. But more importantly, it's totally redundant anyway as sort can read in SAM directly.

The old idiom of SAM -> BAM and then BAM -> sorted BAM is something that should have died out many years ago, but is sadly oft repeated in bioinformatics memes.

ADD REPLY
0
Entering edit mode

all the headers are outputted.

the dictionary in the BAM/SAM will not be removed; All lines starting with @SQ will be kept

all the reads are outputted.

show us an example of read

ADD REPLY
0
Entering edit mode

Below is the output. I haven't included further lines of scaffold entries but you get the idea. So am i getting confused with samtools view -H, and the reads have only been preserved for chromosome 1-22 and not scaffold etc.? How do I check this?

[W::bam_hdr_read] EOF marker is absent. The input is probably truncated

@HD VN:1.0  SO:coordinate
@SQ SN:1    LN:248956422
@SQ SN:10   LN:133797422
@SQ SN:11   LN:135086622
@SQ SN:12   LN:133275309
@SQ SN:13   LN:114364328
@SQ SN:14   LN:107043718
@SQ SN:15   LN:101991189
@SQ SN:16   LN:90338345
@SQ SN:17   LN:83257441
@SQ SN:18   LN:80373285
@SQ SN:19   LN:58617616
@SQ SN:2    LN:242193529
@SQ SN:20   LN:64444167
@SQ SN:21   LN:46709983
@SQ SN:22   LN:50818468
@SQ SN:3    LN:198295559
@SQ SN:4    LN:190214555
@SQ SN:5    LN:181538259
@SQ SN:6    LN:170805979
@SQ SN:7    LN:159345973
@SQ SN:8    LN:145138636
@SQ SN:9    LN:138394717
@SQ SN:MT   LN:16569
@SQ SN:X    LN:156040895
@SQ SN:Y    LN:57227415
@SQ SN:KI270728.1   LN:1872759
@SQ SN:KI270727.1   LN:448248
@SQ SN:KI270442.1   LN:392061
@SQ SN:KI270729.1   LN:280839
@SQ SN:GL000225.1   LN:211173
ADD REPLY
0
Entering edit mode

your first problem is : your BAM is corrupted:

W::bam_hdr_read] EOF marker is absent. The input is probably truncated
`
ADD REPLY
0
Entering edit mode

as I said above, those lines will never be removed even if you select a region. Those @SQ lines are here to help tools to check that you're working with compatible references.

I said

show us an example of read

I don't want to see the header. I want to see a read validating what you said:

all the headers and reads are outputted.

ADD REPLY
0
Entering edit mode

aaaahhhh apologies, I was mistaken with what I was looking at. Once I printed the reads using the command below I can see it has removed them. My mistake and thanks for your help!!!

samtools idxstats out.bam | awk '{print $1" "$3}'

[E::idx_find_and_load] Could not retrieve index file for 'siARID1A_1_trim_sort_chr.bam'
samtools idxstats: fail to load index for "siARID1A_1_trim_sort_chr.bam", reverting to slow method
1 3173524
10 1136433
11 1678043
12 2292042
13 633397
14 1191474
15 908910
16 1058431
17 1728688
18 547287
19 1300401
2 2509617
20 937328
21 366041
22 541106
3 1978363
4 1211085
5 1639953
6 1882985
7 1999583
8 1530416
9 1344472
MT 0
X 1427920
Y 47756
KI270728.1 0
KI270727.1 0
KI270442.1 0
KI270729.1 0
GL000225.1 0
KI270743.1 0
GL000008.2 0
GL000009.2 0
KI270747.1 0
KI270722.1 0
ADD REPLY

Login before adding your answer.

Traffic: 2305 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6