SAM file header problem
6 months ago

Hello everyone. I am working with SRA data for whole-exome sequencing analysis, and I am facing a problem with the SAM file that I created after alignment. All the steps I ran are listed below.

fastq-dump --split-files SRR1178899.sra

fastqc *.fq

bwa mem -t 12 -Y -L 0 -M -R "@RG\tID:sample\tSM:sample\tPL:Illumina" /mnt/nas/reference_genome/BWA/mammals/hg38/genome.fa R1_step1.fq R2_step1.fq > aligned_reads.sam

After this, when I check the file:

samtools quickcheck aligned_reads.sam
aligned_reads.sam was not identified as sequence data.

samtools view -H aligned_reads.sam
[main_samview] fail to read the header from "aligned_reads.sam".

less aligned_reads.sam

@SQ     SN:chrUn_KI270363v1     LN:1803
@SQ     SN:chrUn_KI270364v1     LN:2855
@SQ     SN:chrUn_KI270362v1     LN:3530
@SQ     SN:chrUn_KI270366v1     LN:8320
@SQ     SN:chrUn_KI270378v1     LN:1048
@SQ     SN:chrUn_KI270379v1     LN:1045
@SQ     SN:chrUn_KI270389v1     LN:1298
@SQ     SN:chrUn_KI270390v1     LN:2387
@SQ     SN:chrUn_KI270387v1     LN:1537
@SQ     SN:chrUn_KI270395v1     LN:1143
@SQ     SN:chrUn_KI270396v1     LN:1880
@SQ     SN:chrUn_KI270388v1     LN:1216
@SQ     SN:chrUn_KI270394v1     LN:970
@SQ     SN:chrUn_KI270386v1     LN:1788
@SQ     SN:chrUn_KI270391v1     LN:1484
@SQ     SN:chrUn_KI270383v1     LN:1750
@SQ     SN:chrUn_KI270393v1     LN:1308
@SQ     SN:chrUn_KI270384v1     LN:1658
@SQ     SN:chrUn_KI270392v1     LN:971
@SQ     SN:chrUn_KI270381v1     LN:1930
@SQ     SN:chrUn_KI270385v1     LN:990
@SQ     SN:chrUn_KI270382v1     LN:4215
@SQ     SN:chrUn_KI270376v1     LN:1136
@SQ     SN:chrUn_KI270374v1     LN:2656
@SQ     SN:chrUn_KI270372v1     LN:1650
@SQ     SN:chrUn_KI270373v1     LN:1451
@SQ     SN:chrUn_KI270375v1     LN:2378
@SQ     SN:chrUn_KI270371v1     LN:2805
@SQ     SN:chrUn_KI270448v1     LN:7992
@SQ     SN:chrUn_KI270521v1     LN:7642
@SQ     SN:chrUn_GL000195v1     LN:182896
@SQ     SN:chrUn_GL000219v1     LN:179198
@SQ     SN:chrUn_GL000220v1     LN:161802
@SQ     SN:chrUn_GL000224v1     LN:179693
@SQ     SN:chrUn_KI270741v1     LN:157432
@SQ     SN:chrUn_GL000226v1     LN:15008
@SQ     SN:chrUn_GL000213v1     LN:164239
@SQ     SN:chrUn_KI270743v1     LN:210658
@SQ     SN:chrUn_KI270744v1     LN:168472
@SQ     SN:chrUn_KI270745v1     LN:41891
@SQ     SN:chrUn_KI270746v1     LN:66486
@SQ     SN:chrUn_KI270747v1     LN:198735
@SQ     SN:chrUn_KI270748v1     LN:93321
@SQ     SN:chrUn_KI270749v1     LN:158759
@SQ     SN:chrUn_KI270750v1     LN:148850
@SQ     SN:chrUn_KI270751v1     LN:150742
@SQ     SN:chrUn_KI270752v1     LN:27745
@SQ     SN:chrUn_KI270753v1     LN:62944
@SQ     SN:chrUn_KI270754v1     LN:40191
@SQ     SN:chrUn_KI270755v1     LN:36723
@SQ     SN:chrUn_KI270756v1     LN:79590
@SQ     SN:chrUn_KI270757v1     LN:71251
@SQ     SN:chrUn_GL000214v1     LN:137718
@SQ     SN:chrUn_KI270742v1     LN:186739
@SQ     SN:chrUn_GL000216v2     LN:176608
@SQ     SN:chrUn_GL000218v1     LN:161147
@SQ     SN:chrEBV       LN:171823
@PG     ID:bwa  PN:bwa  VN:0.7.17-r1188 CL:bwa mem -t 12 -M /mnt/nas/reference_genome/BWA/mammals/hg38/genome.fa R1_step1.fq R2_step1.fq
SRR1178899.1    77      *       0       0       *       *       0       0       TNGTTCCAGCGACAGCCCATCCTATAGCACTCTCCAGGAGAGAAATCCAGCACACAAAAAAGATTCTACATCTATTAAGTAAGTGAGGTCTGAGTTGGAT    =#14:ADD=D@FFGCGHIIAHHGGGDC@DHC?::BGFHG;?(?FH>4.8))7=8(5-=?AB#######################################    AS:i:0  XS:i:0
SRR1178899.1    141     *       0       0       *       *       0       0       ACATATATTGGAAACTACAACACTATGGGGAAGAGAACCAATTCAGAACTCAATAACTTAATAGAAGGAGAAGCTTTTTGATGTACTATATTTCTCTCCA    ####################################################################################################    AS:i:0  XS:i:0
SRR1178899.2    77      *       0       0       *       *       0       0       TNTTTCCAGCGACAGCCCATCCTATAGCACTCTCCAGGAGAGAAATTTAGTACACAATAAGGAGACCCCTCGTCTTAAGTGCGGTCGGTAAGAGTCGGAT    <#144ADDDDDDDIIIIIIIIIIIIIIIIIIIIIIIIIDIDIDIIIICIICEEICC############################################    AS:i:0  XS:i:0
SRR1178899.2    141     *       0       0       *       *       0       0       AGGATTTAAATAGGCGCTCGGGGTCTGCAATAGCCCCCAGCTGCGTGGTAAATCTGCCTCACGGAGTGTCCTGTAGGATTGGCTACTACGGGGAACGCAG    ####################################################################################################    AS:i:0  XS:i:0
SRR1178899.3    83      chr12   94582927        60      45M55S  =       94582829        -143    ATCCAACTCAGAACTCACTCACTTAATAGAAGGAGAATCTTTTTTGTGTACTAAATTTCTCTCCTGGAGAGTGCTATAGGATGGGCTGTCGCTGGATGNA    ###DDDCCA@:;5@A:ACCC;:B>-C>33CCECC>C@EDFGIGGGIHHEDJIIGHJHGGD=FEGFGIFIIJHEDCGFJJIHGHFE@GHFCFAFFDDA1#@    NM:i:0  MD:Z:45 MC:Z:100M       AS:i:45 XS:i:0
SRR1178899.3    163     chr12   94582829        60      100M    =       94582927        143     ATGAGGTCACCAGTCAGTCCCGGTCTCCCAAAGTGCCCAGGTAACTGGAATGCCTGCCATGCCACATTCACTGGGAACTTCACCACTATGGGGAACGCAT    @@BFFFFFFHHBCAFE@EICFFG@GFCHGHGID?BFGGIJIFDFGEHEFFGHEHIJJJIJJIIEHHHHHHHF>B?BEDDECECDDBCDDD>ABDBDBB@B    NM:i:1  MD:Z:60A39      MC:Z:45M55S     AS:i:95 XS:i:20
SRR1178899.4    83      chr1    121509428       60      46M54S  =       121509376       -98     ATGCCCAACAATGACAGACTGAATAAAGAAATTGTGCTACATATATGTGTACTAAGTTTCTCTCCTGGAGAGTGCTATAGGAGGGCTGTCGCTGGAGCNC    9528?<CA:(55(A@;(>>;@@A;.;7););6..7A?==HDC;@=)..///8.=B8>EGEHHBFD?*FGFB9B<F>HAHGHFE;BG@AFFAA:DBA41#?    NM:i:0  MD:Z:46 MC:Z:98M2S      AS:i:46 XS:i:30
SRR1178899.4    163     chr1    121509376       60      98M2S   =       121509428       98      AAATGTTTACTGCAACATTATTCATGATAGCAAAGATATGAAATCAACCTAAATGCCCAACAATGACAGACTGAATAAAGAAATTGTGCTACATATATGT    @@@DFDDFFFHHHC@<:CC?IHCBHHIICH???DFCGGIIIGIIDAC??D:BAFH@?F?DDBEHGGF
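
Since `samtools quickcheck` reports that the file "was not identified as sequence data", a look at the very first line of the file may help. As far as I understand, the format is guessed from the start of the file, so anything there other than a tab-delimited header line or an alignment record could trigger this message. A minimal sketch using only awk; the function name `sam_check_start` is made up for illustration:

```shell
# Hypothetical helper: check whether line 1 of a plain-text SAM file
# begins with a known header record type followed by a TAB.
sam_check_start() {
    awk -F '\t' 'NR == 1 {
        # With TAB as the field separator, $1 is exactly the record type
        # only when the header line is properly tab-delimited.
        if ($1 ~ /^@(HD|SQ|RG|PG|CO)$/)
            print "ok: starts with " $1
        else
            print "suspicious first line: " $0
        exit
    }' "$1"
}
```

Usage: `sam_check_start aligned_reads.sam`. A headerless SAM file that begins directly with a read is also reported as suspicious by this sketch, so treat the output as a hint only.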

So I created a header.txt file for the hg38 genome, which is my reference genome.

@HD VN:1.6  SO:coordinate
@SQ SN:chr1 LN:248956422
@SQ SN:chr2 LN:242193529
@SQ SN:chr3 LN:198295559
@SQ SN:chr4 LN:190214555
@SQ SN:chr5 LN:181538259
@SQ SN:chr6 LN:170805979
@SQ SN:chr7 LN:159345973
@SQ SN:chr8 LN:145138636
@SQ SN:chr9 LN:138394717
@SQ SN:chr10    LN:133797422
@SQ SN:chr11    LN:135086622
@SQ SN:chr12    LN:133275309
@SQ SN:chr13    LN:114364328
@SQ SN:chr14    LN:107043718
@SQ SN:chr15    LN:101991189
@SQ SN:chr16    LN:90338345
@SQ SN:chr17    LN:83257441
@SQ SN:chr18    LN:80373285
@SQ SN:chr19    LN:58617616
@SQ SN:chr20    LN:64444167
@SQ SN:chr21    LN:46709983
@SQ SN:chr22    LN:50818468
@SQ SN:chrX LN:156040895
@SQ SN:chrY LN:57227415
@SQ SN:chrM LN:16569
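
The SAM specification requires header fields to be separated by TAB characters. As displayed here, the fields of header.txt look space-separated. That may only be how the post was rendered, but it is worth checking the actual file. A small sketch (the helper name `sam_header_nontab` is hypothetical) that prints header lines whose record type is not immediately followed by a TAB:

```shell
# Hypothetical helper: list header lines that are not tab-delimited.
# Only the separator after the record type (@HD, @SQ, ...) is checked.
sam_header_nontab() {
    awk -F '\t' '
        # If spaces were used instead of a TAB, $1 holds more than "@XX".
        /^@/ && $1 !~ /^@[A-Z][A-Z]$/ { print NR ": " $0 }
    ' "$1"
}
```

Usage: `sam_header_nontab header.txt`. No output means every header line passed this limited check.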

cat header.txt aligned_reads.sam > aligned_header.sam

samtools quickcheck aligned_header.sam
aligned_header.sam was not identified as sequence data.
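
Note that aligned_reads.sam already contains @SQ lines (they are visible in the `less` output above), so prepending another header will most likely list the same sequence names twice, and the SAM specification requires SN values to be unique. Also, `SO:coordinate` in header.txt claims coordinate-sorted reads, which raw `bwa mem` output normally is not. A rough sketch to list duplicated sequence names; `sam_duplicate_sq` is a name invented here:

```shell
# Hypothetical helper: print sequence names (SN) that occur in more
# than one @SQ header line of a plain-text SAM file.
sam_duplicate_sq() {
    awk '
        !/^@/ { exit }                 # stop at the first alignment record
        $1 == "@SQ" {
            # Default splitting accepts tabs or spaces between fields.
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SN:/) print substr($i, 4)
        }
    ' "$1" | sort | uniq -d
}
```

Usage: `sam_duplicate_sq aligned_header.sam`. Any name printed appears in at least two @SQ lines.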

Can you please help me with this? It is very urgent. Thank you in advance.

I can provide more information if you need it.

 -R "@RG\tID:sample\tSM:sample\tPL:Illumina" 

Where is this '@RG' in the header, anyway? Are you sure you're handling the correct file?
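
To make the point above concrete: the command in the question passes `-Y -L 0 -R ...`, but the @PG line shown in the file records `bwa mem -t 12 -M ...` with no `-R`, which suggests the file came from a different run. A hedged sketch for checking this on any plain-text SAM file; both helper names are made up:

```shell
# Report whether the header contains a read-group (@RG) line.
sam_has_rg() {
    awk '
        !/^@/  { exit }            # header ends at the first alignment record
        /^@RG/ { found = 1 }
        END    { print (found ? "has @RG" : "no @RG") }
    ' "$1"
}

# Print the command line(s) recorded in the CL: field of @PG header lines.
sam_pg_commands() {
    awk -F '\t' '
        !/^@/  { exit }
        /^@PG/ { for (i = 2; i <= NF; i++) if ($i ~ /^CL:/) print substr($i, 4) }
    ' "$1"
}
```

Usage: `sam_has_rg aligned_reads.sam` and `sam_pg_commands aligned_reads.sam`. If the printed command differs from the one you believe you ran, the file is probably left over from an earlier run.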


Thank you for your reply.

I am not really sure about this RG. Can you explain it a bit?
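
For context, and as far as I know from the bwa manual: the string given to `bwa mem -R` is written to the output as an @RG (read group) header line, with each `\t` turned into a TAB, and the read group ID is attached to every read as an `RG:Z:` tag. Downstream tools use it to tell which sample and sequencing run a read belongs to. A tiny sketch that only illustrates the expected header line (it does not run bwa, and `expected_rg_line` is a made-up name):

```shell
# Hypothetical helper: expand the "\t" escapes of a -R string so it
# looks the way the @RG line should appear in the SAM header.
expected_rg_line() {
    printf '%b\n' "$1"
}

# Example call (single quotes keep the backslashes literal):
# expected_rg_line '@RG\tID:sample\tSM:sample\tPL:Illumina'
```

If no such line is present in the header of the file being checked, the file was probably not produced by the command that used `-R`.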
