7.7 years ago
kristincilla_16
Hi,
I am new to next-generation sequencing. I want to analyze my sequences using GATK, but it gives me an error at the first step because my bam files don't have a read group. I have read about it in other posts, but I have some specific questions.
I have different samples from different platforms. Also, in the same fastq of one sample I have reads from different lines. So, do I have to create the read group manually for each different sample that I have? Is there an automatic way to do this? Is it better to get sam files from sequencing rather than fastq?
Thanks,
I assume reads from different lanes?
I don't understand what you mean by this. A sam/bam file is required for most steps in NGS analysis.
Yes of course, but the GATK best practices suggest getting unmapped sam files from Illumina sequencing instead of fastq, because a sam file can store metadata useful for the read group.
Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.
Hmmm... What are you trying to do? Did you already map your fastq to produce a bam file? You can add the RG tag at the alignment/mapping step using the aligner's options. Or, if you start with a bam file, the Picard tool can do that: https://broadinstitute.github.io/picard/command-line-overview.html#AddOrReplaceReadGroups
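For an existing bam file, the Picard route mentioned above might look roughly like the sketch below. The file names, the `PICARD_JAR` variable, and all read group values (`rg1`, `lib1`, `unit1`, `sample1`) are placeholders assumed for illustration, not values from this thread. The script only prints the command unless the jar and the input bam are actually present.

```shell
#!/bin/sh
# Hypothetical paths: adjust to your own Picard jar and bam files.
PICARD_JAR=${PICARD_JAR:-picard.jar}
IN=input.bam
OUT=with_rg.bam

# Build the command as positional parameters so each argument stays intact.
# All RG* values below are placeholders, not real sample metadata.
set -- java -jar "$PICARD_JAR" AddOrReplaceReadGroups \
    I="$IN" O="$OUT" \
    RGID=rg1 RGLB=lib1 RGPL=ILLUMINA RGPU=unit1 RGSM=sample1

if [ -f "$PICARD_JAR" ] && [ -f "$IN" ]; then
    # Run Picard only when the jar and the input bam exist.
    "$@"
else
    # Otherwise just show what would be run.
    printf 'would run: %s\n' "$*"
fi
```

One such call per bam file is needed, so samples from different runs or platforms would each get their own set of values.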
Yes? Can I add the read group from the fastq file? I have fastq files, but I don't know how to add the read group header when I do the alignment/mapping.
My fastq files look like this:
@NB501082:6:H5KG3AFXX:1:11101:6550:1056 1:N:0:ATAGCGAC
Thanks
Most probably you are using BWA for mapping. You can add a read group by giving this option to BWA
Here SM = my-sample_id. I've put NULL in the fields I don't care about or don't have any knowledge about. You can see the explanation of all fields here: http://gatkforums.broadinstitute.org/gatk/discussion/6472/read-groups
Perfect, I understand the read group now. So I am only going to redo the alignment, adding -R @RG\tID:NULL\tSM:1305H\tLB:NULL\tPL:Illumina with the sample name. I hope that works!
You do not need to. That is the purpose of using Picard to add this information as linked by @Santosh above: C: Problems with read group in GATK
You will probably also need to double-quote the whole string, like
-R "@RG\tID:NULL\tSM:1305H\tLB:NULL\tPL:Illumina"
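Put together, the quoted option could be used in a full alignment call along these lines. This is only a sketch: it assumes `bwa mem` is the aligner, and the reference and fastq file names are hypothetical. The read group string is the one from the comments above; the `\t` sequences are kept as literal backslash-t so that bwa can turn them into tabs. The script prints the command instead of running it when bwa or the input files are missing.

```shell
#!/bin/sh
# Read group string taken from the thread; single quotes keep "\t" literal.
RG='@RG\tID:NULL\tSM:1305H\tLB:NULL\tPL:Illumina'

# Hypothetical file names; the reference is assumed to be indexed with "bwa index".
REF=reference.fa
FASTQ=sample_1305H.fastq
OUT=sample_1305H.sam

if command -v bwa >/dev/null 2>&1 && [ -f "$REF" ] && [ -f "$FASTQ" ]; then
    # Quoting "$RG" passes the whole read group line as a single argument.
    bwa mem -R "$RG" "$REF" "$FASTQ" > "$OUT"
else
    # Dry run: show the command that would be executed.
    printf 'would run: bwa mem -R "%s" %s %s > %s\n' "$RG" "$REF" "$FASTQ" "$OUT"
fi
```

Without the quotes the shell would strip the backslashes before bwa sees them, which is why the quoting matters here.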
Fastq headers do not have any useful information that you could use in read groups. See "GATK, SAM file doesn't have any read groups defined in the header" for some ideas if you don't have real information available.
OK, thanks for the link! I am going to try some of these suggestions.