Non-ATGC, small-case, 'N' characters in Fastq file
9.2 years ago

Hi All,

I am currently performing a genome assembly. I generated the consensus FASTQ file using the commands below, but the file contains many non-ATGC characters (highlighted in bold). What are these characters, and how should I handle them?

Commands used to generate Fastq file:

bwa index ref.fa
bwa aln -t 9 ref.fa D2_R2.fastq -f D2_R2.sai && bwa aln -t 9 cocsa_ref.fa D2_R1.fastq -f D2_R1.sai
bwa sampe ref.fa D2_R1.sai D2_R2.sai D2_R1.fq D2_R2.fq > D2-aln-pe2.sam
samtools faidx ref.fa
samtools view -bt ref.fa.fai D2-aln-pe2.sam > D2-aln-pe2.bam
samtools sort D2-aln-pe2.bam D2-aln-pe2.bam.srt
samtools index D2-aln-pe2.bam.srt.bam
samtools mpileup -uf ref.fa D2-aln-pe2.bam.srt.bam | bcftools view -cg - | vcfutils.pl vcf2fq > CONSENSUS.fq

CONSENSUS.fq file looks like:

@scaffold_1
nnngtttggtggtagtattggtatttcaaacacgctaggtgtttgttggttttgagtagg
tgtagctggagtagactctatctccatttctctatcagtttgggcctctggccctaggct
ctcctgtctgttttcttgagtatttactacaatagtatcactgtctggcggcattttatt
actaagctcttttcttagtaagcaactagatggtctgtgtgtttttgttttcgtgagtga
gacgtgttcagattagctactttaccagcttctagctctatagcgcgtgggctgcacgag
ttggcactagttgtaatcgatttcttgggatggatttgtatataattcgctaaaattaca
cctattctgaaaaactcgnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnTAATGTTACAAGTAAYAAGAAGGATYCTYTCCTTRACAAATRACGAGATGGC

P.S. Please also advise how to handle the lowercase characters and the 'N's. Should we mask or remove them to get a better set of scaffolds?

Thanks in advance

Genome-assembly small-case Non-ATGC Fastq • 3.8k views
9.2 years ago

Lowercase indicates sequence that is already masked (often due to low confidence); many tools will ignore it. I don't see any reason to remove it.
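To see how much of a scaffold falls into each class before deciding anything, a quick tally can help. This is only a minimal sketch: the function name `classify_bases` and the four category labels are made up for illustration, and it expects a bare sequence string (not FASTQ header or quality lines).

```python
from collections import Counter

def classify_bases(seq):
    """Tally a sequence into upper-case ACGT, lower-case acgt, N/n, and anything else."""
    counts = Counter()
    for base in seq:
        if base in "ACGT":
            counts["upper"] += 1
        elif base in "acgt":
            counts["lower"] += 1   # lowercase (masked / low-confidence) bases
        elif base in "Nn":
            counts["n"] += 1       # unknown bases
        else:
            counts["other"] += 1   # e.g. IUPAC ambiguity codes such as Y or R
    return counts

# Short fragment in the style of the consensus shown in the question
print(classify_bases("nnngtttTAATGTTACAAGTAAYAAG"))
```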

The non-ACGTN characters are IUPAC symbols typically indicating polymorphisms. I normally convert them to N before further processing.
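As a rough illustration of the conversion to N, here is a minimal Python sketch. The names `IUPAC_AMBIGUITY` and `mask_ambiguous` are hypothetical, and preserving the case of the replaced character is an assumption of this sketch rather than something required above. The table lists the standard IUPAC nucleotide ambiguity codes; for example, Y means C or T and R means A or G. Apply it to sequence lines only, because FASTQ quality strings can contain the same letters.

```python
# Standard IUPAC nucleotide ambiguity codes and the bases each one stands for
IUPAC_AMBIGUITY = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
}

def mask_ambiguous(seq):
    """Replace IUPAC ambiguity codes with N (or n), leaving A, C, G, T and N untouched."""
    out = []
    for base in seq:
        if base.upper() in IUPAC_AMBIGUITY:
            # Keep the original case so existing lowercase masking is not lost
            out.append("N" if base.isupper() else "n")
        else:
            out.append(base)
    return "".join(out)

# Fragment taken from the consensus sequence in the question
print(mask_ambiguous("TAATGTTACAAGTAAYAAGAAGGATYCTYTCCTTRACAAATRACGAGATGGC"))
```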


Thanks a lot Brian.
