samtools faidx not finding my modified sequence
10 months ago

Hey everyone,

I want to add a customized sequence to the fasta file of my reference genome. So, I concatenated both files:

cat Homo_sapiens.GRCh38.dna.primary_assembly.fa Gene_mod.fa > HSapiens_Ensembl111mod.fa

In the Gene_mod.fa, the header of the sequence is similar to the ones found in the fasta:

>AddedSeq dna:scaffold scaffold:GRCh38:AddedSeq:1:2913:1 REF

Afterwards, I tried to subset the file for chromosome 3 and AddedSeq with samtools faidx, using the command:

samtools faidx HSapiens_Ensembl111mod.fa 3 AddedSeq >HSapiens.GRCh38_Chr3_AddedSeq.fa

samtools says it failed to retrieve AddedSeq.

Is there any problem with my code? I have used this fasta file for other purposes (like STAR) and it gave me no issues or problems.

Cheers

genome samtools • 491 views
10 months ago
ATpoint 86k

It is almost certainly all the whitespace in the header that messes things up. Replace it with a proper delimiter such as an underscore in your fasta before concatenating the AddedSeq to the genome.


I have removed all of that text and it solved the problem, thanks!

The structure of the header was similar to the ones in the dna.primary_assembly.fa I got from Ensembl. In addition, when I used this structure with a matching .gtf file, it seemed to work fine...
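For anyone landing here later, this is roughly what removing that text looks like. It is only a sketch: the toy sequence and the output name Gene_mod.clean.fa are made up for illustration, and the first line just writes a stand-in Gene_mod.fa so the snippet runs on its own.

```shell
# Toy stand-in for Gene_mod.fa; the sequence itself is made up.
printf '>AddedSeq dna:scaffold scaffold:GRCh38:AddedSeq:1:2913:1 REF\nACGTACGTAC\n' > Gene_mod.fa

# On header lines only, drop everything from the first space or tab onward,
# leaving just the sequence name.
sed '/^>/ s/[[:space:]].*$//' Gene_mod.fa > Gene_mod.clean.fa

# Show the trimmed header.
head -n 1 Gene_mod.clean.fa
```

After that I redid the cat with the cleaned file. If an old .fai is still sitting next to the concatenated fasta, it is probably safest to delete it before running samtools faidx again, since an existing index gets reused.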

