Hey everyone,
I want to add a custom sequence to the FASTA file of my reference genome, so I concatenated the two files:
cat Homo_sapiens.GRCh38.dna.primary_assembly.fa Gene_mod.fa > HSapiens_Ensembl111mod.fa
In Gene_mod.fa, the header of the sequence is similar to the ones found in the reference FASTA:
>AddedSeq dna:scaffold scaffold:GRCh38:AddedSeq:1:2913:1 REF
Afterwards, I tried to subset the file with samtools faidx for chromosome 3 and AddedSeq, using the command:
samtools faidx HSapiens_Ensembl111mod.fa 3 AddedSeq >HSapiens.GRCh38_Chr3_AddedSeq.fa
It says it failed to retrieve AddedSeq.
Is there any problem with my code? I have used this FASTA file for other purposes (like STAR) without any problems.
Cheers
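One quick check for this kind of error is to list the sequence names in the concatenated FASTA and confirm that AddedSeq shows up as its own header line, since the names passed to samtools faidx have to match the first word of a header. Below is a rough sketch on tiny made-up stand-in files rather than the real genome; the demo file names and sequences are hypothetical.

```shell
# Tiny made-up stand-ins for the reference and the added sequence (hypothetical contents).
demo=$(mktemp -d)
printf '>3 dna:chromosome chromosome:GRCh38:3:1:8:1 REF\nACGTACGT\n' > "$demo/ref.fa"
printf '>AddedSeq dna:scaffold scaffold:GRCh38:AddedSeq:1:2913:1 REF\nTTGGCCAA\n' > "$demo/Gene_mod.fa"
cat "$demo/ref.fa" "$demo/Gene_mod.fa" > "$demo/combined.fa"

# Keep header lines only, strip the leading '>', and drop the description
# so that only the sequence name (first word) remains.
grep '^>' "$demo/combined.fa" | cut -c2- | cut -d' ' -f1 > "$demo/names.txt"
cat "$demo/names.txt"
```

On the real data, the same grep/cut pipeline could be pointed at HSapiens_Ensembl111mod.fa; if AddedSeq is missing from the list, the problem is in the concatenated file rather than in the faidx command.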
I have removed all of that text and it solved the problem, thanks!
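For anyone finding this later, here is a minimal sketch of that fix, assuming "that text" means the description that follows the sequence name in the header. It runs on a made-up stand-in for Gene_mod.fa (the sequence and the output file name are hypothetical), and awk is just one way to do it.

```shell
# Hypothetical stand-in for Gene_mod.fa; the sequence itself is made up.
workdir=$(mktemp -d)
printf '>AddedSeq dna:scaffold scaffold:GRCh38:AddedSeq:1:2913:1 REF\nACGTACGT\n' > "$workdir/Gene_mod.fa"

# On header lines keep only the first whitespace-delimited field (">AddedSeq");
# sequence lines are printed unchanged.
awk '/^>/ {print $1; next} {print}' "$workdir/Gene_mod.fa" > "$workdir/Gene_mod.clean.fa"

# Show the resulting header.
grep '^>' "$workdir/Gene_mod.clean.fa"
```

The cleaned file would then be concatenated onto the reference in place of the original Gene_mod.fa, as in the cat command from the question.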
The structure of the header was similar to the one in the dna.primary_assembly.fa I got from Ensembl. In addition, when using this structure and a .gtf file to match, it seemed to work fine...