Finding and changing a given position in hg19 ref fasta file
2
0
Entering edit mode
5.9 years ago
cetin.m ▴ 50

I want to go to the hg19 reference fasta file and change say, position 727004 in chromosome 1 for a project.

The file contains N's in the beginning. Where do I start counting to find position 727004?

I am thinking of writing a python script to find the given positions and change them.

Is there a simpler way to do this?

hg19 fasta reference • 2.1k views
ADD COMMENT
0
Entering edit mode

You have to start from the beginning of your sequence, so chr1 position 1

Python will works well, create a dictionnary of positions you want to change and iterate over your hg19 reference

change specific bases in a fasta file

http://seqanswers.com/forums/showthread.php?t=76589

ADD REPLY
2
Entering edit mode
5.9 years ago

(updated) Haha, yesterday I just wrote a new seqkit command mutate (v0.10.0-dev2 or later versions, usage) to do this:

# change base at 727004 of sequence with ID of "1".
seqkit mutate -p 727004:C -s 1

Example:

$ cat hsa.fa
>chr1 1th seq
ACTGNactgn
>chr2 2nd seq
actgnACTGN
>chr11 11th seq
ACTGNACTGN
>MT mitochondrial seq
actgnactgn

# position "-1" refers to the last base.
$ cat hsa.fa | seqkit mutate -p -1:X -s chr1
[INFO] edit seq: chr1 1th seq
>chr1 1th seq
ACTGNactgX
>chr2 2nd seq
actgnACTGN
>chr11 11th seq
ACTGNACTGN
>MT mitochondrial seq
actgnactgn

Before this, you may need to retrieve the chr1 using seqkit head -n 1 or seqkit grep -r -p chr1 or seqkit faidx g.fa chr1.

Then cat back to file without chr1 (seqkit grep -r -p chr1 -v ) after editting.

I'll add a new option to only edit chosen sequence soon, so you can do this in one command.

ADD COMMENT
0
Entering edit mode

I've added options similar in seqkit grep to choose seqs to edit, and updated the answer.

ADD REPLY
1
Entering edit mode
5.9 years ago
venu 7.1k

before you start writing all that code, check out maskfasta from bedtools. Isn't this what you want?

ADD COMMENT
0
Entering edit mode

Seems like exactly what I want!

ADD REPLY

Login before adding your answer.

Traffic: 2168 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6