Can gffread extract specific region sequences from vcf or fasta file?
1
0
Entering edit mode
5.9 years ago
MatthewP ★ 1.4k

Hello, I have NGS sequencs result(vcf and fasta) file, I download reference sequence(GFF3 format) from NCBI, this link..
Here shows a few part of GFF3 file:

##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=45949
KX254564.1      Genbank region  1       18117   .       +       .       ID=KX254564.1:1..18117;Dbxref=taxon:45949;Name=MT;gbkey=Src;genome=mitochondrion;mol_type=genomic DNA
KX254564.1      Genbank gene    1       1614    .       +       .       ID=gene-COX1;Name=COX1;gbkey=Gene;gene=COX1;gene_biotype=protein_coding
KX254564.1      Genbank CDS     1       1614    .       +       0       ID=cds-ANP25568.1;Parent=gene-COX1;Dbxref=NCBI_GP:ANP25568.1;Name=ANP25568.1;gbkey=CDS;gene=COX1;product=cytochrome coxidase subunit I;protein_id=ANP25568.1;transl_table=5
KX254564.1      Genbank gene    1673    2992    .       +       .       ID=gene-COX2;Name=COX2;gbkey=Gene;gene=COX2;gene_biotype=protein_coding
KX254564.1      Genbank CDS     1673    2992    .       +       0       ID=cds-ANP25569.1;Parent=gene-COX2;Dbxref=NCBI_GP:ANP25569.1;Name=ANP25569.1;gbkey=CDS;gene=COX2;product=cytochrome coxidase subunit II;protein_id=ANP25569.1;transl_table=5
KX254564.1      Genbank gene    2998    3060    .       +       .       ID=gene-trnG(TCC);Name=trnG(TCC);gbkey=Gene;gene=trnG(TCC);gene_biotype=tRNA
KX254564.1      Genbank tRNA    2998    3060    .       +       .       ID=rna-trnG(TCC);Parent=gene-trnG(TCC);gbkey=tRNA;gene=trnG(TCC);product=tRNA-Gly
KX254564.1      Genbank exon    2998    3060    .       +       .       ID=exon-trnG(TCC)-1;Parent=rna-trnG(TCC);gbkey=tRNA;gene=trnG(TCC);product=tRNA-Gly
KX254564.1      Genbank gene    3063    3124    .       +       .       ID=gene-trnR(TCG);Name=trnR(TCG);gbkey=Gene;gene=trnR(TCG);gene_biotype=tRNA
KX254564.1      Genbank tRNA    3063    3124    .       +       .       ID=rna-trnR(TCG);Parent=gene-trnR(TCG);gbkey=tRNA;gene=trnR(TCG);product=tRNA-Arg
KX254564.1      Genbank exon    3063    3124    .       +       .       ID=exon-trnR(TCG)-1;Parent=rna-trnR(TCG);gbkey=tRNA;gene=trnR(TCG);product=tRNA-Arg

What I want: extract specific parts sequence from vcf or fasta file.
example: all tRNA, all gene, specific gene.

I google and find gffread tool and see -h option, it seems this tool can't handle my job? I hope I can get some help here, thanks.

gffread vcf gff3 • 2.7k views
ADD COMMENT
0
Entering edit mode

I can write python script get those regions than get sequence from vcf file, but this needs much time, I hope there is tools already.

ADD REPLY
1
Entering edit mode

awk -F "\t" '{if($3=="tRNA")print}' file.gff > new_file.gff

ADD REPLY
0
Entering edit mode

Hi, if you need the entire information for each of those, you could easily grep them out.

ADD REPLY
1
Entering edit mode
5.9 years ago

I am not sure about gffread but extracting specific gene or all genes etc can be done using bedtools.

https://bedtools.readthedocs.io/en/latest/content/tools/getfasta.html

ADD COMMENT
0
Entering edit mode

It seems we can't get fasta file directly from NGS sequence VCF result? I actually get my fasta sequence from VCF by writting python script myself. Because we may get more than one variants at the same position in VCF.

ADD REPLY
1
Entering edit mode

We can get fasta from vcf file using vcftools consensus.

ADD REPLY

Login before adding your answer.

Traffic: 2530 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6