6.1 years ago
saamar.rajput
I have two files, a FASTA file and a GFF file, which look like this:
head Fasta
>NC_002929.2 Bordetella pertussis Tohama I chromosome, complete genome
ATGGATTTTCCCCGCGAATTTGATGTGATCGTCGTTGGTGGCGGTCACGCCGGTACGGAGGCAGCCCTGGCTGCAGCCCG
CGCCGGCGCACAGACATTGCTGCTTACCCACAATATCGAGACCCTGGGCCAAATGTCCTGCAATCCCTCCATCGGGGGGA
TAGGCAAGGGTCATTTGGTCAAGGAAGTCGATGCGTTGGGCGGCGCGATGGCTATCGCCACCGACGAGGCAGGTATCCAA
TTCCGTATTCTCAACAGCTCCAAGGGGCCAGCGGTACGTGCCACGCGTGCCCAAGCCGACCGGGTGCTGTACCGAAACGC
CATACGTGCACAGCTCGAGAACCAGCCCAACCTCTGGCTGTTCCAGCAGGCGGTGGACGATCTGATGGTGCAGGGCGACC
AGGTGGTGGGCGCCGTTACGCAGATCGGGTTGCGCTTTCGTGCCCGTACCGTGGTGCTGACGGCTGGGACCTTCCTCAAC
GGTTTGATTCACGTGGGGCTGCAGAACTATTCCGGAGGGCGGGCAGGGGATCCTCCCGCCAATTCCCTGGGCCAGCGGCT
CAAGGAGCTGCAACTTCCGCAAGGCCGCCTGAAAACTGGCACGCCGCCGCGCATCGACGGACGCAGCATCAACTACAGTG
TGTTGGAAGAGCAGCCCGGCGATCTTGATCCCGTGCCGGTGTTCTCGTTCCTGGGCAAGGCCTCCATGCACCCGCGCCAG
CTGCCTTGCTGGATCACGCATACCAATGCCCGCACGCACGAAATCATCCGTGGCGGTCTGGACCGTTCGCCCATGTACAG
TGGGGTCATCGAAGGAGTGGGGCCTCGTTACTGCCCATCCATCGAGGACAAGATCCATCGTTTTGCGGACAAGGCATCGC
ACCAGGTATTCCTGGAACCGGAAGGCCTGAATACCCATGAGATCTATCCGAACGGTGTTTCCACCAGCCTGCCTTTCGAT
GTGCAGTACGAGTTGATCCATTCCCTGCCCGGACTGG
Then I have the GFF file:
head gff
##gff-version 3
#!gff-spec-version 1.21
#!processor NCBI annotwriter
#!genome-build ASM19571v1
#!genome-build-accession NCBI_Assembly:GCF_000195715.1
##sequence-region NC_002929.2 1 4086189
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=257313
NC_002929.2 RefSeq region 1 4086189 . + . ID=id0;Dbxref=taxon:257313;Is_circular=true;Name=ANONYMOUS;gbkey=Src;genome=chromosome;mol_type=genomic DNA;old-name=Bordetella pertussis;strain=Tohama I
NC_002929.2 RefSeq gene 1 1920 . + . ID=gene0;Dbxref=GeneID:2664547;Name=gidA;gbkey=Gene;gene=gidA;gene_biotype=protein_coding;locus_tag=BP0001
NC_002929.2 RefSeq CDS 1 1920 . + 0 ID=cds0;Parent=gene0;Dbxref=Genbank:NP_878920.1,GeneID:2664547;Name=NP_878920.1;Note=GidA%3B glucose-inhibited cell division protein A%3B involved in the 5-carboxymethylaminomethyl modification (mnm(5)s(2)U) of the wobble uridine base in some tRNAs;gbkey=CDS;gene=gidA;product=tRNA uridine 5-carboxymethylaminomethyl modification protein;protein_id=NP_878920.1;transl_table=11
NC_002929.2 RefSeq sequence_feature 19 1893 . + . ID=id1;Dbxref=GeneID:2664547;Note=HMMPfam hit to PF01134%2C Glucose inhibited division protein A;gbkey=misc_feature;gene=gidA
NC_002929.2 RefSeq sequence_feature 820 864 . + . ID=id2;Dbxref=GeneID:2664547;Note=ScanRegExp hit to PS01280%2C Glucose inhibited division protein A family signature 1. Confirmed by InterPro eMOTIF pattern match.;gbkey=misc_feature;gene=gidA
I want to use the FASTA file as a mapping file, but I first need to convert it to a BED file. I tried:
bedtools getfasta -fi Fasta -bed gff -tab -fo Testing
It gives output like this:
NC_002929.2:2605-3403 ATGAAAAACATACCGCCCAGCAAGTCCGCCCGCGTGTTCTGCATCGCCAACCAGAAGGGCGGCGTCGGCAAGACCACCACCGCCATCAACCTTGCGGCTGGCCTGGCTACGCACAAGCAGCGGGTGCTGCTGGTCGATCTCGATCCGCAGGGCAACGCCACCATGGGCAGCGGCATCGACAAGAGTACGCTCGAATCCAACCTGTACCAGGTGCTCATCGGCGAGGCCGGTATCGAACAGACGCGCGTGCGTTCGGAGTCCGGCGGCTACGACGTATTGCCGGCCAACCGCGAACTGTCCGGCGCCGAGATCGACCTGGTGCAGATGGACGAGCGCGAGCGCCAGCTCAAGGCCGCCATCGACAAGATCGCCGGCGAATACGATTTCGTGCTGATCGATTGCCCGCCCACGCTGTCGCTGCTTACCCTTAACGGGCTGGCTGCCGCGCACGGCGTCATCATTCCGATGCAGTGCGAGTACTTTGCGCTCGAAGGCCTGTCCGACCTGGTAAACACCATCAAGCGCGTGCATCGCAATATCAACAACGAACTCCGTGTCATCGGTTTGTTGCGCGTGATGTTCGACCCGCGCATGACCTTGCAGCAGCAGGTGTCGGCCCAGCTCGAATCCCACTTCGGCGACAAGGTCTTCACCACGGTGGTGCCACGCAATGTGCGGTTGGCCGAGGCGCCCAGCTATGGCATGCCGGGCGTGGTGTATGACCGCGCGTCGCGCGGCGCGCAGGCCTATATTGCATTTGGCGCGGAAATGATAGAACGCGTCAAAGAGCTGGATTGA
NC_002929.2:2605-3403 ATGAAAAACATACCGCCCAGCAAGTCCGCCCGCGTGTTCTGCATCGCCAACCAGAAGGGCGGCGTCGGCAAGACCACCACCGCCATCAACCTTGCGGCTGGCCTGGCTACGCACAAGCAGCGGGTGCTGCTGGTCGATCTCGATCCGCAGGGCAACGCCACCATGGGCAGCGGCATCGACAAGAGTACGCTCGAATCCAACCTGTACCAGGTGCTCATCGGCGAGGCCGGTATCGAACAGACGCGCGTGCGTTCGGAGTCCGGCGGCTACGACGTATTGCCGGCCAACCGCGAACTGTCCGGCGCCGAGATCGACCTGGTGCAGATGGACGAGCGCGAGCGCCAGCTCAAGGCCGCCATCGACAAGATCGCCGGCGAATACGATTTCGTGCTGATCGATTGCCCGCCCACGCTGTCGCTGCTTACCCTTAACGGGCTGGCTGCCGCGCACGGCGTCATCATTCCGATGCAGTGCGAGTACTTTGCGCTCGAAGGCCTGTCCGACCTGGTAAACACCATCAAGCGCGTGCATCGCAATATCAACAACGAACTCCGTGTCATCGGTTTGTTGCGCGTGATGTTCGACCCGCGCATGACCTTGCAGCAGCAGGTGTCGGCCCAGCTCGAATCCCACTTCGGCGACAAGGTCTTCACCACGGTGGTGCCACGCAATGTGCGGTTGGCCGAGGCGCCCAGCTATGGCATGCCGGGCGTGGTGTATGACCGCGCGTCGCGCGGCGCGCAGGCCTATATTGCATTTGGCGCGGAAATGATAGAACGCGTCAAAGAGCTGGATTGA
This is exactly what I need to start my analysis, but I also need the gene names alongside the gene locations in the Testing file. Any help on how to do this?
Did you try using the -name option of bedtools getfasta? If that gives you the gene name in the first column, you could add the start and end positions of the genes to the bedtools output yourself with awk.
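A minimal sketch of the awk route, assuming the gff file is tab-delimited as GFF3 requires (the columns above look space-separated only because of how they were pasted) and that you want one interval per `gene` row. The function name `gff_genes_to_bed` is hypothetical. It writes a six-column BED file whose fourth column is the `Name=` attribute (falling back to `ID=` when there is no `Name=`), and it subtracts 1 from the GFF start because GFF is 1-based while BED starts are 0-based. Keeping only `gene` rows should also get rid of the repeated lines in the output above, which presumably come from `gene` and `CDS` rows that share the same coordinates.

```shell
#!/bin/sh
# Hypothetical helper: turn GFF3 "gene" rows into a 6-column BED file
# (chrom, start, end, name, score, strand).
# GFF3 is 1-based and fully closed; BED is 0-based and half-open,
# so only the start coordinate is shifted by -1.
gff_genes_to_bed() {
    awk 'BEGIN { FS = OFS = "\t" }              # GFF3 columns are tab-separated
         /^#/ { next }                          # skip ## directives and #! comments
         $3 == "gene" {
             name = "."; id = "."
             n = split($9, attrs, ";")          # column 9: key=value;key=value
             for (i = 1; i <= n; i++) {
                 if (attrs[i] ~ /^Name=/)    name = substr(attrs[i], 6)
                 else if (attrs[i] ~ /^ID=/) id   = substr(attrs[i], 4)
             }
             if (name == ".") name = id         # fall back to the ID attribute
             print $1, $4 - 1, $5, name, ".", $7
         }' "$1"
}

# Example usage (file names taken from the question):
#   gff_genes_to_bed gff > genes.bed
#   bedtools getfasta -fi Fasta -bed genes.bed -tab -fo Testing
#   paste genes.bed Testing > Testing_with_names
```

getfasta should write one line per BED record, in input order, so pasting the BED file next to the `-tab` output puts the chromosome, start, end, gene name and strand on the same line as each sequence. Alternatively, running `-name` with this BED file labels each sequence with the gene name; depending on your bedtools version, the coordinates may also be appended to that label.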