1) Is there a way to get the last field (mRNA_1) added to the header as well?
My header is >chr1A:992-1945 and the header I want is >chr1A:992-1945:mRNA_1
2) Will this command line take the coordinates from each chromosome separately? I have one fasta file containing all chromosomes.
Seems to work, thanks! I have a question. With the command line I used, the header for the first line (chr1A rnaseq CDS 992 1945 . + 0 mRNA_1) is chr1A:991-1945 instead of 992, whereas your command line gives me chr1A:992-1945:mRNA_1. So mine counts the first nucleotide as 0, but yours counts it as 1. Why the difference? It is not important, as both results are fine; I just want to understand.
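On the 0 vs 1 question: GFF coordinates are 1-based and inclusive, while BED coordinates are 0-based with an exclusive end, and the chr1A:991-1945 header reported above is consistent with bedtools printing the interval in the BED convention. Both headers describe the same 954 bases. Here is a minimal Python sketch of that arithmetic; the function name is made up for illustration and is not part of bedtools.

```python
def gff_to_bed_interval(gff_line):
    """Convert one GFF line (1-based, inclusive) to a BED-style
    interval (0-based start, exclusive end)."""
    fields = gff_line.rstrip("\n").split("\t")
    chrom = fields[0]
    start = int(fields[3])  # GFF column 4: 1-based start
    end = int(fields[4])    # GFF column 5: 1-based inclusive end
    # Only the start shifts; the inclusive 1-based end equals the
    # exclusive 0-based end.
    return chrom, start - 1, end


# The GFF line quoted in the question, tab-separated.
line = "chr1A\trnaseq\tCDS\t992\t1945\t.\t+\t0\tmRNA_1"
chrom, bed_start, bed_end = gff_to_bed_interval(line)

print(f"{chrom}:{bed_start}-{bed_end}")  # chr1A:991-1945
print(bed_end - bed_start)               # 954 bases either way
```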
1) Check out the help for the command below. I think you can experiment with -name or -fullHeader to see if you get what you want.
    bedtools getfasta -h

    Tool:    bedtools getfasta (aka fastaFromBed)
    Version: v2.26.0
    Summary: Extract DNA sequences from a fasta file based on feature coordinates.

    Usage:   bedtools getfasta [OPTIONS] -fi <fasta> -bed <bed/gff/vcf>

    Options:
        -fi          Input FASTA file
        -bed         BED/GFF/VCF file of ranges to extract from -fi
        -name        Use the name field for the FASTA header
        -split       given BED12 fmt., extract and concatenate the sequencesfrom the BED "blocks" (e.g., exons)
        -tab         Write output in TAB delimited format.
                     - Default is FASTA format.
        -s           Force strandedness. If the feature occupies the antisense,
                     strand, the sequence will be reverse complemented.
                     - By default, strand information is ignored.
        -fullHeader  Use full fasta header.
                     - By default, only the word before the first space or tab is used.
2) Not sure what you are asking, but the coordinates, for example chr1A:992-1945, are specific to chr1A.
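To illustrate point 2, here is a small pure-Python sketch (not bedtools itself) of how an interval is resolved against a multi-chromosome fasta: the chromosome name in column 1 selects the record, and the coordinates only index into that record. The toy sequences and function names are invented for the example, and the coordinates are assumed to be 1-based inclusive as in a gff file.

```python
def read_fasta(text):
    """Parse FASTA text into {name: sequence}. As in the help text,
    only the word before the first whitespace of the header is used."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def extract(records, chrom, start, end):
    """Return bases start..end (1-based, inclusive) of one chromosome."""
    return records[chrom][start - 1:end]


# Toy genome with two chromosomes in a single file (made-up sequences).
genome = ">chr1A some description\nACGTACGTAC\n>chr1B\nTTTTGGGGCC\n"
records = read_fasta(genome)

# The same coordinates give different sequences on different chromosomes.
print(extract(records, "chr1A", 2, 5))  # CGTA
print(extract(records, "chr1B", 2, 5))  # TTTG
```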
Hi,
I am also trying to use bedtools to extract CDS from a genome fasta. However, I don't understand what the bed file is. Do I have to modify my gff file so that it contains only the CDS hits?
The bedtools getfasta manual is not informative to me.
If you only need CDS then yes, but it looks like @Dario's answer below is already doing that. Your gff file will need to have CDS entries in column 3 for that to work.
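In case it helps, here is a rough Python sketch of that filtering step: keep only the rows whose column 3 is CDS and write them as BED6 lines (0-based start), with column 9 copied into the BED name field. This is only one way to do it and is not @Dario's command. It assumes a tab-separated gff whose column 9 holds a plain name such as mRNA_1, as in the example line earlier in the thread; a real gff attribute string may need parsing first.

```python
def gff_cds_to_bed(gff_text):
    """Return BED6 lines for every CDS row of a GFF text."""
    bed_lines = []
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "CDS":
            continue  # keep only rows with CDS in column 3
        chrom, strand, name = fields[0], fields[6], fields[8]
        start = int(fields[3]) - 1  # GFF 1-based -> BED 0-based
        end = int(fields[4])        # end value is unchanged
        bed_lines.append("\t".join([chrom, str(start), str(end), name, "0", strand]))
    return bed_lines


# Made-up two-row gff: one exon row (dropped) and one CDS row (kept).
gff = (
    "##gff-version 3\n"
    "chr1A\trnaseq\texon\t900\t1945\t.\t+\t.\tmRNA_1\n"
    "chr1A\trnaseq\tCDS\t992\t1945\t.\t+\t0\tmRNA_1\n"
)
for bed_line in gff_cds_to_bed(gff):
    print(bed_line)
```

The resulting lines could be saved to a bed file and given to bedtools getfasta with -bed; with -name, the help text says the name field is used for the FASTA header.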