Hi,
I have GTF files for a list of organisms. I want to extract only the intergenic sequences from each of their corresponding FASTA files. Currently, I have been able to use grep -o '^..........' ecoli.fa > 10.txt to grep out the first 10 characters, given that the first gene starts at position 11.
I now need to find a way to automate the process for the remaining regions, as manual inspection is impossible. How can I do the same for a huge file, for example -
Gene Start End
Gene1 11 157
Gene2 2878 8548
Gene3 11254 12124
The process might require file handling to subtract the End of a row from the Start of the next row, and that value could be coupled with grep to give a command like grep -o '^{2877.}' ecoli.gtf | tail -f 2720 > 2720.txt
P.S. I could make the gtf file containing the intergenic lengths, but I do need help with the file handling.
Thanks in advance
Hello,
you could first create a bed file containing the coordinates of each intergenic region, using the language of your choice. Then you can use this bed file together with bedtools getfasta to get the sequences.
fin swimmer
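A minimal sketch of the coordinate step, assuming the three-column Gene/Start/End table from the question, 1-based inclusive coordinates (as in GTF), and a single sequence. The function name intergenic_intervals and the sequence name chr are hypothetical; the name written to the bed file has to match the FASTA header. BED is 0-based and half-open, so the gap between a gene ending at 157 and a gene starting at 2878 becomes 157 to 2877.

```python
def intergenic_intervals(genes, seq_len=None):
    """Return the gaps between genes as 0-based, half-open (start, end) pairs.

    genes   : iterable of (start, end), 1-based inclusive
    seq_len : optional sequence length; if given, the region after the
              last gene is reported as well
    """
    gaps = []
    prev_end = 0  # furthest gene end seen so far (1-based inclusive)
    for start, end in sorted(genes):
        # A gap exists only if at least one base lies between the genes.
        if start - 1 > prev_end:
            gaps.append((prev_end, start - 1))
        # max() keeps overlapping or nested genes from creating false gaps.
        prev_end = max(prev_end, end)
    if seq_len is not None and seq_len > prev_end:
        gaps.append((prev_end, seq_len))
    return gaps


# Example table from the question.
genes = [(11, 157), (2878, 8548), (11254, 12124)]

# "chr" is a placeholder; use the sequence name from the FASTA header.
for start, end in intergenic_intervals(genes):
    print("chr", start, end, sep="\t")
```

For the example table this gives three intervals of length 10, 2720 and 2705. Saved to a file, the output could then be passed to bedtools, for example bedtools getfasta -fi ecoli.fa -bed intergenic.bed -fo intergenic.fa, where intergenic.bed and intergenic.fa are made-up file names.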
Thanks for the advice. It would certainly help, but to do that, I first need to employ file handling to select the Start and End coordinates of the annotated genes. Would you happen to have code to create the bed file?
Thanks.
What does your file with gene coordinates look like? In your first post you said you have a gtf file, but the example you posted doesn't look like one.
fin swimmer
Sorry for the confusion on my part. I only showed the relevant columns, but it is a proper gtf file.
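If the input is a standard nine-column, tab-separated GTF, the same idea can be sketched per sequence as below. This assumes the file has rows whose feature column (column 3) is gene; some GTFs only carry exon, CDS or transcript rows, in which case the feature argument would need changing. The function name gtf_to_intergenic_bed and the demo records are made up for illustration. The region after the last gene is not reported, because the sequence length is not stored in the GTF.

```python
import io
from collections import defaultdict


def gtf_to_intergenic_bed(gtf_lines, feature="gene"):
    """Return intergenic regions as (seqname, start, end) in BED coordinates.

    gtf_lines : iterable of GTF lines (an open file handle works)
    feature   : value of column 3 to treat as a gene (assumption: "gene")
    """
    genes = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature:
            continue
        # GTF columns 4 and 5 are start and end, 1-based inclusive.
        genes[fields[0]].append((int(fields[3]), int(fields[4])))

    bed = []
    for seqname in sorted(genes):
        prev_end = 0
        for start, end in sorted(genes[seqname]):
            if start - 1 > prev_end:
                # BED is 0-based and half-open.
                bed.append((seqname, prev_end, start - 1))
            prev_end = max(prev_end, end)
    return bed


# Made-up records; with a real file, pass open("ecoli.gtf") instead.
demo = io.StringIO(
    'chr1\tdemo\tgene\t11\t157\t.\t+\t.\tgene_id "Gene1";\n'
    'chr1\tdemo\texon\t11\t157\t.\t+\t.\tgene_id "Gene1";\n'
    'chr1\tdemo\tgene\t2878\t8548\t.\t-\t.\tgene_id "Gene2";\n'
)
for seqname, start, end in gtf_to_intergenic_bed(demo):
    print(seqname, start, end, sep="\t")
```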
@Pierre Lindenbaum, Devon Ryan, genomax, WouterDeCoster
What happens to that thread? Who closes it and why?
fin swimmer
Only mods close threads, when they are off-topic, spam, etc. Users should just accept answers that are correct. That provides proper closure for the thread.
Hello,
I have a GFF3 file. How can I get the intergenic regions along with their strand (positive or negative) information?
Thanks.
This is an unrelated question and should have been posted as a new thread. Also, please provide a clear objective of what you are trying to achieve.