5.1 years ago
mohammed2003w
Hello, I'm a beginner in this field and I have 4 sets of paired-end reads from a parasite genome. I would like to start analysing them to draw a phylogenetic tree and to find virulence genes correlated with the disease. Could anyone help and guide me, please? I have installed Ubuntu and started using it for this job. Thanks, Mohamed
I think you should start by understanding what exactly you can do and what you want to do with your data. To do that, you have to learn the basics of bioinformatics data analysis. One good entry point into this field can be found here: https://www.biostarhandbook.com/ Then you can find help on the forum by asking more specific questions about the different steps of your analysis (it's difficult to help you without more details).
Thank you guillaume.
I already started to read this book.
Regards, Mohamed
Ok great. Typically, to build phylogenetic trees you'll work with the variants detected in each of your samples, and to get those variants you'll have to run variant calling on the aligned reads (BAM format).
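As a rough illustration of that variant-calling step, here is a minimal sketch that only assembles a commonly used bcftools pipeline (mpileup, then call) as a command string. bcftools is one possible caller and is not named in this thread; the file names are hypothetical placeholders, and organism-specific settings such as ploidy are not handled here.

```python
import shlex


def variant_calling_commands(reference, bam_files, out_vcf):
    """Return the two bcftools commands (pileup, then call) as argument lists."""
    # -Ou: uncompressed BCF passed down the pipe; -f: reference FASTA
    mpileup = ["bcftools", "mpileup", "-Ou", "-f", reference, *bam_files]
    # -m: multiallelic caller; -v: variant sites only; -Oz: compressed VCF output
    call = ["bcftools", "call", "-mv", "-Oz", "-o", out_vcf]
    return mpileup, call


def as_shell_pipeline(mpileup, call):
    """Join the two commands into a single shell pipeline string."""
    return " | ".join(shlex.join(cmd) for cmd in (mpileup, call))


# Hypothetical file names, for illustration only.
mpileup_cmd, call_cmd = variant_calling_commands(
    "reference.fa", ["isolate1.bam", "isolate2.bam"], "calls.vcf.gz"
)
pipeline = as_shell_pipeline(mpileup_cmd, call_cmd)
print(pipeline)
```

The printed line can be pasted into a terminal once bcftools is installed; passing all BAM files to a single mpileup call gives one multi-sample VCF, which is convenient for the later steps.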
This is exactly what I want to do. I have a set of 46 BAM files (reads mapped to a reference), so I need to go through them and find the variants.
So I need to know the exact tools to use for steps such as multiple sequence alignment and the other processes.
Thank you for your help and guidance.
Hi Mohamed,
To build a phylogenetic tree you'll have to create a FASTA file from your VCF file. In your FASTA file you want one sequence per isolate, made only of the detected SNPs (all sequences must be the same length, with each SNP at the same position in every sequence). For this I use the software PGDSpider.
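To make the idea of a SNP-only FASTA concrete, here is a minimal sketch in plain Python. It is not how PGDSpider works internally; it assumes a multi-sample VCF with a GT field, takes only the first allele of each genotype (so heterozygous calls are not represented), and writes N for missing calls. The function name and the example records are made up for illustration.

```python
def vcf_to_snp_fasta(vcf_lines):
    """Build one SNP-only sequence per sample from multi-sample VCF lines."""
    samples = []
    seqs = {}
    for line in vcf_lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            seqs = {s: [] for s in samples}
            continue
        alleles = [fields[3]] + fields[4].split(",")
        if any(a not in ("A", "C", "G", "T") for a in alleles):
            continue  # skip indels and anything that is not a plain SNP
        for sample, call in zip(samples, fields[9:]):
            genotype = call.split(":")[0]
            first = genotype.replace("|", "/").split("/")[0]
            # Assumption: use the first allele only; missing calls become N.
            base = alleles[int(first)] if first.isdigit() else "N"
            seqs[sample].append(base)
    records = []
    for s in samples:
        records.append(">" + s + "\n" + "".join(seqs[s]) + "\n")
    return "".join(records)


# Made-up records: two SNPs and one indel across three isolates.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tisolateA\tisolateB\tisolateC",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0\t1\t1",
    "chr1\t200\t.\tAT\tA\t60\tPASS\t.\tGT\t0\t1\t0",
    "chr1\t300\t.\tC\tT\t80\tPASS\t.\tGT\t1/1\t0/0\t./.",
]
snp_fasta = vcf_to_snp_fasta(example_vcf)
print(snp_fasta)
```

Because every kept site adds exactly one character to every sample, all sequences end up the same length with each SNP in the same column, which is the property described above.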
I advise you to filter for high-quality variants, perhaps starting with only the variants in coding regions, and to keep only SNPs, not indels.
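A minimal sketch of such a filter on plain VCF text lines follows. The QUAL cutoff of 30 is an arbitrary assumption to adjust for your data, the function names are made up for illustration, and restricting to coding regions is not shown because it needs an annotation file.

```python
def is_high_quality_snp(vcf_line, min_qual=30.0):
    """True if the record is a biallelic SNP with QUAL >= min_qual."""
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alt, qual = fields[3], fields[4], fields[5]
    if qual == ".":
        return False  # no quality reported
    bases = ("A", "C", "G", "T")
    # Single-base REF and single-base ALT excludes indels and multiallelic sites.
    return ref in bases and alt in bases and float(qual) >= min_qual


def filter_vcf(lines, min_qual=30.0):
    """Keep header lines plus records that pass is_high_quality_snp."""
    return [
        line for line in lines
        if line.startswith("#") or is_high_quality_snp(line, min_qual)
    ]


# Made-up records for illustration.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",    # good SNP, kept
    "chr1\t200\t.\tAT\tA\t60\tPASS\t.",   # indel, dropped
    "chr1\t300\t.\tC\tT\t10\tPASS\t.",    # low quality, dropped
    "chr1\t400\t.\tG\tA,T\t90\tPASS\t.",  # multiallelic, dropped
]
filtered = filter_vcf(example)
print("\n".join(filtered))
```

Dedicated tools can apply the same kind of filter directly to compressed VCF files; the point here is only to show which records survive.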
Once you have a FASTA file, you can use it to generate a phylogenetic tree. You can do this with online tools, for example here: https://www.phylogeny.fr
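If you want to see what a tree-building tool starts from, here is a minimal sketch that reads an aligned SNP FASTA and computes pairwise p-distances (the proportion of differing sites), which distance-based methods such as neighbor joining take as input. This is not what phylogeny.fr does internally; the function names and sequences are made up for illustration, and sites with N are simply skipped.

```python
from itertools import combinations


def read_fasta(text):
    """Parse FASTA text into a dict of name -> sequence."""
    seqs = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line.upper()
    return seqs


def p_distance(a, b):
    """Proportion of differing sites, ignoring sites where either base is N."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for x, y in zip(a, b):
        if x == "N" or y == "N":
            continue
        compared += 1
        if x != y:
            diffs += 1
    return diffs / compared if compared else float("nan")


def distance_table(seqs):
    """Pairwise p-distances keyed by (name1, name2)."""
    return {(a, b): p_distance(seqs[a], seqs[b]) for a, b in combinations(seqs, 2)}


# Made-up aligned SNP sequences.
fasta_text = ">isolateA\nACGT\n>isolateB\nACGA\n>isolateC\nTCNA\n"
distances = distance_table(read_fasta(fasta_text))
for pair, d in distances.items():
    print(pair, round(d, 3))
```

Isolates with smaller distances end up closer together in the tree; a real analysis would hand the FASTA to a dedicated phylogenetics program rather than stop at this table.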
(If you have other questions, you should make a new post on the forum: you'll get more answers, and it will help other people with the same issue.)
OK.
Thank you very much for your help.
Sure, I will create a new post for any further questions.
Bioinformatics is really broad and there are many specialties. As someone already answered, the question is not specific enough. For now, people can perhaps guide you by telling you which subjects you need to look at.
I am not sure, but for this question I think you need to know things about:
- Doing an assembly, reference-based or de novo
- Read something about genome annotation and ORF prediction
- BLAST
- Data formats: FASTA, FASTQ, SAM
- Multiple sequence alignments
- A tool to create a phylogenetic tree (not the theory and algorithms behind it)
Yes, but I need to know the tools for these steps.