Hello everyone, I need help with this task and would like a workflow for it. My understanding is that I have to map all the reads to the reference genome, annotate the mapped BAM file with the reference genome annotation file, and then find the gene sequences. Is that correct? Could someone please tell me how to proceed?
The task is:
First, retrieve sequence data for the four genotypes from the SRA database. The SRA links for the four genotypes under study are as follows:
Reference genome-guided assembly and sequence retrieval of the listed genes and their promoters (at least 500 bp of promoter region). The list of selected genes is given in the table below.
- For the analysis, please use the reference genome sequence of desi-type chickpea (Cicer arietinum), which is available in the NCBI genome database as Cicer arietinum ASM33114v1 (submitter: BGI-Shenzhen).
- The NCBI link for the reference genome sequence is https://www.ncbi.nlm.nih.gov/genome/?term=Cicer+arietinum .
- The GFF file of the reference genome is available at the link given above.
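For looking up the listed genes in the GFF file, a minimal Python sketch could look like the one below. It assumes that the NCBI GFF3 stores the locus ID in a `gene=` or `Name=` attribute on rows of type `gene`, which should be checked against the actual file. The function name `find_gene_records` and the demo GFF lines are made up for illustration and are not taken from the real annotation.

```python
import io


def find_gene_records(gff_handle, locus_ids):
    """Return {locus_id: (seqid, start, end, strand)} for 'gene' rows of a GFF3 stream.

    Assumption: the locus ID is stored in the 'gene=' or 'Name=' attribute.
    GFF3 coordinates are 1-based and inclusive.
    """
    wanted = set(locus_ids)
    found = {}
    for line in gff_handle:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        # A valid GFF3 feature line has 9 tab-separated columns.
        if len(cols) != 9 or cols[2] != "gene":
            continue
        # Column 9 holds 'key=value' pairs separated by ';'.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        locus = attrs.get("gene") or attrs.get("Name")
        if locus in wanted:
            found[locus] = (cols[0], int(cols[3]), int(cols[4]), cols[6])
    return found


# Made-up GFF3 content, only to show the call; not real chickpea annotation.
demo = io.StringIO(
    "##gff-version 3\n"
    "chrDemo\tRefSeq\tgene\t1000\t2000\t.\t-\t.\t"
    "ID=gene-LOCDEMO1;Name=LOCDEMO1;gene=LOCDEMO1\n"
    "chrDemo\tRefSeq\tmRNA\t1000\t2000\t.\t-\t.\t"
    "ID=rna-1;Parent=gene-LOCDEMO1;gene=LOCDEMO1\n"
)
print(find_gene_records(demo, ["LOCDEMO1"]))
```

With the real file, the handle would be something like `open("annotation.gff")` (hypothetical file name), and the sequence names returned in the first field are the ones that must match the headers of the reference FASTA.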
List of genes for gene and promoter sequence retrieval

| S. No. | Protein ID | Locus ID | Chromosome No. | Annotation | Chromosomal location in reference genome |
|---|---|---|---|---|---|
| 1 | XP_004490705 | LOC101488582 | Ca2 | XP_004490705.1 CBL-interacting serine/threonine-protein kinase 2-like [Cicer arietinum] | 27180279..27182763 |
| 2 | XP_004490512 | LOC101500060 | Ca2 | XP_004490512.1 protein ENHANCED DISEASE RESISTANCE 4-like [Cicer arietinum] | 24288706..24293940, complement |
| 3 | XP_027188015 | LOC101502928 | Ca2 | XP_027188015.1 WRKY transcription factor 55 [Cicer arietinum] | 24127236..24131112 |
| 4 | XP_004490681 | LOC101505077 | Ca2 | XP_004490681.1 WRKY transcription factor WRKY24 [Cicer arietinum] | 26832049..26835822, complement |
| 5 | XP_004490530 | LOC101507148 | Ca2 | XP_004490530.1 membrane protein of ER body-like protein isoform X2 [Cicer arietinum] | 24678555..24685743, complement |
| 6 | XP_012570782 | LOC101508871 | Ca4 | XP_027189822.1 ethylene-responsive transcription factor ERF110-like isoform X4 [Cicer arietinum] | 45735671..45738692, complement |
| 7 | XP_004495105 | LOC101502737 | Ca4 | XP_004495105.1 ethylene-responsive transcription factor ERF098-like [Cicer arietinum] | 1228052..1228746 |
| 8 | XP_004497839 | LOC101496212 | Ca4 | XP_004497839.1 ethylene-responsive transcription factor 1B-like [Cicer arietinum] | 28202121..28203066 |
| 9 | XP_004504077 | LOC101508254 | Ca6 | XP_004504077.1 calmodulin-binding transcription activator 1 | 8086719..8093508 |
| 10 | XP_004504403 | LOC101505145 | Ca6 | XP_004504403.1 calmodulin-binding transcription activator 4 isoform X1 | 10864037..10871997 |
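Since the promoter lies upstream of the gene, which side of the interval gets extended depends on the strand. The sketch below shows one way to work out the "gene plus 500 bp promoter" interval from the locations listed above. It assumes that the listed coordinates are 1-based and inclusive and that "complement" means the minus strand. The function names are made up for illustration, and the chromosome label (`Ca2`) would have to be replaced by whatever sequence name the reference FASTA actually uses.

```python
def gene_plus_promoter(start, end, strand, promoter_len=500, chrom_len=None):
    """Return the 1-based inclusive (start, end) of a gene plus its upstream promoter.

    Plus strand: the promoter lies before 'start', so the start is moved left.
    Minus strand: the promoter lies after 'end', so the end is moved right.
    """
    if strand == "+":
        return max(1, start - promoter_len), end
    if strand == "-":
        new_end = end + promoter_len
        # Clip to the chromosome length when it is known.
        if chrom_len is not None:
            new_end = min(new_end, chrom_len)
        return start, new_end
    raise ValueError("strand must be '+' or '-'")


def reverse_complement(seq):
    """Reverse-complement a DNA string; needed to read minus-strand genes 5'->3'."""
    return seq.translate(str.maketrans("ACGTNacgtn", "TGCANtgcan"))[::-1]


# Two rows from the list: (locus, chromosome label, start, end, strand).
genes = [
    ("LOC101488582", "Ca2", 27180279, 27182763, "+"),
    ("LOC101500060", "Ca2", 24288706, 24293940, "-"),  # listed as "complement"
]

for locus, chrom, start, end, strand in genes:
    lo, hi = gene_plus_promoter(start, end, strand)
    # 'chrom:start-end' is the region syntax used by tools such as samtools faidx.
    print(f"{locus}\t{chrom}:{lo}-{hi}\t{strand}")
```

The sequence pulled out for a minus-strand region still has to be reverse-complemented before it reads in the gene's own orientation. These intervals describe positions on the reference only. Getting the sequence of each of the four genotypes additionally needs a per-sample consensus or assembly step, which is not shown here.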
While that can work, it appears that you are being asked to do a reference-guided genome assembly?
There are some options for that in the threads "Reference guided assembly methods" and "Reference Guided De Novo assembly of Contigs generated from Illumina PE Reads".
This paper will also be of interest: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1911-6
You opened a new post that has a large overlap with this question. Why did you do that?
I have deleted the other post owing to this overlap.
Please tell me: I have mapped the reads and got the BAM file. What should I do next?
> You opened a new post that has a large overlap with this question. Why did you do that?
Because in that post I asked a different question.
Thank you. In that case, I'll reopen that post.