Artemisia Database

A Comprehensive Resource for Artemisia annua Genomics & Transcriptomics

Key Features

Gene Expression

Visualize expression patterns across tissues

Explore Now

Genome Browser

Explore genome annotations interactively

Explore Now

BLAST Search

Map sequences to Artemisia genes

Explore Now

Functional Annotation

Gene function predictions and annotations

Explore Now

About the Database

The Artemisia Database is a comprehensive platform dedicated to Artemisia annua, the primary source of artemisinin, a vital antimalarial compound. This resource integrates genomic, transcriptomic, and functional annotation data to support research in plant biology and specialized metabolism.

What You Can Do:

Gene Expression: Explore tissue-specific patterns
TF Analysis: Investigate transcription factors
Functional Search: Find genes by annotation
CRISPR Insights: Design gene editing targets

Genome Browser: Visualize genomic features
BLAST: Map sequences to Artemisia genes
Data Download: Access raw datasets
Tutorials: Step-by-step guides

Navigate using the tabs above or click on any feature card to start exploring.

Database Highlights

30K+

Genes Annotated

900+

Metabolites Analyzed

300+

RNA-seq Samples

3K+

Transcription Factors

Global Expression Viewer Information

Purpose: Interactive t-SNE visualization of sample relationships based on gene expression patterns.

Key Features:

Interactive Plot: t-SNE visualization colored by tissue type
Click Filtering: Click any dot to filter metadata for that specific sample
PubMed Integration: Direct links to associated publications
Full Exploration: Zoom, pan, and hover for detailed information

How to use:

Explore: Zoom and pan (drag) to examine sample clusters
Identify: Hover over any point to see the sample ID
Click to Filter: Click any dot on the plot to filter the table below. The metadata table will update to show only the selected sample's information.
Access Publications: In the filtered table, if a PubMed ID is available, click the link to open the associated publication in a new tab.
Reset: To view all samples again, refresh the page

Interpreting the Visualization:

Closer points indicate more similar gene expression profiles
Color groupings reveal tissue-specific expression patterns
Tight clusters suggest biological reproducibility
Outliers may represent unique conditions or technical variations
Separation between colors shows differential expression across tissues

Data in the Table Below:

Initially shows metadata for all samples
Filters to single sample when you click a point
Contains: Sample ID, Tissue Type, Project Details
PubMed Links: Blue hyperlinks open the publication in PubMed

Pro Tip: Use this tool to:

Find samples for comparative analysis
Quickly access publication details for specific datasets
Select samples for downstream analysis in other tabs

t-SNE Colored by Body

Filtered t-SNE Metadata

Median Expression Calculator

Purpose: Calculate and visualize median expression values (TPM) for custom gene lists across selected tissues.

Input Options:

Manual entry: Paste gene IDs (one per line, max 100)
Examples: Use the example buttons for pre-defined gene sets
File upload: Upload .txt or .csv file with gene IDs
Tissue selection: Check which tissues to include in analysis

Output:

Heatmap showing log2(TPM+1) expression values
Genes on Y-axis, tissues on X-axis
Interactive: hover for exact values
Download as PNG by clicking on the camera icon on the top right of the plot

Tip: Copy gene IDs from other tabs or use the example buttons to get started quickly.
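
The heatmap values described above can be reproduced offline from the downloaded TPM table. The following is a minimal sketch, assuming the median is taken over the raw TPM values of each tissue and then transformed as log2(TPM+1); the order of these two steps in the database itself is not stated here, and the function name and sample values are hypothetical.

```python
import math
from statistics import median


def median_log2_expression(tpm_by_tissue):
    """Return {tissue: log2(median TPM + 1)} for a single gene.

    tpm_by_tissue maps a tissue name to the list of TPM values observed
    in that tissue's samples. Tissues without samples are skipped.
    """
    return {
        tissue: math.log2(median(values) + 1)
        for tissue, values in tpm_by_tissue.items()
        if values
    }


# Hypothetical TPM values for one gene (illustration only, not real data).
example = {"Leaf": [3.0, 7.0, 15.0], "Root": [0.0, 0.0, 1.0]}
heatmap_row = median_log2_expression(example)
print(heatmap_row)
```

The +1 pseudocount keeps genes with a median of 0 TPM at a value of 0 instead of producing an undefined logarithm.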

Enter Gene IDs (one per line, max 100):

Example Gene IDs:

Or Upload Gene IDs File:

Browse...

Select Tissue(s):

Root Stem Leaf Flower Petiole Shoot

Download Expression Data (CSV)

Downloads CSV with both raw TPM and log2-transformed values

Artemisinin Pathway Explorer

Purpose: Specialized tool for exploring genes involved in artemisinin biosynthesis.

Key Features:

Search: Find genes by ID or name in the artemisinin pathway
Browse: View all artemisinin-related genes in the main table
Analyze: Check tissue-specific expression patterns
Access: Download transcript sequences
Export: Download expression data as CSV

Step-by-Step Workflow:

Search or Browse: Use the text area to search by Gene ID or Gene Name, or browse the complete list in the Main Table
Select Genes: Select the row(s) of interest in the Main Table
Check Expression: Click the 'Check Gene Expression' button that appears below the table
- This automatically switches to the Expression tab
- Selected genes will be pre-loaded
- Select tissues and click 'Calculate Median Expression'
- View heatmap showing expression across tissues
Export Data: After generating the heatmap, use the download buttons to:
- Download Plot: Save the heatmap as PNG, JPEG, or PDF
- Download Data (CSV): Export raw TPM and log-transformed values
Access Sequences: Switch to the Sequence tab to:
- View detailed transcript information
- Download sequences using the 'Download Sequences (Text)' button

Note: This tool focuses specifically on genes with known or predicted roles in artemisinin biosynthesis.

Artemisinin Pathway Genes

This tab provides detailed information about genes related to artemisinin, a vital compound used in the treatment of malaria. You can explore gene classifications, view expression data, and access sequences for selected genes.

Use the Main Table to filter and select genes of interest. The Expression tab allows you to analyze gene expression across various plant tissues, while the Sequence tab provides detailed transcript information.

Enter Gene IDs (one per line, max 100):

Select Parts:

Root Stem Leaf Flower Petiole Shoot

Download Plot

Download Data (CSV)

Plot Format:

CSV includes: Raw TPM + log2(TPM+1) values

Gene Details

Detailed information about the selected gene is displayed below.

Transcripts and Sequences

Download Sequences (Text)

Gene Category Explorer

Purpose: Explore genes classified by tissue-specific expression using Tau index.

Key Features:

Tau Index: Measures tissue specificity (0 = ubiquitous, 1 = tissue-specific)
Donut Chart: Visual overview of gene categories
Gene Search: Find specific genes by ID
Multi-tab View: Table, expression, and sequence data
Export Data: Download expression heatmap data as CSV
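
For readers unfamiliar with the Tau index, the sketch below implements its standard definition, Tau = sum(1 - x_i / x_max) / (n - 1), where x_i is the expression of a gene in tissue i and n is the number of tissues. This is an illustration only: the input values are hypothetical, and whether the database computes Tau on raw or log-transformed TPM is an assumption left open here.

```python
def tau_index(expression):
    """Tissue-specificity index in the range 0 (ubiquitous) to 1 (specific).

    expression holds one non-negative value per tissue, for example the
    median expression of a gene in each tissue.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("Tau needs at least two tissues")
    peak = max(expression)
    if peak <= 0:
        # Gene not expressed anywhere. Returning 0.0 is a convention
        # chosen for this sketch, not something the database specifies.
        return 0.0
    return sum(1 - x / peak for x in expression) / (n - 1)


# Hypothetical profiles across five tissues.
print(tau_index([5.0, 5.0, 5.0, 5.0, 5.0]))   # evenly expressed
print(tau_index([0.0, 0.0, 10.0, 0.0, 0.0]))  # confined to one tissue
```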

Workflow for Tissue-Specific Genes:

Filter table by tissue of interest
Select tissue-specific genes
Click 'Check Expression' button
View expression patterns in heatmap
Download data using the CSV download button
Get sequences in Sequence tab

Use this to identify genes specifically expressed in roots, stems, leaves, flowers, or petioles.

Explore genes by classification using the donut plot or search by gene IDs below.

Search by Gene IDs (one per line, max 100):

Details

Enter Gene IDs (one per line, max 100):

Select Parts:

Root Stem Leaf Flower Petiole Shoot

Download Plot

Download Data (CSV)

Plot Format:

CSV includes: Raw TPM + log2(TPM+1) values

Gene Details

Detailed information about the selected gene is displayed below.

Transcripts and Sequences

Download Sequences (Text)

PlantTFDB Transcription Factors

Interactive Bar Chart: Shows TF family distributions
Three Ways to Explore:
- Click any bar to filter genes by family
- Search specific genes by ID
- Browse all TFs in the table
Select genes on the Main Table
Click 'Check Gene Expression' to view tissue-specific patterns
Download: CSV, sequences, or expression plots and data

Based on PlantTFDB classification of transcription factor families

Enter Gene List (one gene ID per line, max 100):

Details

Download CSV

Enter Gene IDs (one per line, max 100):

Select Parts:

Root Stem Leaf Flower Petiole Shoot

Download Plot

Download Data (CSV)

Plot Format:

CSV includes: Raw TPM + log2(TPM+1) values

Gene Details

Detailed information about the selected gene is displayed below.

Transcripts and Sequences

Download Sequences (Text)

Pfam Transcription Factors

Interactive Bar Chart: TF families based on Pfam domains
Explore: Click bars, search genes, or browse table
Select genes from the table
Check Expression: Click button to see tissue patterns
Pfam Links: Click Pfam IDs to open InterPro entries
Download: CSV, sequences, expression plots and data

Transcription factors identified by conserved protein domains

Enter Gene List (one gene ID per line, max 100):

Filtered Data

Download CSV

Enter Gene IDs (one per line, max 100):

Select Parts:

Root Stem Leaf Flower Petiole Shoot

Download Plot

Download Data (CSV)

Plot Format:

CSV includes: Raw TPM + log2(TPM+1) values

Gene Details

Detailed information about the selected gene is displayed below.

Transcripts and Sequences

Download Sequences (Text)

Tissue-Specific Transcription Factors

Heatmap: Shows TF families enriched in specific tissues
Click any cell to filter genes for that tissue/family combination
Select genes from filtered table using checkboxes
Check Expression: Button appears after selection
Three Tabs:
- Main Table: Browse and select genes
- Expression: Tissue-specific patterns
- Sequence: Transcript details
Download: CSV, sequences, expression plots and data

Identify TFs specifically expressed in roots, stems, leaves, flowers, or petioles

Details

Download CSV

Enter Gene IDs (one per line, max 100):

Select Parts:

Root Stem Leaf Flower Petiole Shoot

Download Plot

Download Data (CSV)

Plot Format:

CSV includes: Raw TPM + log2(TPM+1) values

Gene Details

Detailed information about the selected gene is displayed below.

Transcripts and Sequences

Download Sequences (Text)

Multi-Omics Integration: Gene-Metabolite Correlation Analysis

Discover functional relationships between gene expression and metabolite profiles

Analysis Controls

Select Metabolite

Examples: Artemisinin, Artemisinic acid, DihydroArtemisinic acid, Putrescine

Analysis Mode

Discovery Mode (Top 30 correlated genes)

Hypothesis Mode (Test your genes)

Enter Gene IDs (one per line):

Maximum 50 genes

Correlation Filter

Minimum |r| value:

0.7 0.8 1.0

Only gene-metabolite pairs with |r| ≥ threshold are shown
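
The threshold filter can be pictured with the sketch below. It assumes that r is a Pearson correlation computed across the experimental conditions, which this page does not state explicitly; the gene labels and values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined for a constant profile
    return sxy / math.sqrt(sxx * syy)


def filter_pairs(metabolite, genes, min_abs_r=0.7):
    """Keep genes whose |r| with the metabolite meets the threshold."""
    kept = {}
    for gene_id, expression in genes.items():
        r = pearson_r(expression, metabolite)
        if r is not None and abs(r) >= min_abs_r:
            kept[gene_id] = r
    return kept


# Hypothetical values, one per experimental condition.
metabolite = [1.0, 2.0, 3.0, 4.0]
genes = {
    "gene_A": [2.0, 4.0, 6.0, 8.0],   # rises with the metabolite
    "gene_B": [8.0, 6.0, 4.0, 2.0],   # falls as the metabolite rises
    "gene_C": [1.0, 3.0, 1.0, 3.0],   # weak relationship
}
print(filter_pairs(metabolite, genes, min_abs_r=0.7))
```

Because the filter uses the absolute value, strongly negative pairs such as gene_B pass alongside strongly positive ones.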

Processing may take a few seconds

Data Information

4 Experimental Conditions:

WT_YL: Wild-type Young Leaf
WT_ML: Wild-type Mature Leaf
MUT_YL: Mutant (tdd1) Young Leaf
MUT_ML: Mutant (tdd1) Mature Leaf

Link to the paper

Gene-Metabolite Correlation Results

Filtered correlations based on your selection. Click any row to view detailed trend analysis.

Download CSV

Selected Pair Details

Correlation Matrix Visualization

Heatmap showing correlation values. Red = positive correlation, Blue = negative correlation.

Number of genes:

Tip: Hover for details, click camera icon to download

Interpreting the Heatmap:

Bright red: Strong positive correlation (r > 0.8)
Light red: Moderate positive correlation (0.5 < r < 0.8)
Light blue: Moderate negative correlation (-0.8 < r < -0.5)
Bright blue: Strong negative correlation (r < -0.8)
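
The legend above can be expressed as a small lookup, shown here as a sketch. The legend does not say how values of exactly ±0.8 or values with |r| ≤ 0.5 are displayed, so the handling of those cases below is an assumption made for illustration.

```python
def describe_correlation(r):
    """Map a correlation coefficient to the legend category above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    if r > 0.8:
        return "strong positive (bright red)"
    if r > 0.5:
        # Assumption: r == 0.8 is grouped with the moderate range.
        return "moderate positive (light red)"
    if r < -0.8:
        return "strong negative (bright blue)"
    if r < -0.5:
        # Assumption: r == -0.8 is grouped with the moderate range.
        return "moderate negative (light blue)"
    # Assumption: the legend does not cover |r| <= 0.5.
    return "weak or no correlation (not covered by the legend)"


for value in (0.93, 0.62, -0.7, -0.95, 0.1):
    print(value, "->", describe_correlation(value))
```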

Dual-Axis Trend Analysis

Compare metabolite abundance (bars) with gene expression (line) across the 4 experimental conditions.

Select a Gene-Metabolite Pair

To view trend analysis, first go to the Correlation Table tab and click on any row to select a gene-metabolite pair.

1. Go to 'Correlation Table' tab 2. Click any row 3. View trends here

Tip: The most interesting pairs usually have strong correlations (|r| > 0.8)

Tip: Hover for values, click camera icon to download

Scientific Interpretation:

Co-regulation: When bars (metabolite) and line (gene) follow similar patterns, this suggests the gene may be involved in producing or regulating the metabolite
Mutation effect (tdd1): Compare WT vs MUT conditions - disruption in both layers suggests the mutation affects this pathway
Development effect: Compare Young vs Mature leaves - changes may indicate developmental regulation
Inverse relationship: Opposite patterns may indicate regulatory or catabolic relationships

Gene Functional Context

Biological information for genes identified in correlation analysis

Select gene for detailed view:

Functional Annotation Explorer

Three Query Types:

Annotation Table: Search by GO, Pfam, InterPro, KEGG, or Arabidopsis IDs
Gene-Based: Find all annotations for specific gene IDs
Functional Descriptions: Search terms in annotation descriptions

Quick Start:

Select query type tab
Enter IDs or search terms
Click search button
View results in separate tabs
Download CSV files

💡 Tips:

Examples auto-fill when selecting annotation type
Maximum 100 IDs per query
Results show in separate tabs per database
Download buttons at bottom of each results tab

Functional Annotation Queries

Select Annotation Table:

Enter Annotation ID(s):

Enter Gene ID(s):

Enter Functional Term:

Query Results

CRISPR sgRNA Design Tool

sgRNA design using crisprDesign methodology

Key Features:

Multiple CRISPR nucleases (SpCas9, AsCas12a, SaCas9, enCas12a)
CDS-aware targeting for precision knockout

Design Parameters

Target Gene ID(s):

Target Region:

CDS (coding sequence)

Gene ± 500bp

Custom region

Upstream (bp):

Downstream (bp):

CRISPR Nuclease:

Design Filters:

GC content 20-80%

Remove poly-T stretches

Avoid common restriction sites

Design Summary

How to Use This Tool

Enter Gene IDs: Use gene IDs from the Mikado annotation (e.g., mikado.chr1G1335)
Select Target Region:
- CDS: Target only coding sequences for precise knockout
- Gene ± 500bp: Include promoter and terminator regions
- Custom: Specify exact upstream/downstream distances
Choose Nuclease:
- SpCas9: Most common, requires NGG PAM
- AsCas12a/enCas12a: Requires TTTV PAM, creates staggered cuts
- SaCas9: Smaller protein, requires NNGRRT PAM
Apply Filters: Use filters to improve guide quality
Click 'Design sgRNAs': Generate and score guide RNAs
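
To make the PAM requirements and filters concrete, here is a simplified sketch of a guide scan. It is not the crisprDesign implementation: it searches the forward strand only, the spacer lengths (20, 21, and 23 nt) are common defaults assumed for illustration, and the poly-T filter is assumed to mean a run of four or more T bases.

```python
import re

# IUPAC codes needed for the PAMs listed above.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "R": "[AG]", "V": "[ACG]"}

# PAM side and spacer length are assumptions, not database settings.
NUCLEASES = {
    "SpCas9":   {"pam": "NGG",    "pam_side": "3prime", "spacer_len": 20},
    "SaCas9":   {"pam": "NNGRRT", "pam_side": "3prime", "spacer_len": 21},
    "AsCas12a": {"pam": "TTTV",   "pam_side": "5prime", "spacer_len": 23},
}


def gc_fraction(seq):
    """Fraction of G and C bases in seq."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_guides(sequence, nuclease, gc_min=0.20, gc_max=0.80):
    """Return (position, spacer, pam) tuples found on the forward strand."""
    cfg = NUCLEASES[nuclease]
    seq = sequence.upper()
    pam_re = "".join(IUPAC[base] for base in cfg["pam"])
    spacer_re = "[ACGT]{%d}" % cfg["spacer_len"]
    pam_first = cfg["pam_side"] == "5prime"
    body = f"({pam_re})({spacer_re})" if pam_first else f"({spacer_re})({pam_re})"
    # A lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(f"(?={body})")
    guides = []
    for match in pattern.finditer(seq):
        first, second = match.group(1), match.group(2)
        pam, spacer = (first, second) if pam_first else (second, first)
        if not gc_min <= gc_fraction(spacer) <= gc_max:
            continue  # GC content outside 20-80%
        if "TTTT" in spacer:
            continue  # poly-T stretch
        guides.append((match.start(), spacer, pam))
    return guides


# Hypothetical target sequence: a 20-nt spacer followed by a TGG PAM.
print(find_guides("ACGTACGTACGTACGTACGTTGG", "SpCas9"))
```

A complete design would also scan the reverse complement and score each guide for on-target and off-target activity, which this sketch leaves out.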

Output File

CSV: Complete design data in spreadsheet format

Ready to Design sgRNAs

Follow these steps to design CRISPR guide RNAs:

Enter Gene IDs

Enter one gene ID per line from the Mikado annotation

Select Parameters

Choose target region, nuclease, and quality filters

Design Guides

Click 'Design sgRNAs' to generate and score guide RNAs

Example Gene IDs:

mikado.chr1G1335
mikado.chr1G1016
mikado.chr2G2024

Check the 'Browse' tab for a complete list of available genes.

Status Check:

Make sure the genome and annotation are loaded (see status above).

If they show as 'not loaded' or 'failed', use the Force Load buttons.

Epigenetic Regulation Explorer

Purpose: Explore genes involved in epigenetic regulation through DNA and RNA methylation, and histone modifications.

Three Methylation Types:

N6-methyladenosine (m6A): RNA methylation (Writer/Eraser/Reader)
5-methylcytosine (5mC): DNA methylation enzymes
Histone-H3: Histone modification proteins

Two Ways to Explore:

Filter by methylation type and specific functions
Search specific gene IDs (max 100)

Workflow:

Select methylation type and filter (if needed)
Or search specific gene IDs
Select genes from table
Click 'Check Gene Expression' to see tissue patterns
Download data and plots from Expression tab
Get sequences in Sequence tab

All expression data includes both raw TPM and log2-transformed values for download

Methylation

Select Methylation Type:

Search by Gene ID(s)

Enter Gene List (one gene ID per line, max 100):

Details

Download CSV

Enter Gene IDs (one per line, max 100):

Select Parts:

Root Stem Leaf Flower Petiole Shoot

Download Plot

Download Data (CSV)

Plot Format:

CSV includes: Raw TPM + log2(TPM+1) values

Gene Details

Detailed information about the selected gene is displayed below.

Transcripts and Sequences

Download Sequences (Text)

JBrowse Genome Browser

Purpose: Visualize genomic data for Artemisia annua using JBrowse, based on the latest annotation and sequence files developed in this study.

Step-by-Step Instructions:

Click the Open button to access the genome browser
Click Open Track Selector in the sidebar to view available tracks
Select the Artemisia Annotation (GFF3) and Genome Sequence (FASTA) files to load them into the browser
Zoom in on a region of the genome to view gene features (exons, introns, UTRs)
Click on any feature (gene or CDS) to open a detailed information window
In the feature window:

Click Show Feature Sequence to display and copy the sequence
Use the dropdown menu to extract sequences with 1000 bp upstream/downstream
Click the wheel icon to modify the extraction range (e.g., 500 bp or 2000 bp)

Example Workflow:

After opening JBrowse and loading tracks, zoom into a region to see gene features from the GFF3 file. Select a gene feature to open its details window, then click 'Show Feature Sequence' to view its sequence. From the dropdown menu, extract 1000 bp upstream and downstream, and adjust to 1500 bp using the wheel icon.
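
The upstream/downstream extraction amounts to simple coordinate arithmetic, sketched below. This is an illustration and not JBrowse code: it assumes GFF3-style 1-based inclusive coordinates, and the function and parameter names are hypothetical.

```python
def flanked_interval(start, end, strand, upstream=1000, downstream=1000,
                     seq_len=None):
    """Return (left, right) genomic coordinates of a feature plus flanks.

    start and end are 1-based inclusive, as in GFF3. Upstream is relative
    to the feature's own orientation, so on the minus strand it lies at
    higher genomic coordinates.
    """
    if strand == "+":
        left, right = start - upstream, end + downstream
    elif strand == "-":
        left, right = start - downstream, end + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    left = max(1, left)              # do not run off the chromosome start
    if seq_len is not None:
        right = min(seq_len, right)  # do not run off the chromosome end
    return left, right


# Hypothetical gene at 5000-6000 on the plus strand, 1500 bp flanks.
print(flanked_interval(5000, 6000, "+", upstream=1500, downstream=1500))
```

For a minus-strand feature, the extracted region must additionally be reverse-complemented before it reads 5' to 3' relative to the gene.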

Navigation Tips:

Use mouse wheel to zoom in/out
Click and drag to pan across the genome
Right-click for additional navigation options
Scroll horizontally to explore different genomic regions

Gene ID Conversion & Discovery

Purpose: Map unknown sequences to our new Mikado gene IDs, or find corresponding genes between different annotation systems.

Sequence Input Requirements:

BLASTN (Nucleotide): DNA/RNA sequences in FASTA format
TBLASTN (Protein): Amino acid sequences in FASTA format
Maximum file size: 2MB
Database: Artemisia annua transcriptome (nucleotide)

Important Notes:

BLASTN: For nucleotide sequences (DNA/RNA)
TBLASTN: For protein sequences (translates database in 6 frames)
FASTA format required: Sequences must start with '>' header
Results: Always returns Mikado gene IDs in 'sseqid' column
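
Input can be checked against these requirements before submission. The sketch below is one possible local check, not the server's own validation; it assumes 2MB means 2 × 1024 × 1024 bytes, and the function name is hypothetical.

```python
MAX_BYTES = 2 * 1024 * 1024  # assumed interpretation of the 2MB limit


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples.

    Raises ValueError if the input is too large, is empty, or contains
    sequence lines before the first '>' header.
    """
    if len(text.encode("utf-8")) > MAX_BYTES:
        raise ValueError("input exceeds the 2MB limit")
    records = []
    header, chunks = None, []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            if header is None:
                raise ValueError("FASTA input must start with a '>' header")
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    if not records:
        raise ValueError("no FASTA records found")
    return records


# Hypothetical query sequences.
print(parse_fasta(">query1 example\nACGT\nTTGA\n>query2\nGGCC\n"))
```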

Typical Workflow:

Start with BLAST to identify matching gene IDs
Copy identified mikado.chrXGXXXX IDs
Use IDs in other database sections:

Check expression patterns across tissues
Find functional annotations (GO, Pfam, etc.)
Explore transcription factor classifications

How to Use BLAST:

Select BLAST program type based on your input
Paste sequences (FASTA format)
Or upload sequence file (≤2MB)
Adjust parameters if needed
Click 'Run BLAST'
Download results as CSV

BLAST Search

Select BLAST Program:

BLASTN (Nucleotide sequences)

TBLASTN (Protein sequences)

Input Sequences (FASTA format):

Or Upload Sequence File (FASTA format, max 2MB):

Browse...

Advanced Parameters (Optional)

Minimum Identity (%):

E-value Threshold:

Maximum Hits:

Scoring Matrix:

Results

Download Results as CSV

Co-expression Network Analysis

Purpose: Identify genes co-expressed with your query genes across Artemisia RNA-seq samples to discover functionally related genes and pathways.

Input & Parameters:

Query Genes: Up to 10 gene IDs (newline separated)
Filters: Bioproject, tissue type, FDR, correlation thresholds
Method: Pearson or Spearman correlation

Analysis Workflow:

Enter query gene IDs (max 10)
Filter by dataset (Bioproject) and tissues
Set statistical thresholds (FDR ≤ 0.05, correlation ≥ 0.7)
Click 'Run Co-expression'
Explore results in three tabs:

Results Tabs:

Table: Co-expressed genes with correlations, p-values, FDR
Network Graph: Interactive visualization of gene relationships
GO Enrichment: Functional analysis of co-expressed gene sets

Key Features:

Network visualization shows input genes (diamonds) and co-expressed genes (circles)
Circle colors indicate how many input genes connect to each target
GO enrichment helps identify biological processes for co-expressed sets
All results downloadable as CSV files
Based on log-transformed TPM expression values

Use this to discover: gene regulatory networks, pathway members, functionally related genes, and potential transcription targets.
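
The difference between the two correlation methods can be seen in the sketch below: Spearman correlation is Pearson correlation applied to ranks, so it captures any monotonic relationship rather than only linear ones. The code is an illustration under stated assumptions, not the database's implementation; it omits the p-value and FDR steps, and the candidate labels and values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation, or None when either profile is constant."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two profiles of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


def coexpressed(query, candidates, min_abs_r=0.7, method="spearman"):
    """Return {gene_id: r} for candidates passing the |r| threshold."""
    corr = spearman if method == "spearman" else pearson
    hits = {}
    for gene_id, profile in candidates.items():
        r = corr(query, profile)
        if r is not None and abs(r) >= min_abs_r:
            hits[gene_id] = r
    return hits


# Hypothetical log-transformed expression across five samples.
query = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "cand_up": [1.0, 4.0, 9.0, 16.0, 25.0],
    "cand_flat": [2.0, 2.0, 2.0, 2.0, 2.0],
    "cand_mixed": [3.0, 1.0, 4.0, 1.0, 5.0],
}
print(coexpressed(query, candidates))
```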

Enter Gene List (one gene ID per line, max 10):

Correlation Method:

Filter by Bioproject:

Select Plant Tissue(s):

FDR Rate Threshold:

Minimum Absolute Correlation:

Metadata Table

Co-expression Results

Download Results

Filter by Max P.adjust:

Download

Gene Ontology (GO) Enrichment Analysis

Purpose: Identify overrepresented biological processes, molecular functions, and cellular components in your gene list.

Input Options:

Paste genes: Enter gene IDs (one per line)
Upload file: .txt file with one gene per line

Analysis Steps:

Enter gene list (or upload file)
Click 'Run GO Enrichment' button
View dot plot visualization
Filter results by root node and p-value
Download complete results table

Results Include:

Dot Plot: Visual summary of enriched GO terms
Filtering: By root node (BP, MF, CC) and adjusted p-value
Table: Detailed results with gene counts and p-values
Download: Full results as CSV

Key Features:

Root node filtering: Biological Process (BP), Molecular Function (MF), Cellular Component (CC)
FDR-corrected p-values (Benjamini-Hochberg method)
Interactive dot plot with hover information
Gene lists for each GO term included in results
Works with any gene list (from BLAST, co-expression, etc.)
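
The statistics behind an enrichment result can be sketched as follows. The Benjamini-Hochberg correction is named above; the use of a one-sided hypergeometric test for over-representation is an assumption, since the test used by the database is not stated here. All counts in the example are hypothetical.

```python
from math import comb


def hypergeom_sf(k, population, annotated, drawn):
    """P(X >= k): chance of seeing at least k annotated genes in a list.

    population: genes in the background
    annotated:  background genes carrying the GO term
    drawn:      genes in the submitted list
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical example: 2 of 3 submitted genes carry a term that
# annotates 4 of 10 background genes.
print(hypergeom_sf(2, population=10, annotated=4, drawn=3))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Each GO term gets one raw p-value from the test, and the correction is then applied across all tested terms together.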

Typical Uses:

Interpret results from co-expression analysis
Understand functional themes in differentially expressed genes
Characterize gene sets from clustering or network analysis
Generate hypotheses about gene functions

Enter Genes (one per line):

Or Upload a Gene List File (txt, one gene per line)

Browse...

Analysis Status

Filter by Max P.adjust:

Download Results Table

Sequence Download

Purpose: Get FASTA sequences for gene IDs.

Input: Enter gene IDs (one per line)

Limits: 1-5 = view & download, 6+ = download only

Output: FASTA file with transcript sequences

Use gene IDs from other sections. All transcripts included. Ready for BLAST or primer design.

Download Sequences by Gene ID

Note: If you enter more than 5 gene IDs, sequences will not be displayed on screen—you’ll only be able to download them. If you enter 5 or fewer, you can both view and download.

Enter gene IDs (one per line):

Download Sequences

Download Data by Tissue

Select Criteria

Select Tissue(s) (multiple selections allowed):

Flower Leaf Petiole Root Stem Shoot

Select Quantification Measure:

Select Quantification Level:

Summary

Download Data by Project

Download Options

Select Data Type:

Download Files

Annotation File

Transcriptome Assembly

Genome Files

Multiple chromosome-level genome assemblies from Artemisia annua are available:

Artemisia annua LQ-9 phase0 genome (This version is used to build this database)

Artemisia annua LQ-9 phase1 genome

Artemisia annua HAN1 phase0 genome

Artemisia annua HAN1 phase1 genome

Artemisia Database Documentation

About the Artemisia Database

Welcome

Welcome to the Artemisia Database, a comprehensive platform designed to facilitate research on Artemisia annua gene expression, transcription factors, functional annotations, and more. This tutorial will walk you through the database's key features, showing you how to navigate its interface and utilize its tools effectively. Whether you're a biologist, bioinformatician, or researcher, this guide will help you get started.

The Artemisia Database was built as part of the postdoctoral project of Dr. Ayat Taheri at the School of Agriculture and Biology, Shanghai Jiao Tong University.

This resource aims to provide valuable insights into Artemisia gene expression, specialized metabolism, and related biological processes.

Development Workflow

Below is the comprehensive workflow used to develop the Artemisia Database, from data acquisition to interactive visualization:

Acknowledgment

We acknowledge the contributions of the researchers and organizations who generated the genome and RNA-sequencing data utilized in this study.

This work was supported by:

The Natural Science Foundation of China (82274047)
The Bill & Melinda Gates Foundation (INV-027291)

The computations in this research were run on the Siyuan-1 cluster supported by the Center for High-Performance Computing at Shanghai Jiao Tong University.

We acknowledge the invaluable support of Professor Kexuan Tang throughout this project.

Future Plans

We plan to regularly include additional RNA-seq datasets to expand the utility of the database.

This ongoing development aims to make the database a comprehensive resource for researchers studying Artemisia and its specialized metabolism.

Code & Contact

Source Code

This database is built with R Shiny and all code is open-source:

GitHub Repository

Contact

For inquiries, please contact:

Dr. Ayat Taheri

Email: [email protected]

Collaboration Invitation

Interested in collaborating? I'm open to research partnerships, data sharing, and joint publications.

User Tutorial

Tutorial: Exploring the Artemisia Database

Getting Started
- Accessing the Database
Home Page Overview
Gene Expression
Transcription Factors
Gene-Metabolite Correlation Analysis
Functional Annotation
Gene Editing and Epigenetics
- CRISPR sgRNA Designer
- Methylation
Tools
Download

Getting Started

Accessing the Database

Navigate to the Portal: Open your web browser and go to the Artemisia Database.
Interactive Interface: The platform is built using R Shiny, providing a highly responsive and intuitive environment for data exploration.
Guidance & Documentation: At the top of every page, you will find a dropdown menu. Use this menu to access detailed information regarding the section’s purpose and specific step-by-step instructions.

Home Page Overview

Upon accessing the database, you will land on the Home tab, which serves as the central hub for the platform:

Welcome & Overview: This section provides a high-level introduction to the Artemisia-DB project. It outlines the platform’s core objective: facilitating the exploration and functional analysis of Artemisia annua gene expression and multi-omics data.
Getting Started: Use this page to familiarize yourself with the database’s scope before diving into the specific analysis modules.

Gene Expression

The Gene Expression menu contains four powerful tools for visualizing expression data.

Global Expression Viewer

Purpose

The Global Expression Viewer allows you to visualize high-dimensional gene expression data through an interactive t-SNE plot. This tool is designed to help you identify expression clusters across different tissues and access the underlying metadata for each sample.

Step-by-Step Instructions

Navigate: Go to Gene Expression > Global Expression Viewer in the main menu.
Explore the Plot: Interact with the t-SNE plot (powered by Plotly) to see how samples cluster.
Inspect Metadata: Review the Filtered t-SNE Metadata table below the plot for comprehensive sample details.

Key Interactive Features

Dynamic Hover: Hover over any point in the t-SNE plot to identify the specific tissue type (e.g., “Leaf,” “Root,” or “Trichome”).
Bidirectional Filtering: Click on a point in the plot to instantly filter the metadata table to that specific sample.
Literature Links: If a sample originates from a published study, the Pubmed_ID column will provide a direct hyperlink to the research paper on PubMed.

Visual Overview

Figure 1: The t-SNE plot displays gene expression data colored by plant tissue. Interactive tools allow for zooming and point-specific identification.

Figure 2: The metadata table dynamically updates based on your plot selections, offering deep-dives into sample attributes and publication IDs.

Median Expression Per Tissue

Purpose

This tool allows you to aggregate and visualize the median expression levels of specific genes across various plant tissues. It is particularly useful for comparing the expression profiles of gene families or sets of co-expressed genes.

Step-by-Step Instructions

Navigate: Go to Gene Expression > Median Expression Per Part.
Input Gene IDs:
- Enter up to 100 Gene IDs (one per line) into the text area.
- Alternatively, upload a .txt or .csv file containing your IDs.
- Quick Start: Use the Example Gene ID buttons to automatically populate the field with a sample set.
Select Tissues: Use the checkboxes to choose the specific plant parts (e.g., “Root”, “Leaf”, “Trichome”) you wish to include in the analysis.
Process Data: Click the Calculate Median Expression button.
Analyze & Save:
- Review the generated heatmap, which displays log2-transformed expression values.
- To save the visualization, click the Camera Icon in the top-right corner of the plot to download it as a PNG.

Example Workflow

To test the tool, click Example#1 to load IDs like mikado.chr7G1274 and mikado.chr4G1337. Select Leaf and Flower, then click Calculate Median Expression. The resulting heatmap will provide a direct visual comparison of these genes across the two tissues.

Visual Guidance

Figure 3: Use the example buttons for a quick setup, then refine your analysis by selecting specific plant parts.

Figure 4: The interactive heatmap visualizes relative expression levels. Hover over cells to see exact log2 values.

Artemisinin Pathway Genes

Purpose

This module provides a focused environment for exploring genes specifically linked to artemisinin biosynthesis. It integrates functional data, expression profiles, and sequence information through a centralized, multi-tab interface.

Step-by-Step Instructions

1. Access the Module

Navigate to Gene Expression > Artemisinin Pathway Genes. The “Main Table” tab will load by default, listing all relevant pathway genes.

2. Search and Filter

Use the search panel to isolate specific genes of interest:

By Gene ID: Enter IDs (one per line) and click Search by Gene ID.
By Gene Name: Enter keywords (e.g., ADS, CYP71AV1) and click Search by Gene Name.

3. Select and Interact

Once you have located your target genes, click on their rows in the Main Table to highlight them. This enables the action buttons at the bottom left:

Copy Gene ID(s): Copies the selected IDs to your clipboard for use in other modules (like the Median Expression Viewer).
Check Gene Expression: Automatically switches you to the Expression tab and carries over your selected IDs.

4. Analyze Expression & Sequences

Expression Tab: Select your desired plant parts (e.g., Leaf, Root, Trichome) and click Calculate Median Expression to generate a custom heatmap.
Sequence Tab: View the nucleotide or protein sequences for your selected genes. You can download these directly as a .txt file for local use.

Workflow Overview

Feature	Description
Main Table	Provides functional annotations and literature references.
Expression	Generates dynamic heatmaps based on tissue-specific data.
Sequence	Offers high-throughput access to FASTA-formatted sequences.

Overall Category Distribution

Purpose

This module provides a high-level classification of the Artemisia annua transcriptome. Using an interactive donut chart, you can explore gene categories (such as “Tissue-Specific” vs. “Broadly Expressed”) and drill down into the functional data and sequences associated with each group.

Step-by-Step Instructions

Navigate: Go to Gene Expression > Overall Category Distribution.
Explore the Donut Chart:
- The chart displays the proportion of genes within different expression categories.
- Click a segment (e.g., “Specific”) to instantly filter the data table below to show only genes belonging to that category.
Search Specific Genes: Alternatively, enter one or more Gene IDs in the search box to identify their specific classifications and view their details in the table.
Select & Action: Click on one or more rows in the Main Table to enable the following tools at the bottom left:
- Copy Gene ID(s): Saves the selected IDs to your clipboard.
- Check Gene Expression: Redirects you to the Expression tab with your selected genes pre-loaded.
Analyze & Download:
- Expression Tab: Select plant parts and click Calculate Median Expression to visualize a heatmap of your selected category.
- Sequence Tab: View and download the nucleotide or protein sequences for your selected genes.

Visual Guidance

Donut Plot Figure 5: The interactive donut chart allows you to filter the entire dataset by clicking on specific expression categories, such as tissue-specific or constitutive genes.

Pro-Tip

This module is the fastest way to identify “specialist” genes. By clicking the Specific segment and then using the Check Gene Expression button, you can quickly verify which tissue (e.g., Trichome or Root) those genes are primarily active in.

Transcription Factors

The Transcription Factors menu provides specialized tools for identifying and analyzing regulatory proteins within the Artemisia annua genome, primarily categorized by the PlantTFDB framework.

PlantTFDB Module

Purpose

This tool allows you to explore transcription factor (TF) families, visualize their distribution, and analyze their tissue-specific expression patterns.

Step-by-Step Instructions

Navigate: Go to Transcription Factors > PlantTFDB.
Filter by Family: Interact with the TF Family Distribution Plot.
- Click a bar (e.g., “bZIP”) to instantly filter the “Main Table” to that specific family.
Search & Select:
- In the Main Table, search for specific Gene IDs or browse the filtered list.
- Select rows to activate the action buttons at the bottom left.
Execute Actions:
- Copy Gene ID(s): Copies your selection to the clipboard.
- Check Gene Expression: Seamlessly transfers your selected TFs to the Expression tab.
Analyze Results:
- Expression Tab: Select target tissues (e.g., “Root”, “Leaf”) and click Calculate Median Expression to generate a heatmap.
- Sequence Tab: View and download the transcript sequences for your selected TFs.

Example Workflow: Analyzing bZIP TFs

To visualize the expression of bZIP transcription factors:

Click the bZIP bar in the distribution plot.
Select two or more genes from the Main Table.
Click Check Gene Expression.
On the Expression tab, select Leaf and Stem, then click Calculate Median Expression.
Use the Camera Icon or the Download Plot button to save your heatmap as a PNG, JPEG, or PDF.

Visual Guidance

TF Family Bar Plot (PlantTFDB) Figure 5: Interactive bar plot showing TF family distribution. Clicking “bZIP” filters the dataset automatically.

Filtered TF Table Figure 6: The Main Table updates based on your plot selection. Use the buttons at the bottom left to transition to expression analysis.

Expression Heatmap Figure 7: Generate and export high-resolution heatmaps for your selected transcription factors across various plant parts.

Pfam

Purpose

The Pfam module allows you to explore transcription factors through the lens of conserved protein domains. This approach is ideal for identifying regulatory genes that share specific structural motifs—such as Zinc fingers or Leucine zippers—even if they are not fully categorized into traditional plant TF families.

Step-by-Step Instructions

Navigate: Go to Transcription Factors > Pfam.
Filter by Domain: View the Pfam Domain Distribution plot. Click on any bar (e.g., “PF00096”) to filter the Main Table below to show only genes containing that specific domain.
Detailed Exploration: Within the Details box, utilize the tabs to analyze your selection:
- Main Table: Search and select specific Gene IDs. Highlight rows to enable the Check Gene Expression or Copy Gene ID(s) buttons at the bottom left.
- Expression: Select target tissues and click Calculate Median Expression to generate a heatmap for the domain-specific genes.
- Sequence: View or download transcript sequences for your selected genes as a .txt file.

Tissue-specific TFs

Purpose

This module is designed to pinpoint transcription factors that exhibit localized activity. By focusing on genes with high expression in specific tissues (e.g., Leaf, Root, or Trichome), you can identify the primary regulators governing tissue-specific development and specialized metabolism.

Step-by-Step Instructions

Navigate: Go to Transcription Factors > Tissue-specific TFs.
Interact with the Heatmap: The overview heatmap displays TF types across different tissues. Click a specific cell (e.g., the intersection of CO-like TF and Leaf) to instantly filter the Main Table in the “Details” box below.
Select & Analyze:
- Main Table: Highlight the rows of the TFs you wish to investigate.
- Check Gene Expression: Click this button to carry your selected genes over to the Expression tab automatically.
Validate Specificity: On the Expression tab, select a variety of plant parts and click Calculate Median Expression.
- This allows you to visually confirm whether the TF is exclusively expressed in your tissue of interest.

Example Workflow

To investigate stem-specific regulators, click the ERF TF cell in the Stem row of the overview heatmap. Select the resulting genes in the Main Table and click Check Gene Expression. By selecting Stem alongside Leaf and Root on the Expression tab, you can generate a heatmap that shows whether the genes are preferentially active in stem tissue.

Visual Guidance

Tissue-specific TF Heatmap
Figure 8: The interactive heatmap allows for rapid filtering. Clicking a cell instantly isolates the corresponding transcription factors in the data table below.

Gene-Metabolite Correlation Analysis

Purpose

The Gene-Metabolite Correlation Analysis module is a powerful multi-omics integration tool. It allows you to discover functional relationships between gene expression (transcriptomics) and metabolite abundance (metabolomics) across different experimental conditions and tissues.

Step-by-Step Instructions

1. Configure Analysis Controls

Use the sidebar on the left to set up your correlation parameters:

Select Metabolite: Search for a specific compound (e.g., Artemisinin, Artemisinic acid).
Analysis Mode:
- Discovery Mode: Automatically identifies the top 30 genes most strongly correlated with your selected metabolite.
- Hypothesis Mode: Allows you to input a custom list of Gene IDs to test specific correlations.
Correlation Filter: Adjust the slider to set a minimum absolute correlation threshold (\(|r|\)). Only pairs meeting this value will be displayed.
Run Analysis: Click the green button to process the data.
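
The Discovery Mode settings above can be approximated offline. The following is a minimal sketch, not the database's own implementation: it computes Pearson's \(r\) between a metabolite profile and each gene across the four conditions, keeps pairs whose \(|r|\) meets the threshold, and returns the strongest hits. The gene names, values, and function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: r is undefined, treated as 0 here
    return cov / math.sqrt(var_x * var_y)

def top_correlated(metabolite, gene_tpm, min_abs_r=0.8, top_n=30):
    """Keep genes with |r| >= min_abs_r, strongest first, at most top_n."""
    hits = [(gene, pearson_r(values, metabolite)) for gene, values in gene_tpm.items()]
    hits = [(gene, r) for gene, r in hits if abs(r) >= min_abs_r]
    return sorted(hits, key=lambda hit: abs(hit[1]), reverse=True)[:top_n]

# Hypothetical values ordered as WT_YL, WT_ML, MUT_YL, MUT_ML
artemisinin = [10.0, 20.0, 2.0, 4.0]
gene_tpm = {
    "geneA": [5.0, 10.0, 1.0, 2.0],    # rises and falls with the metabolite
    "geneB": [8.0, 3.0, 12.0, 11.0],   # opposite trend
    "geneC": [4.0, 4.1, 4.0, 4.2],     # nearly flat
}
results = top_correlated(artemisinin, gene_tpm)
```

With only four conditions, even a high \(|r|\) rests on very few data points, so treat such hits as candidates for follow-up rather than confirmed relationships.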

2. Explore the Correlation Table

Once processed, the Correlation Table tab provides a searchable list of results:

Table Data: View Gene IDs, Correlation coefficients (\(r\)), and functional descriptions.
Row Selection: Click a single row to highlight a specific gene-metabolite pair. This action updates the Trend Visualization and Gene Context tabs.
Download: Use the Download CSV button to save your filtered results for offline analysis.

3. Visualize the Correlation Matrix

Switch to the Correlation Matrix tab to see a high-level heatmap of the relationships:

Red Cells: Represent positive correlations (gene and metabolite levels increase together).
Blue Cells: Represent negative correlations (as one increases, the other decreases).
Interactivity: Hover over any cell to see the exact correlation value and gene description.

4. Analyze Trends (Dual-Axis Plot)

The Trend Visualization tab offers a detailed look at how a selected pair behaves across four experimental conditions (WT Young Leaf, WT Mature Leaf, Mutant Young Leaf, and Mutant Mature Leaf):

Bars (Left Y-Axis): Represent the metabolite abundance/intensity.
Line (Right Y-Axis): Represents the gene expression levels in TPM.
Co-regulation: If the bars and the line follow the same pattern, the gene may be involved in the metabolite’s biosynthetic pathway. Treat this as a lead for follow-up, since correlation alone does not establish a functional link.

5. Gene Functional Context

The Gene Context tab provides a deep dive into the selected gene, showing its Pfam domains and a summary of other metabolites it may be correlated with.

Experimental Conditions Reference

The data is derived from four specific states to help identify developmental and mutational effects:

WT_YL: Wild-type Young Leaf
WT_ML: Wild-type Mature Leaf
MUT_YL: tdd1 Mutant Young Leaf
MUT_ML: tdd1 Mutant Mature Leaf

Visual Guidance

Analysis Controls Figure 9: Use the sidebar to switch between Discovery and Hypothesis modes and set your correlation thresholds.

Figure 10: The Trend Analysis plot allows for direct comparison of metabolite intensity (bars) and gene expression (line) across wild-type and mutant conditions.

Functional Annotation

Purpose

The Functional Annotation module provides a centralized interface to retrieve biological context for Artemisia annua genes. By integrating multiple databases, you can identify gene functions, metabolic pathways, and protein domains through three flexible query methods.

Step-by-Step Instructions

Navigate: Go to the Functional Annotation section in the main menu.
Select Your Query Method:
- Annotation Table:
  - Choose a specific database from the dropdown menu (e.g., GO, KEGG, Pfam, or SwissProt).
  - Input corresponding IDs (e.g., GO:0003674).
  - Tip: Use the default example IDs provided in the text area to ensure your format matches the selected database.
- Gene-Based Queries:
  - Input specific Gene IDs (one per line) from the latest A. annua gene model.
  - This is the most efficient way to see all biological roles assigned to a particular gene across every available database.
- Functional Descriptions:
  - Perform a keyword search (e.g., “RNA polymerase” or “transferase”).
  - The system will scan all functional descriptions to find matches, helping you discover genes related to specific biological activities.
Execute Search: Click the Search button corresponding to your chosen method.
Explore Multi-Tab Results: The results are displayed in a dynamic table with tabs representing different databases (e.g., GO, Pfam, KEGG, EggNOG, etc.). This allows you to toggle between different layers of functional information for the same set of genes.
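
If you download annotation tables and want to repeat a keyword search locally, the idea behind the Functional Descriptions query can be sketched as a case-insensitive substring match. This is an illustration only; the gene IDs and descriptions below are made up, and the database may match keywords differently.

```python
def search_descriptions(annotations, keyword):
    """Return gene IDs whose description contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [gene for gene, text in annotations.items() if needle in text.lower()]

# Hypothetical gene IDs and descriptions
annotations = {
    "gene_0001": "DNA-directed RNA polymerase II subunit",
    "gene_0002": "UDP-glycosyltransferase family protein",
    "gene_0003": "Amorpha-4,11-diene synthase",
}
matches = search_descriptions(annotations, "transferase")
```

Note that a substring match also catches compound terms, so “transferase” returns glycosyltransferases as well.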

Example Workflow

To find information for a specific gene:

Enter mikado.Super-Scaffold_100038G100 into the Gene-Based Queries field.
Click Search Gene IDs.
Navigate through the generated tabs (GO, Pfam, KEGG) to view the associated Gene Ontology terms, protein domains, and metabolic pathways for that gene.

Visual Guidance

Gene ID Search Results Figure 11: Searching by Gene ID generates a consolidated view. Use the tabs to switch between database-specific annotations like KEGG pathways or GO terms.

Gene Editing and Epigenetics

CRISPR sgRNA Designer

Purpose

The CRISPR sgRNA Designer is a specialized tool for planning genome editing experiments in Artemisia annua. Using the crisprDesign methodology and a custom-built LQ9v1 BSgenome, this tool identifies optimal single-guide RNA (sgRNA) sequences for various CRISPR nucleases while considering coding sequences (CDS) and potential quality constraints.

Step-by-Step Instructions

1. Verify System Status

Before starting, check the Status Check badges at the top of the page.

Both the BSgenome and Annotation must show as loaded (green).
If they are not loaded, use the Force Load Genome or Force Load Annotation buttons to initialize the reference data.

2. Input Target Genes

In the Design Parameters sidebar:

Target Gene ID(s): Enter one or more Mikado Gene IDs (e.g., mikado.chr1G1335) into the text area (one per line).
Target Region:
- CDS: Recommended for precision knockouts (targets coding regions only).
- Gene ± 500bp: Includes the promoter and terminator regions.
- Custom: Allows you to define specific upstream/downstream boundaries.

3. Select Nuclease and Filters

CRISPR Nuclease: Choose your enzyme system. The tool supports SpCas9 (NGG PAM), AsCas12a (TTTV PAM), enCas12a, and SaCas9.
Design Filters: Apply quality control filters:
- GC Content: Restricted to 20-80% for optimal stability.
- Poly-T stretches: Removes sequences that may terminate RNA Polymerase III transcription.
- Restriction Sites: Avoids sequences that might interfere with cloning.
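
To make the filters concrete, here is a minimal sketch of how forward-strand SpCas9 candidates (20-nt spacer followed by an NGG PAM) could be screened for GC content and poly-T runs. It is not the crisprDesign implementation used by the tool. The function names are hypothetical, and treating four or more consecutive Ts as a poly-T stretch is an assumption.

```python
import re

def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def has_poly_t(seq, run_length=4):
    """True if the sequence contains a run of Ts (assumed cutoff: 4)."""
    return "T" * run_length in seq.upper()

def find_spcas9_spacers(dna):
    """Forward-strand (start, spacer, PAM) tuples for 20-nt spacers followed by NGG."""
    dna = dna.upper()
    # The lookahead makes matches zero-width so overlapping sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]

def passes_filters(spacer, gc_min=0.20, gc_max=0.80):
    """Apply the GC-content window and the poly-T exclusion."""
    return gc_min <= gc_fraction(spacer) <= gc_max and not has_poly_t(spacer)

# Hypothetical 23-nt target: one 20-nt spacer plus a TGG PAM
candidates = find_spcas9_spacers("ACGTACGTACGTACGTACGTTGG")
kept = [c for c in candidates if passes_filters(c[1])]
```

A real design run also scans the reverse strand and scores on-target activity, which this sketch leaves out.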

4. Run and Analyze

Click the Design sgRNAs button. Once processing is complete, explore the results across the following tabs:

Guide RNAs: A detailed table listing the spacer sequence, PAM, genomic coordinates, and strand.

Understanding the Results

Column/Metric	Description
Spacer	The 20bp (for Cas9) or 21bp (for Cas12a) targeting sequence.
Composite Score	A normalized score (0–1) incorporating GC content, on-target affinity (CRISPRater), and sequence penalties.
Cut Site	The exact genomic coordinate where the double-strand break is predicted to occur.
Poly-T	Marked “Yes” if the guide contains a terminator sequence (avoid these for U6-driven vectors).

Visual Guidance

CRISPR Design Sidebar
Figure 12: Configure your CRISPR experiment by selecting the appropriate nuclease and target region (CDS or full gene).

sgRNA Output Table
Figure 13: The Quality Scores tab allows you to visualize and select the highest-performing guides based on composite modeling.

Data Export

You can download your designs in multiple formats:

CSV: Full data for spreadsheets.

Methylation

Purpose

The Methylation module is designed to investigate the epigenetic landscape of Artemisia annua. It allows you to explore genes involved in three primary regulatory mechanisms: RNA methylation (m6A), DNA methylation (5-methylcytosine), and histone modification (Histone-H3).

Step-by-Step Instructions

1. Choose Your Exploration Method

You can navigate the epigenetic data using two distinct approaches:

Method A: Filter by Type:
- Select a category from the Select Methylation Type dropdown.
- Use the dynamic Sub-filter to narrow results by functional role (e.g., Writer, Eraser, Reader for m6A) or specific Pfam domains (e.g., SET or PHD domains for Histone-H3).
Method B: Search by Gene ID:
- Enter a custom list of up to 100 Gene IDs in the Search by Gene ID(s) box to see if they have known epigenetic associations.

2. Interact with the Main Table

The results are displayed in a comprehensive data table:

Functional Links: Click on any Pfam ID to open the corresponding entry in the InterPro/Pfam database for detailed protein domain information.
Selection: Click on one or more rows to highlight genes for further analysis.
Action Buttons: Once rows are selected, use the buttons at the bottom:
- Copy Gene ID(s): Saves the selected IDs to your clipboard.
- Check Gene Expression: Automatically transfers the selected genes to the Expression tab for tissue-pattern analysis.

3. Analyze Tissue Expression Patterns

In the Expression tab:

Select Parts: Choose the plant tissues you wish to compare (Root, Stem, Leaf, Flower, Petiole).
Calculate: Click Calculate Median Expression.
Heatmap Visualization: The system generates a clustered heatmap using log2-transformed data (\(\log_2(\mathrm{TPM} + 1)\)).
Download: Export the high-resolution heatmap as a PNG, JPEG, or PDF.
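
The values behind such a heatmap can be reproduced from downloaded TPM data. The sketch below is an assumption about the order of operations (median TPM per tissue first, then \(\log_2(\mathrm{TPM} + 1)\)); the module may transform before taking the median. Gene names, TPM values, and the function name are hypothetical.

```python
import math
from statistics import median

def median_log2_expression(records, tissues):
    """Return {gene: {tissue: log2(median TPM + 1)}} for the chosen tissues."""
    grouped = {}
    for gene, tissue, tpm in records:
        if tissue in tissues:
            grouped.setdefault(gene, {}).setdefault(tissue, []).append(tpm)
    return {
        gene: {tissue: math.log2(median(values) + 1) for tissue, values in per_tissue.items()}
        for gene, per_tissue in grouped.items()
    }

# Hypothetical (gene, tissue, TPM) records, one per sample
records = [
    ("geneX", "Leaf", 3.0), ("geneX", "Leaf", 7.0), ("geneX", "Leaf", 15.0),
    ("geneX", "Root", 1.0), ("geneX", "Root", 1.0),
    ("geneX", "Flower", 50.0),
]
heatmap_values = median_log2_expression(records, {"Leaf", "Root"})
```

The pseudocount of 1 keeps genes with zero TPM at a finite value of 0 after the transform.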

4. Retrieve Sequences

Switch to the Sequence tab to view detailed transcript information:

Review the Transcript ID, CDS information, and the full nucleotide sequence.
The sequence is formatted for readability (100 characters per line).
Click Download Sequences (Text) to save the data in a FASTA-like format.
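
If you need to re-wrap sequences in the same 100-characters-per-line layout, for example after trimming them locally, a small helper is enough. This is a sketch; the record ID is hypothetical and the exact header format of the downloaded file may differ.

```python
def format_fasta(record_id, sequence, width=100):
    """Return a FASTA record with the sequence wrapped at a fixed line width."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + record_id + "\n" + "\n".join(lines) + "\n"

# Hypothetical transcript ID and a 250-nt placeholder sequence
record = format_fasta("transcript_1", "A" * 250)
```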

Understanding the Methylation Types

Category	Description	Sub-filters / Roles
m6A	N6-methyladenosine RNA methylation	Writers, Erasers, Readers
5-methyl	DNA methylation enzymes	DNA_methylase, DNMT1-RFD, TP_methylase
Histone-H3	Proteins modifying Histone H3	PHD and SET domains

Visual Guidance

Methylation Filter
Figure 14: Use the dynamic filters to isolate specific epigenetic regulators like m6A ‘Writers’ or Histone-H3 ‘SET’ domain proteins.

Methylation Heatmap
Figure 15: The Expression tab allows you to visualize if specific epigenetic regulators are tissue-specific (e.g., active primarily in the Root or Trichome-rich Leaf).

JBrowse Genomic Browser

Purpose

The JBrowse tool provides a high-performance, interactive environment for visualizing the Artemisia annua genome. Based on the latest genome assembly and Mikado annotations developed for this project, it allows you to explore gene structures, regulatory regions, and sequence features in their precise genomic context.

Step-by-Step Instructions

Launch the Browser: Navigate to Tools > JBrowse and click the Open button to initialize the browser interface.
Select Tracks:
- Open the Track Selector in the sidebar.
- Enable the Artemisia Annotation (GFF3) and Genome Sequence (FASTA) tracks to populate the linear genome view.
Navigate & Zoom: Use the search bar to jump to a specific coordinate or Gene ID. Zoom in until individual gene features—such as Exons (thick blocks), Introns (thin lines), and UTRs—become visible.
Inspect Features: Click on any gene or CDS feature to open a detailed information window.
Extract Sequences:
- In the feature details window, click Show Feature Sequence.
- Custom Flanking Regions: By default, the tool allows you to extract the sequence with 1000 bp upstream and downstream.
- Modify Range: Click the wheel icon next to the extraction dropdown to adjust these boundaries (e.g., to 500 bp for promoter analysis or 2000 bp for broader context).

Example Workflow: Promoter Extraction

To extract a promoter sequence for a gene of interest:

Search for the gene ID and zoom in until the gene structure is clear.
Click the gene feature to open the details panel.
Select Show Feature Sequence.
Adjust the upstream extraction range to 2000 bp via the wheel icon to capture the full promoter region.
Copy the resulting sequence for downstream motif analysis or primer design.
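
The same extraction can be scripted against the downloaded genome and GFF3 files. The sketch below assumes GFF3-style 1-based inclusive coordinates and returns the upstream flank in the orientation of the gene, so minus-strand genes are reverse-complemented. The function names and the toy sequence are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]

def upstream_region(chrom_seq, start, end, strand, flank=2000):
    """Upstream flank of a gene given 1-based inclusive start/end and strand."""
    if strand == "+":
        # Bases immediately before the gene start, clipped at the sequence start
        lo = max(0, start - 1 - flank)
        return chrom_seq[lo:start - 1]
    # Minus strand: "upstream" lies after the gene end on the reference
    hi = min(len(chrom_seq), end + flank)
    return reverse_complement(chrom_seq[end:hi])

# Toy 16-bp "chromosome"; the gene occupies positions 5-8 (GGTT)
chrom = "AACCGGTTACGTACGT"
plus_promoter = upstream_region(chrom, 5, 8, "+", flank=3)
minus_promoter = upstream_region(chrom, 5, 8, "-", flank=3)
```

Here `plus_promoter` is the three bases before position 5, while `minus_promoter` is the reverse complement of the three bases after position 8.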

Visual Guidance

JBrowse Workflow
Figure 16: Exploring genomic features in JBrowse. (a) Tracks loaded in the linear view. (b) Feature details window. (c) Sequence extraction tools. (d) Configuration of flanking regions.

BLAST

Purpose

The BLAST module is essential for cross-referencing external data with our database. It allows you to perform sequence similarity searches to identify corresponding Artemisia-DB Gene IDs. This is particularly useful if you have a sequence from a different species or are working with IDs from older Artemisia annua genome versions and need to find their updated counterparts in our latest annotation.

Step-by-Step Instructions

Input Your Query: Navigate to Tools > BLAST. You can provide your sequence in two ways:
- Text Input: Paste a FASTA-formatted sequence (e.g., >My_Gene\nATCG...) directly into the text area.
- File Upload: Upload a sequence file (up to 2MB) in .fasta or .txt format.
Configure Search Parameters: Adjust the thresholds to control the stringency of your search:
- Minimum Identity (%): The minimum percentage of identical residues required for a match (Default: 80%).
- E-value: The Expectation value threshold for reporting matches. A lower E-value indicates a more statistically significant hit (Default: 1e-5).
Execute: Click the Run BLAST button. The system will align your query against the Artemisia annua genome and the latest Mikado gene models.
Review Results:
- The results table will display the matching Artemisia-DB Gene IDs, alignment scores, and coverage.
- Use these Gene IDs to jump to the Expression, Annotation, or CRISPR modules.
- Click Download CSV to save the alignment summary.
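
If you run BLAST+ locally against the downloaded reference files, the same two thresholds can be applied to its output. The sketch below assumes the standard 12-column tabular format (`-outfmt 6`), which is not necessarily the layout of the CSV exported by this module. The query name and hit values are made up.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6)
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

def filter_hits(tabular_text, min_identity=80.0, max_evalue=1e-5):
    """Keep hits meeting both thresholds, sorted by bit score (best first)."""
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(OUTFMT6_COLUMNS, row))
        if float(hit["pident"]) >= min_identity and float(hit["evalue"]) <= max_evalue:
            hits.append(hit)
    return sorted(hits, key=lambda h: float(h["bitscore"]), reverse=True)

# Hypothetical results for one query
example = (
    "My_Gene\tmikado.chr7G1274\t98.5\t400\t6\t0\t1\t400\t10\t409\t2e-150\t700\n"
    "My_Gene\tmikado.chr1G1335\t75.0\t300\t75\t0\t1\t300\t5\t304\t1e-20\t150\n"
    "My_Gene\tmikado.chr1G1813\t85.0\t50\t7\t0\t1\t50\t1\t50\t0.5\t30\n"
)
best = filter_hits(example)
```

Only the first hit survives: the second fails the identity cutoff and the third fails the E-value cutoff.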

Example Workflow: Migrating Old Gene IDs

If you have a sequence associated with a gene ID from a previous publication or older version of the genome:

Paste the sequence into the BLAST text box.
Set the E-value to 1e-10 for high-stringency matching.
Identify the top hit in the results table (e.g., mikado.chr7G1274).
Copy this new ID and use it in the Median Expression Viewer to see how that gene is expressed across Artemisia annua tissues.

Visual Guidance

Figure 17: The BLAST interface facilitates the translation of external sequences into database-specific Gene IDs, enabling a full multi-omics deep dive.

Co-expression Analysis

Purpose

The Co-expression Analysis tool identifies genes that exhibit similar expression patterns across various experimental conditions in Artemisia annua. This module is essential for discovering functional modules, potential regulatory networks, and genes that may be co-regulated within specific metabolic pathways, such as artemisinin biosynthesis.

Step-by-Step Instructions

Define Your Query:
- Target Genes: Enter one or more Gene IDs (e.g., mikado.chr1G1335) into the search box to find their co-expressed partners.
- Correlation Method: Choose between Pearson (linear relationship) or Spearman (rank-based relationship) from the dropdown menu.
Filter the Dataset:
- Bioprojects: Select specific BioProjects or choose ALL to perform a global analysis across the entire database.
- Plant Parts: Choose specific tissues (e.g., Leaf, Root, Trichome). This list dynamically updates based on your BioProject selection.
- Sample Metadata: The table on the right will update automatically to show which samples are included in your current filter.
Set Statistical Thresholds:
- Minimum Absolute Correlation: Use the slider (0 to 1) to define the strength of the relationship (\(|r|\)).
- FDR Rate Threshold: Set the False Discovery Rate to ensure statistical significance.
Analyze Results: Click Run Co-expression to generate data across three interactive tabs:
- Table: A comprehensive list of co-expressed genes, including their correlation values (\(r\)) and significance (FDR).
- Network Graph: A visual representation of the top 10 co-expressed genes. The nodes represent genes, and the edges represent the strength of their co-expression.
- GO Enrichment:
  - Dot Plot: Visualizes significantly enriched Gene Ontology (GO) terms for the identified gene cluster. Hover over the plot and click the camera icon to save it.
  - Enrichment Table: Provides a detailed breakdown of enriched terms. You can refine this list using the Root Node filter or the Max P.adjust slider.
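
The choice between Pearson and Spearman matters when two genes rise together but not in a straight line. The sketch below illustrates the difference with hypothetical expression values: Spearman is computed as Pearson's \(r\) on ranks (ties receive average ranks). It is not the module's own code.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_r(x, y):
    """Spearman correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical TPM values across five samples: monotonic but non-linear
gene_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_2 = [1.0, 4.0, 9.0, 16.0, 100.0]
linear = pearson_r(gene_1, gene_2)
rank_based = spearman_r(gene_1, gene_2)
```

In this example the rank-based value is 1 while the linear value is noticeably lower, so a 0.85 threshold would keep the pair under Spearman and drop it under Pearson.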

Example Workflow: Identifying Pathway Partners

If you want to find genes co-regulated with a key enzyme like ADS (amorpha-4,11-diene synthase):

Enter the Gene ID for ADS.
Select Spearman correlation and set the threshold to 0.85.
Choose Leaf and Trichome tissues, as these are the primary sites of artemisinin production.
Run the analysis and check the Network Graph to see the top partners.
Review the GO Enrichment tab to see if the co-expressed cluster is enriched for “terpenoid biosynthetic processes.”

Visual Guidance

Co-expression Controls
Figure 18: Configure your network analysis by selecting correlation methods and specific tissue types to refine the biological context.

Figure 19: Visualize gene relationships through the Network Graph and interpret the biological significance of the co-expressed cluster via the GO Enrichment dot plot.

GO Enrichment Analysis

Purpose

The GO Enrichment Analysis tool allows you to identify overrepresented biological themes within a list of genes. By comparing your target gene set against the entire Artemisia annua genome “universe,” the tool determines which Biological Processes (BP), Molecular Functions (MF), and Cellular Components (CC) are statistically significant.

Step-by-Step Instructions

1. Input Your Gene List

Navigate to the GO Enrichment tab. You can provide your gene IDs in two ways:

Manual Entry: Paste your Mikado Gene IDs (one per line) into the text area.
File Upload: Upload a .txt file containing your gene list (one ID per line).
Note: You can combine both methods; the tool will automatically merge the lists and remove duplicates.
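
When preparing input outside the browser, the merge-and-deduplicate behavior described in the note can be mimicked as follows. This is a sketch of one reasonable approach (first occurrence wins, blank lines ignored); the tool's exact handling is not documented here.

```python
def merge_gene_lists(pasted_text, file_text=""):
    """Combine two newline-separated ID lists, dropping blanks and duplicates."""
    seen = set()
    merged = []
    for line in (pasted_text + "\n" + file_text).splitlines():
        gene_id = line.strip()
        if gene_id and gene_id not in seen:
            seen.add(gene_id)
            merged.append(gene_id)
    return merged

pasted = "mikado.chr1G1335\nmikado.chr1G1813\n"
uploaded = "mikado.chr1G1813\n\nmikado.chr7G1274\n"
gene_list = merge_gene_lists(pasted, uploaded)
```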

2. Execute the Analysis

Click the Run GO Enrichment button. The system will process your list using the Benjamini-Hochberg (FDR) correction method. You can monitor the progress via the Analysis Status box, which will report how many genes were successfully mapped and how many GO terms were identified.
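
For readers who want to see what the statistics involve, the sketch below shows a one-sided hypergeometric over-representation test followed by Benjamini-Hochberg adjustment. It is a simplified illustration of the general approach, not the code run by this module; the function names and numbers are hypothetical.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): n input genes drawn from a universe of N genes,
    of which K carry the GO term, with k of the input genes annotated."""
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

# Toy case: all 5 input genes carry a term found in 5 of 10 universe genes
p_term = hypergeom_pvalue(k=5, n=5, K=5, N=10)
p_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

The universe size \(N\) strongly affects the result, which is why the tool compares your list against the whole annotated genome rather than only the genes you supplied.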

3. Visualize Results (Dot Plot)

An interactive Dot Plot (powered by Plotly) will appear once the analysis is complete:

Dot Size: Represents the number of genes associated with the GO term.
Color Gradient: Represents the significance level (\(p.adjust\)). Redder dots indicate higher significance.
Interactivity: Hover over any dot to see the full GO description and exact statistics.

4. Filter and Refine

Use the controls below the plot to clean up your results:

Filter by Root Node: Isolate specific categories such as Biological Process, Molecular Function, or Cellular Component.
Max P.adjust Slider: Filter out less significant results by lowering the adjusted \(p\)-value threshold (e.g., set to \(0.05\) or \(0.01\)).

5. Inspect and Download the Data Table

The Results Table at the bottom provides a deep dive into every enriched term:

Gene Column: View the specific IDs from your input list that matched each GO term.
Download: Click Download Results Table to save the filtered data as a CSV file for your publication or further research.

Key Features Summary

Feature	Description
Statistical Method	Hypergeometric testing via the `enricher` methodology.
P-value Adjustment	Benjamini-Hochberg (FDR) correction to minimize false positives.
Ontology Scope	Comprehensive coverage of BP, MF, and CC root nodes.
Dynamic Filtering	Real-time table and plot updates based on significance thresholds.

Example Workflow: Functional Discovery

If you have identified a cluster of 50 genes that are highly expressed in the Root:

Paste those 50 IDs into the Enter Genes box.
Click Run GO Enrichment.
Review the Dot Plot. If you see terms like “secondary metabolite biosynthetic process” or “phenylpropanoid metabolic process,” it suggests this gene set is actively involved in producing specialized compounds.
Filter for Molecular Function to see if specific enzyme activities (e.g., “transferase activity”) are driving this process.

Visual Guidance

GO Dot Plot Figure 20: The Dot Plot summarizes the top enriched terms. Use the camera icon in the top-right of the plot to save the visualization as a PNG.

GO Results Table Figure 21: The results table provides the “Gene_Count” and the specific “Genes” involved in each term, allowing for direct functional cross-referencing.

Get Gene Sequences

Purpose: Retrieve and download gene sequences for specified Artemisia annua gene IDs.
Steps:
1. Navigate to Tools > Get Gene Sequences.
2. Enter gene IDs (one per line) in the text area (e.g., mikado.chr1G1813).
3. If ≤5 gene IDs are entered, view the sequences in the output area below the text input.
4. Click Download Sequences to save the sequences as a text file. Note: For >5 gene IDs, sequences are not displayed on-screen but can still be downloaded.

Download

Purpose

The Download menu provides a comprehensive repository for retrieving raw and processed Artemisia annua genomic and transcriptomic data. Whether you need expression matrices for specific tissues, bulk data for entire BioProjects, or core reference files (Genome and Annotation), this section facilitates high-throughput data access.

1. Download Data by Tissue

Purpose

This module allows you to extract expression data tailored to specific plant organs across all integrated studies.

Steps

Navigate: Go to Download > Download Data by Tissue.
Select Criteria:
- Tissues: Check one or more boxes (e.g., Flower, Leaf, Root).
- Measure: Choose between TPM (standardized abundance) or Bias-corrected counts (for differential expression tools).
- Level: Select Gene-level or Transcript-level resolution.
Search: Click the Search button to generate a summary.
Review Summary: The right-hand panel will display the number of associated projects, total samples, and the estimated Total Download Size.
Download: Click Download Selected File(s) to receive a ZIP archive containing the .tsv files.

2. Download Data by Project

Purpose

Best for users interested in the original context of a specific study, this tool provides bulk expression matrices.

Steps

Navigate: Go to Download > Download Data by Project.
Select Project: Browse the table and click on a row to select a specific BioProject (e.g., studies comparing wild-type and tdd1 mutants).
Configure Format: In the “Download Options” box, select your preferred data type (TPM or Counts).
Process & Save: Click Download Selected Project Data. The system will dynamically pivot the underlying Parquet data into a wide-format .tsv file with genes as rows and samples as columns.
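
The pivot step can be pictured as turning one row per gene-sample pair into a matrix. The sketch below uses plain Python on an in-memory list rather than Parquet; the column names (`gene`, `sample`, `tpm`) and the `NA` placeholder for missing values are assumptions, not the database's schema.

```python
import csv
import io

def pivot_to_wide(long_rows, value_key="tpm"):
    """Convert long-format rows into a tab-separated gene x sample matrix."""
    samples = sorted({row["sample"] for row in long_rows})
    matrix = {}
    for row in long_rows:
        matrix.setdefault(row["gene"], {})[row["sample"]] = row[value_key]
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene_id"] + samples)
    for gene in sorted(matrix):
        # Missing gene-sample combinations are written as NA (assumed convention)
        writer.writerow([gene] + [matrix[gene].get(s, "NA") for s in samples])
    return buffer.getvalue()

# Hypothetical long-format records
long_rows = [
    {"gene": "geneA", "sample": "S1", "tpm": 1.5},
    {"gene": "geneA", "sample": "S2", "tpm": 2.0},
    {"gene": "geneB", "sample": "S1", "tpm": 0.0},
]
wide_tsv = pivot_to_wide(long_rows)
```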

3. Download Reference Files

Purpose

Access the foundational reference files required for local bioinformatics pipelines.

Steps

Navigate: Go to Download > Download Files.
Select File Type:
- Annotation File: Download the latest GFF3 file containing the Mikado gene models.
- Transcriptome Assembly: Download the compiled FASTA sequences of all transcripts.
- Genome File: Download the LQ-9_phase0 primary genome assembly in FASTA format.
Download: Click the respective buttons to save these files directly to your machine.

Understanding the Data Formats

File Type	Extension	Recommended Use
TPM Matrix	`.tsv`	Comparing expression levels across different genes or tissues.
Count Matrix	`.tsv`	Input for statistical tools like DESeq2 or edgeR.
Reference	`.gff3` / `.fasta`	Local mapping, BLAST searches, or phylogenomic analysis.

Visual Guidance

Download by Tissue
Figure 22: The Tissue Download interface provides a real-time summary of the data size and file list before you initiate the ZIP download.

Figure 23: Select an individual study from the project table to retrieve its specific expression matrix.
