User Guide


1 Workflow

    PERlncDB integrates 6,360 omics datasets (ChIP-seq, BS-seq, and RNA-seq) from 19 species and identifies over 160,000 high-quality lncRNAs. It predicts 80,308 collinearly conserved lncRNAs based on homologous genomic blocks across species. Epigenetic modifications, including histone modifications and DNA methylation, are analyzed to obtain genome-wide signals, differential regions, and lncRNA-associated annotations. LncRNA expression levels, differential analysis results, and transcription factor binding sites are also provided. All results and data are integrated into the PERlncDB platform.

2 Overview

    The Plant lncRNA Epigenetic Regulatory Database (PERlncDB) offers detailed information on, and visualization of, the epigenetic features associated with plant lncRNAs in 19 species, including Transposable Elements, Transcription Factors, Histone Modifications, and DNA Methylation. It also provides differential regulatory information on the epigenetic landscape of lncRNAs in different mutants or under stress conditions. The database comprises the following functional modules: Home, Browse, Search, Cross-species Analysis, Dynamics, Visualization, and Tools.


3 Homepage

    The PERlncDB homepage has a navigation bar with six modules and quick-access searches for Histone Modification, DNA Methylation, and Transcription Factor. It also offers a global search by lncRNA ID and links to the detail page of each species.


4 Browse

    In the Browse module, genome information for each species is provided, including details such as the reference genome version, genome size, and the number of lncRNAs and genes. Additionally, there is a page dedicated to browsing lncRNAs for each species.

    The detail page of each lncRNA includes a table of basic information as well as annotations for transposable elements (TEs), transcription factors (TFs), histone modifications, DNA methylation, and other relevant information.

6 Cross-species Analysis

    This module provides “Synteny-conserved LncRNA Prediction”, “LncRNA Synteny Scan” and “Epigenomic Correlation” for exploring the epigenomic features of conserved lncRNA pairs across multiple species.

6.1 Synteny-conserved LncRNA Prediction

    Synteny conservation, defined as the evolutionary maintenance of gene order and genomic organization across species, serves as the cornerstone of our lncRNA identification pipeline. Through genome-wide analysis of homologous blocks, we have systematically predicted 84,602 conserved syntenic lncRNAs across 19 species, anchored within 234,350 syntenic blocks. For each query lncRNA, our pipeline precisely maps it to syntenic blocks in target genomes and identifies all potential synteny-conserved lncRNAs within these evolutionarily conserved genomic regions.

    JCVI was used to construct whole-genome syntenic blocks based on protein-coding gene sequence homology between species. Within each block, lncRNAs were classified as synteny-conserved when they met two criteria: (i) at least five adjacent homologous protein-coding genes (the moderate criterion) are syntenic between the two species; and (ii) the lncRNA has the same transcriptional orientation as at least one neighboring homologous coding gene.

    To accommodate diverse research requirements, we implemented a tiered classification system based on the number of adjacent homologous coding genes: (i) Lenient (≥3 genes): optimized for lncRNA detection sensitivity, suitable for initial exploration; (ii) Moderate (≥5 genes): the main classification criterion, balancing sensitivity and specificity; (iii) Strict (≥10 genes, and the same transcriptional direction as the most proximal homologous coding gene): dedicated to high-confidence assessment of deeply conserved functional elements.
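
    The tiers described above can be read as a simple decision rule. The following is a minimal sketch, not the PERlncDB pipeline itself: the function name and its three arguments are hypothetical, and it assumes that the orientation criterion (ii) applies to every tier.

```python
from typing import Optional

# Minimum number of adjacent homologous protein-coding genes per tier,
# as stated in the text.
TIER_MIN_GENES = {"lenient": 3, "moderate": 5, "strict": 10}


def classify_synteny_tier(
    n_syntenic_genes: int,
    same_strand_as_any_neighbor: bool,
    same_strand_as_nearest: bool,
) -> Optional[str]:
    """Return the most stringent tier a candidate lncRNA satisfies, or None.

    All argument names are hypothetical; they stand for values that a
    synteny pipeline would derive from the JCVI blocks.
    """
    # Assumption: orientation concordance with at least one neighboring
    # homologous coding gene is required in every tier.
    if not same_strand_as_any_neighbor:
        return None
    if n_syntenic_genes >= TIER_MIN_GENES["strict"] and same_strand_as_nearest:
        return "strict"
    if n_syntenic_genes >= TIER_MIN_GENES["moderate"]:
        return "moderate"
    if n_syntenic_genes >= TIER_MIN_GENES["lenient"]:
        return "lenient"
    return None
```

    Under these assumptions, a candidate with 12 syntenic neighbors that is not on the same strand as its nearest homologous gene falls back to the moderate tier rather than being discarded.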

6.2 LncRNA Synteny Scan

    The “LncRNA Synteny Scan” offers comprehensive tools for visualizing lncRNA synteny, enabling users to explore syntenic relationships across entire genomes or specific regions. By submitting a query species and synteny-conserved lncRNA pairs of interest, users can access detailed visualizations of local syntenic blocks as well as global synteny displays between two species.

6.3 Epigenomic Correlation

    "Epigenomic Correlation" provides a correlation analysis of epigenetic modification signals between two samples. The region of each synteny-conserved lncRNA was divided into three parts (2 kb upstream, gene body, and 2 kb downstream), and each part was further divided into 20 subregions. The modification level of each subregion was calculated using CGmapTools or deepTools. Finally, the correlation of epigenetic modification levels between the two specified samples was computed with a Python script using the Spearman method.
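
    The binning and correlation steps can be illustrated as follows. This is a standard-library sketch under stated assumptions, not the script used by PERlncDB: all function names are hypothetical, coordinates are treated as half-open intervals, and bins on the minus strand are reversed so that they always run from upstream to downstream. In practice the per-bin signal would come from CGmapTools or deepTools, and a library routine such as scipy.stats.spearmanr could replace the hand-written correlation.

```python
from typing import List, Sequence, Tuple

FLANK = 2000  # flank size in bp, from the text
N_BINS = 20   # subregions per part, from the text


def split_region(start: int, end: int, n_bins: int = N_BINS) -> List[Tuple[int, int]]:
    """Split the half-open interval [start, end) into n_bins near-equal bins."""
    length = end - start
    edges = [start + (length * i) // n_bins for i in range(n_bins + 1)]
    return list(zip(edges[:-1], edges[1:]))


def lncrna_bins(start: int, end: int, strand: str = "+") -> List[Tuple[int, int]]:
    """Return 60 bins ordered upstream -> gene body -> downstream."""
    left = split_region(max(0, start - FLANK), start)
    body = split_region(start, end)
    right = split_region(end, end + FLANK)
    bins = left + body + right
    # On the minus strand the upstream flank lies at higher coordinates.
    return bins if strand == "+" else bins[::-1]


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one profile is constant
    return cov / (vx * vy) ** 0.5
```

    Given two 60-value signal profiles (one per lncRNA of a conserved pair, in the same bin order), spearman(profile_a, profile_b) yields the coefficient for the whole region; slicing each profile into its three blocks of 20 values gives the per-part coefficients.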

    The results include images illustrating the distribution and correlation of epigenetic modification levels for two lncRNAs across the upstream, gene body, and downstream regions in specific samples. In the correlation plot, red indicates positive correlation and blue indicates negative correlation. The dashed line marks the correlation threshold corresponding to a significance level of 0.05. A correlation whose absolute value exceeds this threshold indicates a more reliable relationship between the epigenetic modifications of the two lncRNAs in the given samples.

7 Dynamics

    The Dynamics module provides differential analysis data for epigenetic modification signals in mutants and under stress conditions, covering approximately 2,000 samples. Users can retrieve the relevant differential epigenetic regions, differential results, and sample information by selecting the species of interest, modification type, and sample details.

8 Visualization

    In the Visualization module, the JBrowse tool is integrated, facilitating the exploration of epigenetic landscapes for species of interest, various types of modifications, and sample data.

9 Tools

    This module provides sequence alignment information between multiple species and automatic annotation of differential epigenetic regions on specific genomic regions.

9.1 BLAST

    This tool searches for transcription factor binding sites in specific genomic regions and finds the associated regions and lncRNAs. Users can submit the species of interest, transcription factor type, and genomic region to retrieve information about the relevant regions, including sample descriptions, locations, associated lncRNAs, sample sources, and more. Users can also navigate to the detail page of a gene by clicking on its lncRNA ID.

9.2 Auto Annotation

    For DNA methylation data, we used the "dmr" analysis of CGmapTools to (1) detect and localize differentially methylated regions (DMRs) and (2) calculate the average methylation level of each region, thereby identifying genomic regions with significant methylation changes between samples or experimental conditions.

    For ChIP-seq data analysis, we integrated both DiffBind and DESeq2 methods for statistical significance analysis. By applying stringent thresholds for fold change and p-values, we reliably identified biologically significant differential binding peaks.
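
    Threshold-based filtering of this kind can be sketched as below. The record keys ("log2fc", "pvalue") and the default cutoffs are illustrative assumptions only; the guide does not state the thresholds actually applied to the DiffBind and DESeq2 results.

```python
from typing import Dict, List


def filter_differential_peaks(
    peaks: List[Dict[str, float]],
    min_abs_log2fc: float = 1.0,  # hypothetical cutoff, not from the guide
    max_pvalue: float = 0.05,     # hypothetical cutoff, not from the guide
) -> List[Dict[str, float]]:
    """Keep peaks that pass both the fold-change and the p-value cutoff."""
    return [
        peak
        for peak in peaks
        if abs(peak["log2fc"]) >= min_abs_log2fc and peak["pvalue"] <= max_pvalue
    ]
```

    Taking the absolute value of the log2 fold change retains both gained and lost peaks.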

    In this module, we provide all differential epigenetic modification regions so that users can explore the dynamics information of specific genomic regions themselves. Starting from the genomic coordinate files (BED format) provided by users, the tool uses bedtools for efficient genomic interval operations and the ChIPseeker package to annotate differentially modified epigenetic regions and associate them with long non-coding RNAs (lncRNAs). Users can thus retrieve epigenomic information for the genomic regions they are interested in.
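
    The core interval operation behind this annotation, finding which differential regions overlap which lncRNAs, can be illustrated in plain Python. This is a simplified stand-in for what bedtools does, not the module's implementation; the type and function names are hypothetical, and only the first four BED columns are read.

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive (BED convention)
    name: str


def parse_bed(text: str) -> List[Interval]:
    """Parse BED-formatted text, skipping blank, comment, and header lines."""
    intervals = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        name = fields[3] if len(fields) > 3 else "."
        intervals.append(Interval(fields[0], int(fields[1]), int(fields[2]), name))
    return intervals


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def annotate(
    regions: List[Interval], lncrnas: List[Interval]
) -> List[Tuple[Interval, Interval]]:
    """Pair every differential region with each lncRNA it overlaps."""
    # Quadratic scan for clarity; real tools use sorted or indexed intervals.
    return [(r, l) for r in regions for l in lncrnas if overlaps(r, l)]
```

    For example, a region chr1:100-200 overlaps an lncRNA at chr1:150-300, whereas chr1:500-600 does not, so only the first pair is reported.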