About

PepRNA-DB is a curated, literature-derived database of peptide-mediated delivery systems for RNA and other nucleic-acid cargoes. It contains 2,582 curated experiment records from 509 publications, spanning eleven cargo classes. It is designed as a reusable research resource for comparing peptide sequence and chemistry, nucleic-acid sequences, cargo type, formulation context, biological model, uptake, functional activity, endosomal escape evidence, and in vivo delivery outcome.

Each record represents a peptide, peptide formulation, or peptide-model combination extracted from an individual publication. When the same peptide-cargo system was tested under different experimental settings, such as in vitro and in vivo, each setting is stored as a separate experiment record.

PepRNA-DB covers eleven nucleic-acid cargo classes: siRNA, mRNA, miRNA, anti-miRNA, sgRNA, shRNA, ASOs, SCOs, pDNA, oligonucleotides, and aptamers. RNA cargoes account for approximately 70% of curated records and are the primary focus of the database. The remaining records, approximately 30%, are non-RNA nucleic-acid cargoes studied using the same cell-penetrating peptide systems, either alongside RNA in the same publication or in dedicated peptide-oligonucleotide delivery studies. A small number of records (n = 25) could not be assigned to any of the eleven classes because of insufficient reporting; they are retained in the database but excluded from cargo-class analyses.
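As an illustration of how unassigned records might be left out of a per-class tally, here is a minimal Python sketch. The field name cargo_class, the exact label strings, and the toy records are assumptions made for this example; they are not the database's actual schema or content.

```python
from collections import Counter

# The eleven cargo classes named in the text. The exact label strings
# used in the database are an assumption here.
CARGO_CLASSES = {
    "siRNA", "mRNA", "miRNA", "anti-miRNA", "sgRNA", "shRNA",
    "ASO", "SCO", "pDNA", "oligonucleotide", "aptamer",
}


def cargo_class_counts(records):
    """Tally records per cargo class and count the unassigned ones separately."""
    counts = Counter()
    unassigned = 0
    for record in records:
        cargo = record.get("cargo_class")  # hypothetical field name
        if cargo in CARGO_CLASSES:
            counts[cargo] += 1
        else:
            # Kept in the data set, but left out of the per-class tally.
            unassigned += 1
    return counts, unassigned


# Toy records for illustration only; not real database content.
toy_records = [
    {"cargo_class": "siRNA"},
    {"cargo_class": "siRNA"},
    {"cargo_class": "pDNA"},
    {"cargo_class": None},
]
counts, unassigned = cargo_class_counts(toy_records)
```

Counting the unassigned records separately, rather than dropping them, mirrors the statement that they remain in the database while being excluded from cargo-class analyses.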

What is included

Each experiment record is linked to one paper and one peptide. Multiple experiments can come from the same paper, and multiple experiments can reuse the same peptide.
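The linking rule can be pictured as two many-to-one relationships. The Python sketch below is only an illustration: the class name, the field names (record_id, paper_id, peptide_id), and the toy identifiers are hypothetical and do not reflect the actual PepRNA-DB schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentRecord:
    """One experiment, linked to exactly one paper and one peptide."""
    record_id: str
    paper_id: str    # hypothetical identifier for the source publication
    peptide_id: str  # hypothetical identifier for the peptide


def group_by(records, key):
    """Group record IDs by the value of the given attribute."""
    groups = defaultdict(list)
    for record in records:
        groups[getattr(record, key)].append(record.record_id)
    return dict(groups)


# Toy data for illustration only.
toy = [
    ExperimentRecord("E1", "P1", "PEP1"),
    ExperimentRecord("E2", "P1", "PEP2"),
    ExperimentRecord("E3", "P2", "PEP1"),
]
by_paper = group_by(toy, "paper_id")      # several experiments per paper
by_peptide = group_by(toy, "peptide_id")  # several experiments per peptide
```

In this toy data, paper P1 contributes two experiments and peptide PEP1 appears in two experiments, while every experiment still points to a single paper and a single peptide.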

The database currently contains 2,582 curated experiment records from 509 publications. These records are linked to 1,844 peptide systems with sequence information and span eleven nucleic-acid cargo classes. Records include literature metadata, peptide identifiers and sequences, peptide modifications and noncanonical residues, nucleic-acid sequences and cargo information, formulation and concentration fields, biological model information, delivery evidence labels, outcome measurements, and curation notes where applicable. To filter records by cargo type, use the Browse page.

Data curation workflow

PepRNA-DB was generated using a combined LLM-assisted and manual curation workflow.

Relevant articles were collected manually and processed one paper at a time. For each publication, the article content was provided to a large language model, which generated an initial structured extraction of peptide-based nucleic-acid delivery experiments. The extracted records were then manually reviewed, corrected, and standardized before inclusion in the database.

The curation workflow included manual selection and attachment of relevant articles; initial structured extraction using a large language model; manual checking and correction of extracted peptide, cargo, formulation, model, and outcome information; cleaning and standardization of peptide sequences; tokenization of peptide sequences for computational analysis; collection of noncanonical residues and residue names; collection of peptide modification descriptions and related formulation details; and assignment of evidence-based labels using conservative rules.
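The sequence cleaning, tokenization, and noncanonical-residue collection steps could look roughly like the Python sketch below. The text does not specify the notation or rules used by PepRNA-DB, so this is an assumption-laden illustration: it supposes that noncanonical residues are written in square brackets (for example [Aib]), that each remaining letter is one residue, and that any single letter outside the 20 standard uppercase codes should be recorded as noncanonical without further interpretation.

```python
import re

# The 20 standard one-letter amino-acid codes.
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")

# Assumed notation: a bracketed name such as [Aib] is one residue;
# otherwise each single letter is one residue.
TOKEN_PATTERN = re.compile(r"\[([^\[\]]+)\]|([A-Za-z])")


def tokenize_peptide(sequence):
    """Split a peptide string into residue tokens and list noncanonical ones."""
    cleaned = "".join(sequence.split())  # drop all whitespace
    tokens = []
    noncanonical = []
    # Characters that match neither alternative (e.g. '-') are skipped.
    for match in TOKEN_PATTERN.finditer(cleaned):
        bracketed, single = match.groups()
        token = bracketed if bracketed is not None else single
        tokens.append(token)
        if bracketed is not None or single not in CANONICAL:
            noncanonical.append(token)
    return tokens, noncanonical


# Hypothetical input string, for illustration only.
tokens, noncanonical = tokenize_peptide("G LF[Aib]K-r")
```

Single letters are kept exactly as written rather than uppercased, so that any case-based conventions in a source sequence are preserved for later manual review.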

The labels in PepRNA-DB describe demonstrated experimental evidence. They should not be interpreted as predictions, assumptions, or general claims about a peptide's potential.

Access

PepRNA-DB is intended to be freely accessible for browsing, literature-based exploration, and secondary analysis. Downloadable tables are generated from the current database and are provided to support reproducible reuse of the curated records.

Users should cite PepRNA-DB when using the resource and should also cite the original source publications when discussing individual experimental records.