The accurate computational annotation of protein sequences with enzymatic function, especially those that are part of the functional and taxonomic dark matter, remains a fundamental challenge in bioinformatics. Here, we present HiFi-NN (Hierarchically-Finetuned Nearest Neighbor search), which annotates protein sequences to the fourth level of the EC (Enzyme Commission) number with greater precision and recall than all existing deep learning methods. HiFi-NN is a hierarchically-finetuned deep learning method based on a combination of semi-supervised representation learning and a nearest-neighbour classifier. Furthermore, we show that this method can correctly identify the EC number of sequences at identities below 40%, where the current state-of-the-art annotation tool, BLASTp, cannot. We then improve the learned representations by increasing the diversity of the training set, not just in sequence space but also in terms of the environments the sequences were sampled from. Finally, we use HiFi-NN to annotate a portion of the microbial dark matter sequences in the MGnify database.

This tool serves as a method by which query sequence(s) can be compared to a set of protein sequence embeddings to find those most similar to each query. It is assumed that distances encoded in the space represented by the reference embeddings are meaningful representations of protein similarity. To this end, we provide a model which has been trained using contrastive learning to map ESM-2 embeddings to a new space which accurately reflects the distances between proteins that share similar annotations.
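As an illustration of the lookup this implies, the following minimal sketch builds an exact FAISS index over a set of reference embeddings and retrieves the nearest neighbours of each query. The file names and the 1280-dimension assumption (ESM-2 650M) are placeholders, not the repository's actual layout; the real pipeline is annotate.py, described below.

import faiss
import numpy as np

d = 1280  # assumed embedding dimension (ESM-2 650M)
reference = np.load("reference_embeddings.npy").astype("float32")  # (N, d), placeholder file
queries = np.load("query_embeddings.npy").astype("float32")        # (M, d), placeholder file

index = faiss.IndexFlatL2(d)  # exact L2 nearest-neighbour search
index.add(reference)
distances, neighbour_ids = index.search(queries, 5)  # each of shape (M, 5)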
Clone this repository:
git clone https://github.com/Basecamp-Research/HiFi-NN.git
Then install the requirements from the requirements.yaml file:
conda env create -f requirements.yaml
If a FASTA file of sequences is used as the queries, then ESM also needs to be installed into this directory, as it is used to generate embeddings of these sequences.
git clone https://github.com/facebookresearch/esm.git
cd esm
pip install .
cd ../
The files necessary to run HiFi-NN can be downloaded from the following URL:
wget https://zenodo.org/records/15013616/files/ModelData.zip
There are three possible modes for inference.
In each mode, the query (for example, a FASTA file of sequences or a folder of precomputed ESM-2 embeddings) is specified as the input in the annotate.yaml config. Then run:
python annotate.py
The default settings for the above command will transfer the annotations of the k nearest neighbours to the query protein(s), along with an associated confidence score and minimum distance to each EC. There are three alternative options for the format in which the k nearest neighbours can be used to annotate a particular protein sequence (a sketch of one possible aggregation scheme follows this list):
- return distance = True, return confidence = False: transfer the annotations along with the minimum distance to each EC only.
- return distance = False, return confidence = True: transfer the annotations along with a confidence score for each EC only.
- return distance = False, return confidence = False: transfer the annotations of the k nearest neighbours without distances or confidence scores.
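As a rough sketch of how such an aggregation could work (this mirrors the description above but is not the repository's actual scoring code; the confidence definition below, the fraction of the k neighbours voting for an EC, is an assumption):

from collections import defaultdict

def transfer_annotations(neighbour_ecs, neighbour_distances, k):
    # neighbour_ecs: EC numbers of the k nearest neighbours of one query
    # neighbour_distances: the matching embedding-space distances
    votes = defaultdict(int)
    min_dist = {}
    for ec, dist in zip(neighbour_ecs, neighbour_distances):
        votes[ec] += 1
        min_dist[ec] = min(dist, min_dist.get(ec, float("inf")))
    # confidence: fraction of neighbours supporting each EC (assumed definition)
    return {ec: {"confidence": votes[ec] / k, "min_distance": min_dist[ec]} for ec in votes}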
First, build the Docker image:
docker build -t hifinn .
Before running the Docker container, you need to download the ModelData locally:
wget https://zenodo.org/records/15013616/files/ModelData.zip
unzip ModelData.zip
Then run the container with mounted volumes:
docker run -v $(pwd)/ModelData:/app/ModelData -v $(pwd):/app/output --shm-size '2gb' hifinn
PyTorch shares data between worker processes using shared memory, so if multiprocessing is used to load data, the shared memory segment size used by the container may not be enough. In the above example we increase the default from 64 MB to 2 GB. This will:
- mount the local ModelData folder at /app/ModelData in the container; the model's predictions will also be saved here
- make the reference annotations, cluster30_annos.json, in ModelData available to the container

We can construct a FAISS index from a folder of ESM embeddings or a FASTA file by simply running the following script. The index created can then later be used for annotation, as outlined above.
python make_db.py
If you only wish to index a subset of a folder of embeddings, you can specify the specific ids you wish to index; this is the last argument in the above command and is entirely optional. There are three accepted filetypes for these ids:
- FASTA file of sequences, where each record id indicates the filename of the embeddings you wish to index.
- JSON file with a list of the ids you wish to index.
- TXT file with a single id per line.
One possible way of parsing these id files is sketched below.
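For illustration, a hypothetical helper (not the repository's actual loader) that reads each of these filetypes into a list of ids might look like:

import json

def load_ids(path):
    if path.endswith((".fasta", ".fa")):
        # FASTA: take the record id from each header line
        with open(path) as f:
            return [line[1:].split()[0] for line in f if line.startswith(">")]
    if path.endswith(".json"):
        # JSON: a list of ids
        with open(path) as f:
            return json.load(f)
    # TXT: one id per line
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]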
To embed your training set using ESM-2 you should run the following command (source: https://github.com/facebookresearch/esm/blob/main/README.md):
python scripts/extract.py esm2_t33_650M_UR50D examples/data/some_proteins.fasta \
examples/data/some_proteins_emb_esm2 --repr_layers 32 --include per_tok
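The extract script writes one .pt file per sequence. With --include per_tok as above, a fixed-length per-sequence embedding can be obtained by mean-pooling over tokens; a minimal sketch, assuming the layer-32 output requested above and a placeholder filename:

import torch

data = torch.load("examples/data/some_proteins_emb_esm2/some_protein_id.pt")  # placeholder id
per_token = data["representations"][32]     # (sequence_length, 1280)
sequence_embedding = per_token.mean(dim=0)  # (1280,)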
To train a model, simply run python train_overlap_loss.py. The config file in configs/overlap_loss_config.yaml should be adjusted accordingly to reflect the paths on your own machine.
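The exact objective lives in train_overlap_loss.py and is not reproduced here; as a rough illustration of the general idea, the name suggests a supervised-contrastive-style loss in which positive pairs are weighted by how much their annotations overlap. A minimal sketch under that assumption (all names and the overlap weighting are hypothetical):

import torch
import torch.nn.functional as F

def overlap_contrastive_loss(z, overlap, temperature=0.07):
    # z: (B, d) projected embeddings; overlap: (B, B) pairwise annotation
    # overlap in [0, 1] (assumed inputs, not the repo's actual interface)
    z = F.normalize(z, dim=1)
    logits = (z @ z.T) / temperature
    self_mask = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    logits = logits.masked_fill(self_mask, -1e9)  # exclude self-pairs
    log_p = F.log_softmax(logits, dim=1)
    weights = overlap.masked_fill(self_mask, 0.0)
    weights = weights / weights.sum(dim=1, keepdim=True).clamp(min=1e-8)
    return -(weights * log_p).sum(dim=1).mean()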
Size notes: the model was retrained with 3M selected, environmentally diverse sequences from Basecamp Research's BaseGraph.
Notes: the model has over 3M parameters.