FAISS Index for Patent Retrieval

This repository contains FAISS index files created with the following parameters:

  • Model: SPECTER2 (allenai/specter2_base)
  • Index types: IVF100,PQ16 and IVF100,PQ64 (FAISS index_factory strings)
  • Distance metric: L2
  • Embedding dimension: 768
  • Corpus: USPTO Patents
  • PQ quantization: PQ64 stores 64 bytes of codes per vector instead of 16. This gives improved precision over the default PQ16 at the cost of a larger index.

Files

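  • specter2_IVF100,PQ16.index: FAISS index file (IVF100,PQ16)
  • specter2_IVF100,PQ64.index: FAISS index file (IVF100,PQ64, higher precision)
  • emb_specter2.memmap: SPECTER2 embeddings stored as a raw float32 NumPy memmap (768 values per row)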
  • patents_all.parquet: Corpus parquet file
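
The usage code below opens the memmap with reshape(-1, 768), which silently depends on the dtype and dimension being right. The sketch here instead infers and validates the row count from the file size. It assumes the file is a headerless, row-major float32 array, as that usage code implies. open_embeddings is a hypothetical helper name.

```python
import os

import numpy as np

EMBEDDING_DIM = 768  # SPECTER2 embedding dimension
BYTES_PER_FLOAT32 = np.dtype(np.float32).itemsize  # 4


def open_embeddings(path: str, dim: int = EMBEDDING_DIM) -> np.memmap:
    """Open a raw float32 memmap read-only, inferring the number of rows.

    Assumes no header: the file is exactly n_rows * dim float32 values.
    """
    n_bytes = os.path.getsize(path)
    row_bytes = dim * BYTES_PER_FLOAT32
    if n_bytes % row_bytes != 0:
        raise ValueError(
            f"{path}: {n_bytes} bytes is not a multiple of {row_bytes}; "
            "check the dtype and embedding dimension"
        )
    n_rows = n_bytes // row_bytes
    return np.memmap(path, mode="r", dtype=np.float32, shape=(n_rows, dim))
```

A mismatch raises an error instead of producing a wrongly shaped array. If n_rows differs from index.ntotal, the embeddings and the index are probably out of sync.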

Usage

To use these files, download them and load with FAISS:

import faiss
import numpy as np
from huggingface_hub import hf_hub_download

# Download and load index
index_path = hf_hub_download(repo_id="ErzhuoShao/USPTO-Specter2-faiss", filename="specter2_IVF100,PQ16.index")
index = faiss.read_index(index_path)

# Optionally download and load embeddings if needed
emb_path = hf_hub_download(repo_id="ErzhuoShao/USPTO-Specter2-faiss", filename="emb_specter2.memmap")
embeddings = np.memmap(
    emb_path,
    mode="r",
    dtype=np.float32
).reshape(-1, 768)  # 768 = SPECTER2 embedding dimension; the row count is inferred

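# Download and load the patent corpus
import pandas as pd
corpus_path = hf_hub_download(repo_id="ErzhuoShao/USPTO-Specter2-faiss", filename="patents_all.parquet")
corpus = pd.read_parquet(corpus_path)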

# Example query
from transformers import AutoTokenizer, AutoModel
import torch

# Load the same model used to build the index
tokenizer = AutoTokenizer.from_pretrained("allenai/specter2_base")
model = AutoModel.from_pretrained("allenai/specter2_base")

# Encode a query
query = "Machine learning techniques for computer vision"
inputs = tokenizer(query, return_tensors="pt", max_length=512, padding=True, truncation=True)
with torch.no_grad():
    outputs = model(**inputs)
# Use the first-token ([CLS]) embedding as the query vector, shape (1, 768)
query_vector = outputs.last_hidden_state[:, 0].numpy().astype('float32')

# Search the index
distances, indices = index.search(query_vector, k=5)

For more details, refer to the original repository.
