Model Card for IndoTaPas (MaskedLM Pre-training)
Model Details
Model Description
IndoTaPas (MaskedLM) is the foundational, pre-trained TaPas (Table Parser) model for the Indonesian language. It was pre-trained from scratch to understand the structural and semantic alignment between natural language text and tabular data in Indonesian.
The model was trained with a Masked Language Modeling (MLM) objective and Whole-Word Masking on a large corpus of Indonesian Wikipedia text-table pairs. It serves as a strong starting point for downstream tabular tasks in Indonesian, such as Table Question Answering (TQA), Table Fact Verification, and Table-based Text Generation.
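For intuition, the Whole-Word-Masking variant of MLM can be sketched in a few lines: if any WordPiece sub-token of a word is selected for masking, every sub-token of that word is masked. This is a minimal, dependency-free sketch of the standard BERT-style procedure; the function name, the 15% default rate, and the toy tokens are illustrative assumptions, not details taken from this card.

```python
import random

def whole_word_mask(tokens, mask_rate=0.15, mask_token="[MASK]", seed=0):
    """Whole-Word Masking sketch: sub-tokens carrying the WordPiece
    '##' continuation marker are grouped with the preceding token,
    and whole groups are masked together."""
    rng = random.Random(seed)
    # Group sub-token indices into words.
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])
    masked = list(tokens)
    labels = [None] * len(tokens)  # prediction targets (None = not masked)
    for word in words:
        if rng.random() < mask_rate:
            for i in word:
                labels[i] = tokens[i]
                masked[i] = mask_token
    return masked, labels

# Toy Indonesian sequence; "ja ##kar ##ta" is one word and is masked as a unit.
tokens = ["ibu", "kota", "indonesia", "adalah", "ja", "##kar", "##ta"]
masked, labels = whole_word_mask(tokens, mask_rate=0.5, seed=1)
```

During pre-training the model is trained to recover the `labels` at the masked positions, conditioned on both the surrounding text and the flattened table.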
- Developed by: Muhammad Rizki Syazali & Evi Yulianti
- Model type: Table Parser (TaPas) for Masked Language Modeling
- Language(s) (NLP): Indonesian (id)
- Finetuned from model: Pre-trained from scratch using the google/tapas-base configuration and an Indonesian-specific vocabulary (IndoBERT).
Model Sources
- Repository: GitHub - IndoTaPas
- Paper: "IndoTaPas: A TaPas-Based Model for Indonesian Table Question Answering" (Expert Systems with Applications, 2026)
Uses
Direct Use
As a pre-trained base model, it is not intended for direct use in end-user applications. It is designed to be fine-tuned on downstream tabular tasks. You can use this model directly only for masked word prediction within a table/text context.
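To predict a masked word "within a table/text context", TaPas first flattens the table into a single token sequence after the query, tagging each token with its row and column index. The real `TapasTokenizer` in `transformers` also handles sub-tokenization, numeric ranks, and segment ids; the following is a simplified, dependency-free sketch of just the flattening step (the function name and toy table are illustrative assumptions):

```python
def flatten_table(query_tokens, header, rows):
    """Flatten query + table into one sequence with per-token row/column
    ids, in the TaPas style: query and header carry row id 0; table cells
    carry 1-based row and column ids."""
    tokens = ["[CLS]"] + list(query_tokens) + ["[SEP]"]
    row_ids = [0] * len(tokens)
    col_ids = [0] * len(tokens)
    for c, cell in enumerate(header, start=1):       # header row
        for tok in cell.split():
            tokens.append(tok); row_ids.append(0); col_ids.append(c)
    for r, row in enumerate(rows, start=1):          # data rows
        for c, cell in enumerate(row, start=1):
            for tok in str(cell).split():
                tokens.append(tok); row_ids.append(r); col_ids.append(c)
    return tokens, row_ids, col_ids

toks, rids, cids = flatten_table(
    ["ibu", "kota", "?"],
    ["negara", "ibu kota"],
    [["Indonesia", "Jakarta"], ["Malaysia", "Kuala Lumpur"]],
)
```

The row/column ids become additional token-type embeddings, which is how the model can align a masked token with the table cell it belongs to.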
Downstream Use
This model is intended to be fine-tuned for tasks such as:
- Table Question Answering (Extractive): e.g., fine-tuning on the IndoHiTab dataset (see our one-stage and two-stage models).
- Table Entailment / Fact Verification: Verifying whether a natural-language statement is supported by the contents of a table.
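For orientation on the extractive TQA setting above: after fine-tuning, TaPas-style inference reduces to selecting answer cells from per-cell probabilities, typically restricted to the single most probable column. This is a minimal, dependency-free sketch of that selection heuristic; the function name, dictionary layout, and 0.5 threshold are illustrative assumptions rather than details from this card:

```python
def select_answer_cells(cell_probs, column_probs, threshold=0.5):
    """cell_probs: {(row, col): p}, column_probs: {col: p}.
    Pick the most probable column, then keep every cell in that
    column whose probability exceeds the threshold."""
    best_col = max(column_probs, key=column_probs.get)
    return sorted((r, c) for (r, c), p in cell_probs.items()
                  if c == best_col and p > threshold)

# Toy example: column 1 is most probable, and only cell (1, 1) clears 0.5.
answer = select_answer_cells(
    cell_probs={(1, 1): 0.9, (2, 1): 0.2, (1, 2): 0.8},
    column_probs={1: 0.7, 2: 0.3},
)
```

The selected coordinates are then mapped back to cell values in the original table to produce the extractive answer.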