## Performance

The presented model achieves state-of-the-art results in radiology natural language inference by making more efficient use of semantics and discourse characteristics at training time.

The experiments were performed on the RadNLI and MS-CXR-T benchmarks, which measure the quality of text embeddings in terms of static and temporal semantics, respectively.

BioViL-T is benchmarked against other commonly used SOTA domain-specific BERT models, including [PubMedBERT](https://aka.ms/pubmedbert) and [CXR-BERT](https://aka.ms/biovil).

The results below show that BioViL-T sentence embeddings have increased sensitivity to temporal content (MS-CXR-T) whilst better capturing static content (RadNLI).

| Model                                                                                   | MS-CXR-T Accuracy | MS-CXR-T ROC-AUC | RadNLI Accuracy (2 classes) | RadNLI ROC-AUC |
| --------------------------------------------------------------------------------------- | :---------------: | :--------------: | :-------------------------: | :------------: |
| [CXR-BERT-General](https://huggingface.co/microsoft/BiomedVLP-CXR-BERT-general)         | 62.60             | .601             | 87.59                       | .902           |
| [CXR-BERT-Specialized](https://huggingface.co/microsoft/BiomedVLP-CXR-BERT-specialized) | 78.12             | .837             | 89.66                       | .932           |
| **BioViL-T**                                                                            | **87.77**         | **.933**         | **90.52**                   | **.947**       |

<br/>
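As an illustrative sketch of how such benchmark metrics are computed, accuracy and ROC-AUC can be derived from cosine similarities between sentence-pair embeddings. The scores below are made-up toy values, not actual BioViL-T outputs:

```python
# Illustrative toy values: cosine similarities between sentence-pair
# embeddings, labelled 1 (entailment) or 0 (contradiction).
scores = [0.91, 0.84, 0.48, 0.35, 0.42, 0.88, 0.21, 0.65]
labels = [1, 1, 1, 0, 0, 1, 0, 0]

def accuracy(scores, labels, threshold=0.5):
    """Fraction of pairs classified correctly at a fixed similarity threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def roc_auc(scores, labels):
    """Probability that a random positive pair outscores a random negative
    pair (ties count half) -- the Mann-Whitney formulation of ROC-AUC."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(accuracy(scores, labels))  # 0.75
print(roc_auc(scores, labels))   # 0.9375
```

Since ROC-AUC is threshold-free, it can differ from accuracy at a fixed cut-off, which is why both are reported per benchmark.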
The novel pretraining framework also yields better vision–language representations. Below is the zero-shot phrase grounding performance obtained on the [MS-CXR](https://physionet.org/content/ms-cxr/0.1/) benchmark dataset, which evaluates the quality of image–text latent representations.

| Model        | Avg. CNR        | Avg. mIoU         |
| ------------ | :-------------: | :---------------: |
| BioViL       | 1.07 ± 0.04     | 0.229 ± 0.005     |
| BioViL-L     | 1.21 ± 0.05     | 0.202 ± 0.010     |
| **BioViL-T** | **1.33 ± 0.04** | **0.240 ± 0.005** |

<br/>
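As a minimal sketch of the grounding metrics, assuming CNR is the contrast-to-noise ratio between similarity scores inside and outside the ground-truth bounding box and IoU thresholds the similarity map (following the definitions used in the BioViL work), both can be computed as below. The similarity map is a toy example, not model output:

```python
import numpy as np

# Toy 4x4 image-text similarity map and a ground-truth bounding-box mask
# (values are illustrative, not actual model outputs).
sim_map = np.array([
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.8, 0.1],
    [0.1, 0.8, 0.7, 0.2],
    [0.0, 0.1, 0.2, 0.1],
])
gt_mask = np.zeros((4, 4), dtype=bool)
gt_mask[1:3, 1:3] = True  # box covering the high-similarity region

def cnr(sim_map, mask):
    """Contrast-to-noise ratio: separation between similarity scores
    inside and outside the ground-truth box."""
    inside, outside = sim_map[mask], sim_map[~mask]
    return abs(inside.mean() - outside.mean()) / np.sqrt(inside.var() + outside.var())

def iou(sim_map, mask, threshold=0.5):
    """Intersection-over-union of the thresholded similarity map vs. the box."""
    pred = sim_map >= threshold
    return (pred & mask).sum() / (pred | mask).sum()

print(float(cnr(sim_map, gt_mask)))  # higher contrast -> better grounding
print(float(iou(sim_map, gt_mask)))
```

Note that CNR rewards well-separated score distributions regardless of any threshold, whereas mIoU depends on where the map is cut; the averages in the table are taken over the benchmark's annotated phrase-region pairs.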
Additional experimental results and discussion can be found in the corresponding paper, ["Learning to Exploit Temporal Structure for Biomedical Vision–Language Processing", CVPR'23](https://arxiv.org/abs/2301.04558).
## Limitations