Questions tagged [bert-language-model]

BERT, or Bidirectional Encoder Representations from Transformers, is a new method of pre-training language representations which obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks.

The academic paper and Google's original implementation are linked in the references below.

References

BERT paper: https://arxiv.org/abs/1810.04805

BERT implementation: https://github.com/google-research/bert

909 questions
5 votes, 1 answer

BERT performing worse than word2vec

I am trying to use BERT for a document ranking problem. My task is pretty straightforward: I have to do a similarity ranking for an input document. The only issue is that I don't have labels, so it's more of a qualitative analysis. I am on my…
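A common unsupervised baseline for this kind of ranking is to mean-pool BERT's last hidden states into a document vector and rank candidates by cosine similarity. A minimal sketch, assuming a recent HuggingFace transformers version and bert-base-uncased rather than the asker's actual setup:

```python
import torch
from transformers import BertTokenizer, BertModel

# Assumed setup: plain bert-base-uncased, no fine-tuning.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(text):
    # Tokenize and truncate to BERT's 512-token limit.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    # Mean-pool the last hidden states over non-padding tokens.
    hidden = outputs.last_hidden_state             # (1, seq_len, 768)
    mask = inputs["attention_mask"].unsqueeze(-1)  # (1, seq_len, 1)
    return (hidden * mask).sum(1) / mask.sum(1)    # (1, 768)

query = embed("input document text")
candidates = ["candidate document one", "candidate document two"]
scores = [torch.cosine_similarity(query, embed(c)).item() for c in candidates]
ranking = sorted(zip(candidates, scores), key=lambda x: -x[1])
```

Raw, un-fine-tuned BERT vectors often do underperform averaged word2vec on similarity tasks, which is consistent with the question title; sentence-level fine-tuning (Sentence-BERT style) usually closes the gap.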
4 votes, 2 answers

Loss function for comparing two vectors for categorization

I am performing an NLP task where I analyze a document and classify it into one of six categories. However, I do this operation at three different time periods, so the final output is an array of three integers (sparse), where each integer is the…
Jameson
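When the target is three categorical labels (one per time period, each out of six classes), the usual choice is a separate cross-entropy term per period rather than a loss that compares the two raw vectors. A minimal PyTorch sketch; batch size, shapes, and names are illustrative assumptions:

```python
import torch
import torch.nn as nn

num_periods, num_classes = 3, 6
criterion = nn.CrossEntropyLoss()

# Assumed model output: one set of logits per time period, shape (batch, 3, 6).
logits = torch.randn(8, num_periods, num_classes)
# Targets: the integer class for each period, shape (batch, 3).
targets = torch.randint(0, num_classes, (8, num_periods))

# Sum (or average) a cross-entropy term per time period.
loss = sum(criterion(logits[:, t, :], targets[:, t]) for t in range(num_periods))
```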
4 votes, 3 answers

AttributeError: 'str' object has no attribute 'dim' in pytorch

I got the following error output in PyTorch when sending predictions into the model. Does anyone know what's going on? The following is the model architecture I created; the error output shows the issue is in the x =…
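That particular error usually means a Python string reached a layer that expects a tensor. With recent transformers versions, a frequent cause is tuple-unpacking the model output (which is a ModelOutput, so unpacking yields its string keys) before feeding it to nn.Linear. A hedged sketch of that failure mode and its fix, with names assumed rather than taken from the question:

```python
import torch.nn as nn
from transformers import BertModel

class Classifier(nn.Module):
    def __init__(self, num_labels=2):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.fc = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        # In transformers v4+ the model returns a ModelOutput, not a tuple;
        # `_, pooled = self.bert(...)` would unpack its *keys* (strings),
        # and nn.Linear(pooled) then raises "'str' object has no attribute 'dim'".
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled = outputs.pooler_output   # or pass return_dict=False and index the tuple
        x = self.fc(pooled)
        return x
```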
4 votes, 1 answer

BERT-based NER model giving inconsistent prediction when deserialized

I am trying to train an NER model using the HuggingFace transformers library on Colab cloud GPUs, pickle it, and load the model on my own CPU to make predictions. The model is the following: from transformers import…
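Inconsistent predictions after moving a pickled model between machines often come down to dropout still being active or to pickling bypassing the library's own (de)serialization. A minimal sketch of the usual remedy, with the checkpoint name and label count assumed:

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Stand-ins for the fine-tuned model and tokenizer from training
# (a fresh checkpoint here so the snippet runs on its own).
model = AutoModelForTokenClassification.from_pretrained("bert-base-cased", num_labels=9)
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

# On the training machine: use the library's serializer instead of pickle.
model.save_pretrained("ner-model")
tokenizer.save_pretrained("ner-model")

# On the CPU machine: reload and disable dropout before predicting,
# otherwise repeated predictions on the same input can differ.
model = AutoModelForTokenClassification.from_pretrained("ner-model")
tokenizer = AutoTokenizer.from_pretrained("ner-model")
model.eval()
```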
4 votes, 2 answers

BERT sentence embeddings from transformers

I'm trying to get sentence vectors from hidden states in a BERT model. Looking at the huggingface BertModel instructions here, which say: from transformers import BertTokenizer, BertModel tokenizer =…
Mittenchops
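A minimal sketch of pulling sentence vectors out of BertModel's hidden states (recent transformers and bert-base-uncased assumed); whether to take the [CLS] position, the pooler output, or a pooled average is a design choice:

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

inputs = tokenizer("Here is a sentence.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

cls_vector = outputs.last_hidden_state[:, 0, :]  # [CLS] position, shape (1, 768)
pooled = outputs.pooler_output                   # tanh-projected [CLS], shape (1, 768)
all_layers = outputs.hidden_states               # tuple of 13 tensors (embeddings + 12 layers)
```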
4 votes, 1 answer

Sliding window for long text in BERT for Question Answering

I've read a post which explains how the sliding window works, but I cannot find any information on how it is actually implemented. From what I understand, if the input is too long, a sliding window can be used to process the text. Please correct me if I…
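In the HuggingFace tokenizers, the sliding window is implemented via stride and return_overflowing_tokens, which split an over-long context into overlapping features. A sketch assuming a fast tokenizer; the max_length/stride values are the conventional SQuAD-example defaults, not anything from the question:

```python
from transformers import AutoTokenizer

# Fast tokenizer assumed (the default with AutoTokenizer in transformers v4).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

question = "Who wrote the paper?"
context = "a very long document " * 400  # longer than 512 tokens

encoded = tokenizer(
    question,
    context,
    truncation="only_second",        # only the context gets windowed
    max_length=384,
    stride=128,                      # overlap between consecutive windows
    return_overflowing_tokens=True,  # emit one feature per window
    return_offsets_mapping=True,
    padding="max_length",
)

# Each window becomes one model input; answers are aggregated across windows.
print(len(encoded["input_ids"]), "windows")
```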
4 votes, 1 answer

How does BertForSequenceClassification classify on the CLS vector?

Background: Following along with this question, when using BERT to classify sequences the model uses the "[CLS]" token to represent the classification task. According to the paper: The first token of every sequence is always a special…
Kevin
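In the HuggingFace implementation, BertForSequenceClassification takes the pooled [CLS] representation (a dense + tanh layer over the final [CLS] hidden state), applies dropout, and feeds a single linear layer. A roughly equivalent sketch, with the checkpoint and label count assumed:

```python
import torch.nn as nn
from transformers import BertModel

class RoughBertClassifier(nn.Module):
    """Approximation of what BertForSequenceClassification does with [CLS]."""
    def __init__(self, num_labels=2):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        # pooler_output = tanh(W · hidden_state_of_[CLS]); the classification
        # head is a single linear layer on top of that vector.
        pooled = self.dropout(outputs.pooler_output)
        return self.classifier(pooled)   # logits, shape (batch, num_labels)
```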
4 votes, 1 answer

Using BERT Embeddings in Keras Embedding layer

I want to use BERT word-vector embeddings in the embedding layer of an LSTM instead of the usual default embedding layer. Is there any way I can do it?
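One way to do this is to skip the Keras Embedding layer entirely and feed BERT's contextual token vectors straight into the LSTM. A sketch assuming TensorFlow 2, TFBertModel, and an illustrative binary-classification head:

```python
import tensorflow as tf
from transformers import BertTokenizer, TFBertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = TFBertModel.from_pretrained("bert-base-uncased")
bert.trainable = False  # freeze BERT; set True to fine-tune it as well

max_len = 128
input_ids = tf.keras.Input(shape=(max_len,), dtype=tf.int32, name="input_ids")
attention_mask = tf.keras.Input(shape=(max_len,), dtype=tf.int32, name="attention_mask")

# (batch, max_len, 768) token vectors replace the Embedding layer's output.
token_vectors = bert(input_ids, attention_mask=attention_mask)[0]
x = tf.keras.layers.LSTM(128)(token_vectors)
output = tf.keras.layers.Dense(1, activation="sigmoid")(x)

model = tf.keras.Model([input_ids, attention_mask], output)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```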
4 votes, 1 answer

TFBertMainLayer gets less accuracy compared to TFBertModel

I had a problem with saving the weights of TFBertModel wrapped in Keras. The problem is described here in a GitHub issue and here on Stack Overflow. The solution proposed in both cases is to use config =…
Marzi Heidari
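One known pitfall behind such an accuracy gap is that TFBertMainLayer(config=...) starts from randomly initialized weights, whereas TFBertModel.from_pretrained(...) loads the pretrained ones. A hedged sketch of reusing the pretrained main layer inside Keras, not necessarily the asker's exact fix:

```python
from transformers import TFBertModel, BertConfig

config = BertConfig.from_pretrained("bert-base-uncased")

# TFBertMainLayer(config) alone would start from random weights.
# Pulling the main layer out of a pretrained TFBertModel keeps the
# pretrained parameters while still giving a plain Keras layer to wrap.
bert_main_layer = TFBertModel.from_pretrained("bert-base-uncased", config=config).bert
```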
4 votes, 3 answers

How to stop BERT from breaking apart specific words into word-pieces

I am using a pre-trained BERT model to tokenize a text into meaningful tokens. However, the text has many specific words, and I don't want the BERT model to break them into word pieces. Is there any solution to it? For example: tokenizer =…
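The usual way to keep domain terms whole is to add them to the tokenizer's vocabulary and resize the model's embedding matrix. A minimal sketch; the example words are assumptions, not taken from the question:

```python
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# Illustrative domain terms to keep as single tokens.
new_tokens = ["covid", "transformers"]
tokenizer.add_tokens(new_tokens)

# The embedding matrix must grow to match the enlarged vocabulary.
model.resize_token_embeddings(len(tokenizer))

print(tokenizer.tokenize("covid transformers model"))
# ['covid', 'transformers', 'model'] instead of word-piece fragments
```

The added tokens start with randomly initialized embeddings, so some fine-tuning is normally needed before they carry useful meaning.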
4 votes, 1 answer

huggingface bert showing poor accuracy / f1 score [pytorch]

I am trying BertForSequenceClassification for a simple article classification task. No matter how I train it (freeze all layers but the classification layer, all layers trainable, last k layers trainable), I always get an almost random accuracy…
Zabir Al Nazi
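Near-random accuracy with BertForSequenceClassification is most often a learning-rate or data problem rather than an architecture one. A minimal fine-tuning step with the conventional settings from the BERT paper; the checkpoint, label count, and data are assumptions:

```python
import torch
from torch.optim import AdamW
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)

# The learning rate matters a lot: 2e-5 to 5e-5 is the usual fine-tuning range;
# larger values often leave the model stuck at chance-level accuracy.
optimizer = AdamW(model.parameters(), lr=2e-5)

texts = ["example article one", "example article two"]
labels = torch.tensor([0, 2])
batch = tokenizer(texts, padding=True, truncation=True, max_length=256, return_tensors="pt")

model.train()
outputs = model(**batch, labels=labels)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```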
4 votes, 1 answer

Can you train a BERT model from scratch with task specific architecture?

BERT pre-training of the base model is done with a language-modeling approach, where we mask a certain percentage of tokens in a sentence and make the model learn to predict those masked tokens. Then, I think, in order to do downstream tasks, we add a newly…
viopu
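In principle yes: instantiating the task model from a config alone gives randomly initialized weights, so the task-specific architecture is trained from scratch, whereas from_pretrained loads the pretrained body and only the head is new. A minimal sketch with an assumed label count:

```python
from transformers import BertConfig, BertForSequenceClassification

# From scratch: config only, so every weight (BERT body + classifier head)
# starts randomly initialized and is learned directly on the task data.
config = BertConfig(num_labels=6)
scratch_model = BertForSequenceClassification(config)

# Usual recipe: pretrained body, freshly initialized classification head.
pretrained_model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=6)
```

Training from scratch typically needs far more labeled data than fine-tuning, which is why the pretrain-then-fine-tune recipe dominates.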
4 votes, 1 answer

How to get the probability of a particular token (word) in a sentence given the context

I'm trying to calculate the probability, or any type of score, for words in a sentence using NLP. I've tried this approach with the GPT2 model using the Huggingface Transformers library, but I couldn't get satisfactory results due to the model's…
D.Perera
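With BERT, the usual way to score a word in context is to mask its position and read off the masked-language-model probability for that word. A sketch assuming bert-base-uncased and a target word that is a single lowercase word piece:

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

sentence = "The cat sat on the mat."
target_word = "mat"  # word whose in-context probability we want

# Mask the target position and let BERT fill it in. This simple token-level
# matching assumes the word survives as a single lowercase word piece.
tokens = tokenizer.tokenize(sentence)
masked = [tokenizer.mask_token if t == target_word else t for t in tokens]
inputs = tokenizer(" ".join(masked), return_tensors="pt")
mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0].item()

with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.softmax(logits[0, mask_index], dim=-1)
target_id = tokenizer.convert_tokens_to_ids(target_word)
print(f"P({target_word} | context) = {probs[target_id].item():.4f}")
```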
4 votes, 1 answer

Does BertForSequenceClassification classify on the CLS vector?

I'm using the Huggingface Transformer package and BERT with PyTorch. I'm trying to do 4-way sentiment classification and am using BertForSequenceClassification to build a model that eventually leads to a 4-way softmax at the end. My understanding…
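As in the similar question above, the head sits on the pooled [CLS] vector; for a 4-way setup the only change is num_labels, and the softmax is implicit in the cross-entropy loss during training. A minimal inference-side sketch with the checkpoint assumed:

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=4)
model.eval()

inputs = tokenizer("The movie was surprisingly good.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits   # (1, 4), computed from the pooled [CLS] vector

probs = torch.softmax(logits, dim=-1)       # explicit 4-way softmax at inference
predicted_class = probs.argmax(dim=-1).item()
```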
4 votes, 2 answers

BERT get sentence level embedding after fine tuning

I came across this page. 1) I would like to get a sentence-level embedding (the embedding given by the [CLS] token) after fine-tuning is done. How could I do it? 2) I also noticed that the code on that page takes a lot of time to return results on the test…
user2543622
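For part 1), one option is to reload the fine-tuned checkpoint with output_hidden_states=True and take the final-layer vector at the [CLS] position. A sketch; the checkpoint path is a placeholder, not from the question:

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Placeholder path: wherever the fine-tuned checkpoint was saved.
checkpoint = "./fine-tuned-bert"
tokenizer = BertTokenizer.from_pretrained(checkpoint)
model = BertForSequenceClassification.from_pretrained(checkpoint, output_hidden_states=True)
model.eval()

inputs = tokenizer("A sentence to embed.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# hidden_states[-1] is the last encoder layer; position 0 is the [CLS] token,
# so this is the sentence-level embedding after fine-tuning.
cls_embedding = outputs.hidden_states[-1][:, 0, :]   # shape (1, 768)
```

For part 2), the usual speedups are batching the test sentences, wrapping inference in torch.no_grad(), and running on a GPU.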