Questions tagged [huggingface-transformers]

Transformers is a Python library that implements various transformer NLP models in PyTorch and TensorFlow.

transformers is a natural language processing (NLP) library that implements many state-of-the-art transformer models in Python using PyTorch and TensorFlow. It is created and maintained by Hugging Face. The library is available through package managers and is open-sourced on GitHub. It was formerly known as pytorch-transformers and, before that, as pytorch-pretrained-bert.

873 questions
24 votes, 3 answers

How to build semantic search for a given domain

There is a problem we are trying to solve where we want to do a semantic search on our set of data, i.e. we have domain-specific data (for example, sentences about automobiles). Our data is just a bunch of sentences, and what we want is to give a…
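
A rough sketch of one common approach, assuming the sentence-transformers package (built on top of transformers) is installed; the model name, example sentences and query below are placeholders, not the asker's data:

```python
# Hedged sketch: encode domain sentences and a query into dense vectors,
# then rank by cosine similarity. Model name and data are illustrative only.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("paraphrase-MiniLM-L6-v2")  # example model choice

corpus = [
    "The car has a six-cylinder engine.",
    "Electric vehicles need charging stations.",
    "The recipe calls for two cups of flour.",
]
query = "How powerful is the motor?"

corpus_emb = model.encode(corpus)      # (n_sentences, dim) numpy array
query_emb = model.encode([query])      # (1, dim)

# Normalise so that a dot product equals cosine similarity.
corpus_emb = corpus_emb / np.linalg.norm(corpus_emb, axis=1, keepdims=True)
query_emb = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)

scores = (corpus_emb @ query_emb.T).squeeze()
for i in np.argsort(-scores):
    print(f"{scores[i]:.3f}  {corpus[i]}")
```
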
19 votes, 2 answers

How to compare sentence similarities using embeddings from BERT

I am using the HuggingFace Transformers package to access pretrained models. As my use case needs functionality for both English and Arabic, I am using the bert-base-multilingual-cased pretrained model. I need to be able to compare the similarity of…
KOB • 3,062 • 1 • 24 • 60
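
One possible sketch, assuming a reasonably recent transformers version: mean-pool the token embeddings (ignoring padding) and compare with cosine similarity. Masked mean pooling is a common choice here, not the only one, and the sentences are illustrative only:

```python
import torch
from transformers import AutoModel, AutoTokenizer

name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)
model.eval()

def embed(sentences):
    enc = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state        # (batch, seq_len, dim)
    mask = enc["attention_mask"].unsqueeze(-1)         # zero out padding tokens
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

a, b = embed(["I like cars.", "أنا أحب السيارات."])
print(torch.cosine_similarity(a, b, dim=0).item())
```
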
14 votes, 1 answer

What's the difference between tokenizer.encode and tokenizer.encode_plus in Hugging Face?

Here is an example of doing sequence classification using a model to determine if two sequences are paraphrases of each other. The two examples give two different results. Can you help me understand why tokenizer.encode and tokenizer.encode_plus give…
andy • 1,211 • 4 • 8 • 22
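
In short, encode returns only the list of token ids, while encode_plus returns a dictionary that also contains token_type_ids and attention_mask, which matters for sentence-pair tasks. A small illustration (the sentences are arbitrary examples):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
a = "The company HuggingFace is based in New York City"
b = "HuggingFace's headquarters are situated in Manhattan"

ids = tokenizer.encode(a, b, add_special_tokens=True)
print(ids)                    # a flat list of token ids: [101, ..., 102, ..., 102]

enc = tokenizer.encode_plus(a, b, add_special_tokens=True)
print(enc.keys())             # input_ids, token_type_ids, attention_mask
print(enc["token_type_ids"])  # 0s for the first sentence, 1s for the second
```
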
12 votes, 1 answer

How to use Hugging Face Transformers library in Tensorflow for text classification on custom data?

I am trying to do binary text classification on custom data (which is in CSV format) using the different transformer architectures that the Hugging Face 'Transformers' library offers. I am using this TensorFlow blog post as a reference. I am loading the…
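
One possible outline, assuming a pandas CSV with text and label columns (the file and column names below are placeholders) and a transformers version new enough that the tokenizer can be called directly on a list of strings:

```python
import pandas as pd
import tensorflow as tf
from transformers import BertTokenizer, TFBertForSequenceClassification

df = pd.read_csv("train.csv")        # hypothetical file with "text" / "label" columns
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = TFBertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

enc = tokenizer(list(df["text"]), padding=True, truncation=True,
                max_length=128, return_tensors="tf")

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])
model.fit(dict(enc), df["label"].values, batch_size=16, epochs=2)
```
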
10 votes, 1 answer

BertForSequenceClassification vs. BertForMultipleChoice for sentence multi-class classification

I'm working on a text classification problem (e.g. sentiment analysis), where I need to classify a text string into one of five classes. I just started using the Huggingface Transformer package and BERT with PyTorch. What I need is a classifier with…
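
For single-label, multi-class classification the usual choice is BertForSequenceClassification with num_labels set to the number of classes; BertForMultipleChoice is intended for tasks where each example comes with several candidate answers. A minimal sketch, assuming a recent transformers version for the output object:

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=5)

enc = tokenizer("The service was excellent!", return_tensors="pt")
labels = torch.tensor([4])             # class index in 0..4
out = model(**enc, labels=labels)
print(out.loss, out.logits.shape)      # cross-entropy loss and (1, 5) logits
```
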
9 votes, 3 answers

How to disable TOKENIZERS_PARALLELISM=(true | false) warning?

I use PyTorch to train a huggingface-transformers model, but every epoch it outputs the warning: The current process just got forked. Disabling parallelism to avoid deadlocks... To disable this warning, please explicitly set…
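
The warning comes from the Rust tokenizers backend when a fast tokenizer has already been used before the DataLoader forks its worker processes. Setting the environment variable before the tokenizer is used silences it:

```python
import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"   # or "true" to keep parallelism
# (alternatively, export TOKENIZERS_PARALLELISM=false in the shell before running)
```
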
9 votes, 3 answers

Where does Hugging Face's transformers save models?

Running the below code downloads a model - does anyone know what folder it downloads it to? !pip install -q transformers from transformers import pipeline model = pipeline('fill-mask')
user3472360 • 321 • 3 • 17
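
By default the weights go into the Hugging Face cache directory (roughly ~/.cache/huggingface/ in recent versions, ~/.cache/torch/transformers/ in older ones). The location can be overridden with the TRANSFORMERS_CACHE environment variable (set before transformers is imported) or per call; the path below is a placeholder:

```python
from transformers import AutoModel, AutoTokenizer

# cache_dir redirects the download/cache location for this call only.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", cache_dir="/tmp/hf-cache")
model = AutoModel.from_pretrained("bert-base-uncased", cache_dir="/tmp/hf-cache")
```
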
9 votes, 1 answer

PyTorch BERT TypeError: forward() got an unexpected keyword argument 'labels'

Training a BERT model using PyTorch transformers (following the tutorial here). The following statement in the tutorial, loss = model(b_input_ids, token_type_ids=None, attention_mask=b_input_mask, labels=b_labels), leads to TypeError: forward() got an…
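
A common cause is using the bare BertModel, whose forward() has no labels argument; the task-specific classes accept labels and return a loss. A short contrast, under that assumption:

```python
from transformers import BertModel, BertForSequenceClassification

# BertModel has no classification head, so forward(labels=...) raises TypeError:
#   base = BertModel.from_pretrained("bert-base-uncased")
# The task-specific class accepts labels and computes the loss internally:
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
```
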
8 votes, 2 answers

Load a pre-trained model from disk with Huggingface Transformers

From the documentation for from_pretrained, I understand I don't have to download the pretrained vectors every time; I can save them and load from disk with this syntax: - a path to a `directory` containing vocabulary files required by the…
Mittenchops • 15,641 • 28 • 103 • 200
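
A minimal sketch: save_pretrained() writes the config, weights and vocabulary to a directory, and from_pretrained() can later be pointed at that directory instead of a model name (the directory path below is a placeholder):

```python
from transformers import BertModel, BertTokenizer

model = BertModel.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

model.save_pretrained("./local-bert")       # writes config.json + weights
tokenizer.save_pretrained("./local-bert")   # writes vocab files

# Later, possibly offline, load straight from disk:
model = BertModel.from_pretrained("./local-bert")
tokenizer = BertTokenizer.from_pretrained("./local-bert")
```
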
8 votes, 1 answer

How to get immediate next word probability using GPT2 model?

I was trying the hugging face gpt2 model. I have seen the run_generation.py script, which generates a sequence of tokens given a prompt. I am aware that we can use GPT2 for NLG. In my use case, I wish to determine the probability distribution for…
Gaurang Tandon • 5,704 • 9 • 37 • 73
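
One way to read off the next-word distribution, assuming a recent transformers version (older releases return a tuple instead of an object with .logits): take the logits at the last position of the prompt and apply a softmax:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer.encode("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(input_ids).logits            # (1, seq_len, vocab_size)

probs = torch.softmax(logits[0, -1], dim=-1)    # distribution over the next token
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode([int(idx)])!r}: {p.item():.4f}")
```
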
8 votes, 2 answers

Training TFBertForSequenceClassification with custom X and Y data

I am working on a text classification problem, for which I am trying to train my model on TFBertForSequenceClassification from the huggingface-transformers library. I followed the example given on their GitHub page, and I am able to run the sample code…
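
One way to feed custom X and y into fit() is to wrap the tokenizer output and the labels in a tf.data.Dataset; the texts, labels and hyperparameters below are placeholders:

```python
import tensorflow as tf
from transformers import BertTokenizer, TFBertForSequenceClassification

texts = ["good movie", "terrible plot", "loved it", "waste of time"]
labels = [1, 0, 1, 0]

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
enc = tokenizer(texts, padding=True, truncation=True, return_tensors="tf")

# Pair the encoding dict with the labels and batch it for Keras.
dataset = tf.data.Dataset.from_tensor_slices((dict(enc), labels)).shuffle(10).batch(2)

model = TFBertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
model.compile(optimizer=tf.keras.optimizers.Adam(3e-5),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
model.fit(dataset, epochs=2)
```
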
8 votes, 1 answer

Updating a BERT model through Huggingface transformers

I am attempting to update the pre-trained BERT model using an in-house corpus. I have looked at the Hugging Face transformers docs and I am a little stuck, as you will see below. My goal is to compute simple similarities between sentences using the…
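
A hedged sketch of continuing masked-language-model pretraining on an in-house text file with the Trainer API; the file name, output directory and hyperparameters are placeholders, and LineByLineTextDataset is just one simple way to feed raw text:

```python
from transformers import (BertForMaskedLM, BertTokenizer,
                          DataCollatorForLanguageModeling,
                          LineByLineTextDataset, Trainer, TrainingArguments)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

dataset = LineByLineTextDataset(tokenizer=tokenizer,
                                file_path="inhouse_corpus.txt",   # one sentence per line
                                block_size=128)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer,
                                           mlm=True, mlm_probability=0.15)

args = TrainingArguments(output_dir="./bert-inhouse",
                         num_train_epochs=1,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, data_collator=collator,
        train_dataset=dataset).train()
model.save_pretrained("./bert-inhouse")
```
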
7 votes, 1 answer

Named Entity Recognition with Huggingface transformers, mapping back to complete entities

I'm looking at the documentation for the Hugging Face pipeline for Named Entity Recognition, and it's not clear to me how these results are meant to be used in an actual entity recognition model. For instance, given the example in documentation: >>> from…
Mittenchops • 15,641 • 28 • 103 • 200
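
The pipeline can merge word pieces back into whole entities; depending on the installed version the option is grouped_entities=True (older) or aggregation_strategy="simple" (newer). A short sketch:

```python
from transformers import pipeline

ner = pipeline("ner", grouped_entities=True)   # aggregation_strategy="simple" in newer versions
print(ner("Hugging Face Inc. is a company based in New York City."))
# roughly: [{'entity_group': 'ORG', 'word': 'Hugging Face Inc', ...},
#           {'entity_group': 'LOC', 'word': 'New York City', ...}]
```
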
7 votes, 1 answer

Use of attention_mask during the forward pass in lm finetuning

I had a question about the language model finetuning code on the Hugging Face repository. It seems that the forward method of the BERT model takes as input an argument called attention_mask. The documentation says that the attention mask is an…
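
A small illustration of what the mask is for: with padding, attention_mask is 1 for real tokens and 0 for pad tokens, and passing it to forward() keeps the model from attending to padding positions (the sentences below are arbitrary examples):

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

enc = tokenizer(["a short sentence", "a noticeably longer example sentence"],
                padding=True, return_tensors="pt")
print(enc["attention_mask"])     # 1s for real tokens, trailing 0s for padding

with torch.no_grad():
    out = model(input_ids=enc["input_ids"],
                attention_mask=enc["attention_mask"])
```
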
7 votes, 2 answers

Optimizer and scheduler for BERT fine-tuning

I'm trying to fine-tune a model with BERT (using the transformers library), and I'm a bit unsure about the optimizer and scheduler. First, I understand that I should use transformers.AdamW instead of PyTorch's version of it. Also, we should use a warmup…
geekazoid • 3,305 • 3 • 29 • 38
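
A hedged sketch of the usual setup: transformers' AdamW plus a linear schedule with warmup, stepping the scheduler once per optimization step; model, train_dataloader and all hyperparameters below are assumed placeholders:

```python
from transformers import AdamW, get_linear_schedule_with_warmup

num_epochs = 3
num_training_steps = len(train_dataloader) * num_epochs   # train_dataloader assumed to exist

optimizer = AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)
scheduler = get_linear_schedule_with_warmup(optimizer,
                                            num_warmup_steps=int(0.1 * num_training_steps),
                                            num_training_steps=num_training_steps)

for epoch in range(num_epochs):
    for batch in train_dataloader:
        loss = model(**batch).loss
        loss.backward()
        optimizer.step()
        scheduler.step()          # learning rate is updated once per batch
        optimizer.zero_grad()
```
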