Sentence transformers are a significant advancement in natural language processing, enabling the conversion of textual data into meaningful vector representations or ‘embeddings’. These embeddings effectively encapsulate the contextual and semantic nuances of sentences, making them invaluable for various machine learning applications.
Many sentence transformers are built on BERT (Bidirectional Encoder Representations from Transformers), known for its deep understanding of context and language structure. Other notable base models include RoBERTa and DistilBERT, each contributing uniquely to the landscape of natural language understanding.
These models are particularly adept at generating embeddings that can be utilized as foundational inputs for various machine learning tasks. By leveraging the pre-trained knowledge of these transformers, it’s possible to enhance the performance of other machine learning models in tasks such as sentiment analysis, text classification, and more. This process of transfer learning is crucial, as it allows for the application of advanced linguistic understanding to a broad range of computational challenges.
In this technical article, we explore Few-Shot Learning and how its fusion with Contrastive Learning enhances frameworks like SetFit. We also touch upon the crucial aspects of fine-tuning and hyperparameter tuning for optimal model performance :)
Fine-tuning sentence transformers, which are powerful tools for understanding language, often requires a lot of data. This can be a big problem, especially when there’s not enough relevant data available. Imagine trying to teach someone about a topic using only a few examples — they might not learn it very well. It’s similar with these AI models; they need lots of examples to learn how to do a specific job well. But finding enough good quality data can be tough, particularly for less common topics or for smaller organizations that don’t have many resources. This makes it hard to customize these AI models for specific needs if you don’t have enough data.
Few-shot learning and contrastive learning are two innovative approaches in the field of machine learning, each addressing unique challenges in training AI models.
Few-shot learning is focused on training models with a very limited amount of data. Traditional machine learning algorithms typically require large datasets to learn effectively. However, in many real-world scenarios, such a wealth of data is not available. Few-shot learning algorithms are designed to overcome this limitation. They do so by leveraging prior knowledge — either from similar tasks or from a subset of the data — to make accurate predictions with only a few examples. This approach is particularly useful in specialized fields where data is scarce.
An example of basic few-shot learning
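To make the idea concrete, here is a minimal sketch of one simple few-shot strategy: nearest-centroid classification over embeddings. It is only an illustration, not the method SetFit uses. The 2-D vectors are hand-written stand-ins for what a pre-trained encoder would produce, and the helper names (`cosine`, `class_centroids`, `predict`) are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def class_centroids(embeddings: List[Vector], labels: List[str]) -> Dict[str, List[float]]:
    """Average the few labeled embeddings of each class into one centroid."""
    grouped = defaultdict(list)
    for vec, label in zip(embeddings, labels):
        grouped[label].append(vec)
    return {
        label: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def predict(query: Vector, centroids: Dict[str, List[float]]) -> str:
    """Return the label whose centroid is most similar to the query."""
    return max(centroids, key=lambda label: cosine(query, centroids[label]))


# Toy "support set": two labeled examples per class (assumed embeddings).
support_embeddings = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
support_labels = ["positive", "positive", "negative", "negative"]

centroids = class_centroids(support_embeddings, support_labels)
print(predict([0.7, 0.3], centroids))  # prints "positive"
```

The prior knowledge lives entirely in the embeddings: the better the encoder separates the classes, the fewer labeled examples the centroids need.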
Contrastive learning, on the other hand, is a technique used primarily in unsupervised learning, particularly for learning efficient representations of data. It works by teaching a model to understand the differences and similarities between pairs of examples. In simple terms, the model learns by comparing things: it is trained to pull similar items closer in the representation space and push dissimilar items apart. This approach is highly effective in tasks like image and speech recognition, where understanding the nuanced differences between inputs is crucial.
Figure: The impact of fine-tuning a sentence transformer on its embedding space using contrastive learning.
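The "pull together, push apart" behaviour can be sketched with the classic margin-based pairwise contrastive loss. This is one common formulation, shown here in plain Python for illustration; it is not necessarily the loss any particular library uses, and the function names and the default `margin` of 1.0 are assumptions.

```python
from math import sqrt
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embeddings of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(
    a: Sequence[float],
    b: Sequence[float],
    is_similar: bool,
    margin: float = 1.0,
) -> float:
    """Pairwise contrastive loss.

    Similar pairs are penalized by their squared distance (pulled together).
    Dissimilar pairs are penalized only while they are closer than `margin`
    (pushed apart until they are at least `margin` away).
    """
    distance = euclidean(a, b)
    if is_similar:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


# A dissimilar pair that sits too close together still incurs a loss...
print(contrastive_loss([0.0, 0.0], [0.5, 0.0], is_similar=False))  # 0.25
# ...while one that is already far apart incurs none.
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], is_similar=False))  # 0.0
```

Minimizing this quantity over many pairs is what reshapes the embedding space so that examples of the same class cluster together.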
When combined, few-shot and contrastive learning can be particularly powerful. Contrastive learning can be used to create rich, detailed representations of data, which few-shot learning algorithms can then utilize to make accurate predictions with minimal examples. This synergy allows for the development of robust models capable of learning effectively from limited data, a major advantage in fields where acquiring large datasets is challenging or impossible.
SetFit is a framework for fine-tuning sentence transformers with only a few labeled examples. It adapts a pre-trained sentence transformer to your specific needs, making it easier and more efficient to build effective text classification systems.
SetFit, though utilizing smaller models compared to other few-shot methods, achieves comparable or superior performance across various benchmarks. For instance, on the RAFT few-shot classification benchmark, SetFit with RoBERTa (specifically the all-roberta-large-v1 model, which has 355 million parameters) surpasses both PET and GPT-3. It ranks slightly below average human performance and the T-Few model, which is 30 times larger with 11 billion parameters. Impressively, SetFit exceeds the human baseline in 7 of the 11 RAFT tasks, showcasing its effectiveness despite its relatively small model size.
Credits to: https://huggingface.co/blog/setfit
SetFit stands out for its high accuracy with smaller models, leading to rapid training speeds and significantly lower costs. For example, training SetFit on an NVIDIA V100 GPU with only 8 labeled examples takes a mere 30 seconds, costing about $0.025. In contrast, training the larger T-Few 3B model on an NVIDIA A100 takes about 11 minutes and costs roughly $0.7 for the same task, which is 28 times more expensive. Notably, SetFit is versatile enough to run on a single GPU, like those available on Google Colab, and it can even be trained on a CPU in just a few minutes. Despite its speed, SetFit maintains comparable performance to larger models. Additionally, when it comes to inference and model distillation, SetFit can achieve speed-ups of up to 123 times, demonstrating its impressive efficiency and cost-effectiveness.
In this architecture, SetFit operates in a streamlined two-stage process aimed at enhancing sentence transformers for classification tasks.
In the first stage, “ST Fine tuning,” the process begins with few-shot training data. From this data, SetFit generates sentence pairs, which are instrumental in understanding the relationships and context within the text. These pairs are then used to fine-tune a pre-trained sentence transformer (ST), allowing the model to adjust to the specific nuances of the training data.
The second stage, “Classification head training,” takes the fine-tuned sentence transformer and applies it to encode sentences, effectively transforming them into sentence embeddings. These embeddings capture the essence of the text data in a form that’s suitable for classification. Finally, a classification head is trained using these embeddings. This classification head is a specialized component that learns to categorize the embedded sentences into predefined classes, thereby completing the process of preparing the model to accurately perform classification tasks.
Through this two-stage approach, SetFit efficiently adapts sentence transformers to specific classification challenges, utilizing minimal training data to achieve high accuracy.
In SetFit, generating sentence pairs is a critical step for fine-tuning sentence transformers for specific tasks. The process involves creating pairs of sentences from the training data that are either similar or dissimilar. This is done by pairing a sentence with another sentence that shares a similar meaning (a positive pair) or with one that has a different meaning (a negative pair). By doing this, SetFit teaches the sentence transformer to understand the nuances of similarity and difference within the context of the given task. These sentence pairs are then used to fine-tune the transformer so that it can produce more accurate embeddings for the classification task at hand, effectively adapting the model with just a small amount of data.
SetFit Iter 1.
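The pairing idea can be sketched in a few lines. This is a simplified illustration that enumerates every possible pair from a tiny labeled set; the actual SetFit implementation samples pairs rather than enumerating them, and the function name `generate_pairs` and the example sentences are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

# (sentence_a, sentence_b, target): target is 1.0 for a positive pair
# (same label) and 0.0 for a negative pair (different labels).
Pair = Tuple[str, str, float]


def generate_pairs(sentences: List[str], labels: List[int]) -> List[Pair]:
    """Build all sentence pairs from a small labeled set."""
    pairs: List[Pair] = []
    for (s1, l1), (s2, l2) in combinations(zip(sentences, labels), 2):
        pairs.append((s1, s2, 1.0 if l1 == l2 else 0.0))
    return pairs


sentences = ["great movie", "loved it", "terrible plot", "waste of time"]
labels = [1, 1, 0, 0]

pairs = generate_pairs(sentences, labels)
print(len(pairs))  # 6 pairs from only 4 labeled sentences
```

Note how quickly the number of training pairs grows: n labeled sentences yield n(n-1)/2 pairs, which is part of why this kind of fine-tuning can work with very little labeled data.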
Traditionally, text classification involves two stages: first, a sentence transformer (ST) generates embeddings from text, then a classifier model is trained on these embeddings. However, SetFit integrates these stages by directly fine-tuning the ST with a classification layer. This approach reduces the overall complexity and computational overhead. SetFit also leverages contrastive learning with smaller datasets for rapid fine-tuning and efficient inference, and it can build on a diverse array of pre-trained STs for various NLP tasks.
This code snippet demonstrates how to fine-tune a SetFit model for text classification using Python libraries such as setfit, optuna, and sentence_transformers. Here’s a breakdown of the code:
Training:
The setfit and optuna packages are installed first. SetFit provides the framework for sentence-embedding fine-tuning and classification, while Optuna is used for hyperparameter optimization, though it’s not applied directly in this first snippet.
The code starts by loading a dataset from a CSV file, data.csv. It uses LabelEncoder from scikit-learn to encode categorical labels into numerical form, which is necessary for machine learning models to process. The train_test_split function splits the data into training and test sets, with 20% of the data reserved for testing. The training and test sets are then combined with their respective labels.
The sample_dataset function is used to create a balanced subset of the training data, ensuring that there are an equal number of samples for each label. A pre-trained sentence transformer model, paraphrase-mpnet-base-v2, is loaded to serve as the base of the SetFit model. A LogisticRegression model is chosen as the classification head. The SetFitModel is then instantiated with the sentence transformer and logistic regression model.
TrainingArguments are defined to set the batch size, the number of epochs, and an end_to_end flag. That flag asks SetFit to train the embedding model together with the classifier, although according to the SetFit documentation it only applies when a differentiable (PyTorch) head is used, not a scikit-learn LogisticRegression head. A Trainer is created with the defined model, training arguments, training dataset, and a column mapping that specifies which columns in the dataset correspond to the text and labels. The train() method of the Trainer class is called to start the training process.
Finally, the evaluate() method is used to assess the model's performance on the test dataset.
Let’s move on now to parameter tuning.
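Before doing so, the training steps described above can be sketched roughly as follows. This is a reconstruction from the description, not the original snippet, and it assumes setfit 1.0 or later. The CSV column names ("text" and "category"), the 8 samples per class, the batch size, the epoch count, the random seed, and the function name `train_and_evaluate` are all assumptions made for illustration.

```python
from typing import Dict

# Assumed values, not taken from the original post.
CSV_PATH = "data.csv"
SAMPLES_PER_CLASS = 8
COLUMN_MAPPING = {"text": "text", "label": "label"}


def train_and_evaluate(csv_path: str = CSV_PATH) -> Dict[str, float]:
    """Fine-tune a SetFit model on a CSV file and return test metrics."""
    # Third-party imports are kept inside the function so this module can be
    # imported without them (install with: pip install setfit optuna).
    import pandas as pd
    from datasets import Dataset
    from sentence_transformers import SentenceTransformer
    from setfit import SetFitModel, Trainer, TrainingArguments, sample_dataset
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import LabelEncoder

    # Load the data and encode string categories as integer labels.
    df = pd.read_csv(csv_path)
    labels = LabelEncoder().fit_transform(df["category"])

    # Hold out 20% of the rows for testing.
    x_train, x_test, y_train, y_test = train_test_split(
        df["text"].tolist(), labels.tolist(), test_size=0.2, random_state=42
    )
    train_ds = Dataset.from_dict({"text": x_train, "label": y_train})
    test_ds = Dataset.from_dict({"text": x_test, "label": y_test})

    # Keep a balanced few-shot subset: the same number of rows per label.
    train_ds = sample_dataset(
        train_ds, label_column="label", num_samples=SAMPLES_PER_CLASS
    )

    # Sentence transformer body plus a scikit-learn classification head.
    model = SetFitModel(
        model_body=SentenceTransformer("paraphrase-mpnet-base-v2"),
        model_head=LogisticRegression(),
    )

    # end_to_end is set as the text describes; with a scikit-learn head it
    # is not expected to change the training behaviour.
    args = TrainingArguments(batch_size=16, num_epochs=1, end_to_end=True)

    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=train_ds,
        eval_dataset=test_ds,
        column_mapping=COLUMN_MAPPING,
    )
    trainer.train()
    return trainer.evaluate()
```

Calling `train_and_evaluate()` downloads the pre-trained model on first use and returns a metrics dictionary (accuracy by default).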
Define a function to initialize the SetFit model with hyperparameters
The model_init function initializes the SetFit model with specific hyperparameters. It takes a dictionary of hyperparameters (params) as input, where max_iter and solver are used to configure the logistic regression classification head. The function returns a SetFit model initialized with the specified hyperparameters.
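A minimal sketch of such a function is shown below, following the pattern in the SetFit documentation. The default values (100 iterations, the "liblinear" solver), the `BASE_MODEL` constant, and the small helper `head_params_from` are assumptions added for illustration.

```python
from typing import Any, Dict, Optional

# Assumed base model; any sentence transformer checkpoint could be used.
BASE_MODEL = "sentence-transformers/paraphrase-mpnet-base-v2"


def head_params_from(params: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Extract the logistic-regression settings from a trial's parameters."""
    params = params or {}
    return {
        "head_params": {
            "max_iter": params.get("max_iter", 100),
            "solver": params.get("solver", "liblinear"),
        }
    }


def model_init(params: Optional[Dict[str, Any]] = None):
    """Create a fresh SetFit model for one hyperparameter trial."""
    # Imported lazily so the helper above can be used without setfit.
    from setfit import SetFitModel

    return SetFitModel.from_pretrained(BASE_MODEL, **head_params_from(params))
```

A new model is built for every trial so that each set of hyperparameters starts from the same pre-trained weights.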
The hp_space function defines the hyperparameter search space for optimization using Optuna. It specifies ranges or choices for various hyperparameters, such as body_learning_rate, num_epochs, batch_size, seed, max_iter, solver, and end_to_end. Optuna will explore different combinations of these hyperparameters during the optimization process.
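As a rough sketch, the search space could look like this. The hyperparameter names come from the description above, but every range and choice list is an assumption picked for illustration, not the values used in the original code.

```python
from typing import Any, Dict


def hp_space(trial: Any) -> Dict[str, Any]:
    """Define the search space; `trial` is expected to be an optuna Trial."""
    return {
        # Learning rate of the sentence transformer body, on a log scale.
        "body_learning_rate": trial.suggest_float(
            "body_learning_rate", 1e-6, 1e-3, log=True
        ),
        "num_epochs": trial.suggest_int("num_epochs", 1, 3),
        "batch_size": trial.suggest_categorical("batch_size", [16, 32, 64]),
        "seed": trial.suggest_int("seed", 1, 40),
        # Settings forwarded to the logistic regression head.
        "max_iter": trial.suggest_int("max_iter", 50, 300),
        "solver": trial.suggest_categorical(
            "solver", ["newton-cg", "lbfgs", "liblinear"]
        ),
        "end_to_end": trial.suggest_categorical("end_to_end", [True, False]),
    }
```

Each call to a `suggest_*` method registers one hyperparameter with Optuna and returns the value chosen for the current trial.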
The Trainer is created to manage the entire fine-tuning and hyperparameter optimization process. It is configured with the training and evaluation datasets (train_dataset and test_dataset), the model_init function for model initialization, and the column mapping to specify which columns in the dataset correspond to text and labels.
The trainer.hyperparameter_search() method is used to perform hyperparameter optimization. It takes several parameters: direction="maximize" specifies that the optimization aims to maximize a certain metric (e.g., accuracy); hp_space=hp_space passes the hyperparameter space defined earlier; and n_trials=10 sets the number of optimization trials to run (in this case, 10 trials). After optimization, it returns the best set of hyperparameters in best_run.
The best hyperparameters from the optimization are applied to the final model using trainer.apply_hyperparameters. The final model is then trained using trainer.train().
Finally, the code evaluates the performance of the fine-tuned model on the test dataset and collects metrics. These metrics could include accuracy, precision, recall, F1-score, etc., depending on the specific classification task.
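Put together, the search, final training, and evaluation steps might look like the sketch below. It assumes setfit 1.0 or later, and the function names `build_trainer` and `tune_train_evaluate` are hypothetical wrappers introduced here for illustration; the identity column mapping assumes the datasets already use "text" and "label" columns.

```python
from typing import Any, Callable, Dict, Tuple


def build_trainer(train_dataset: Any, test_dataset: Any, model_init: Callable):
    """Create a SetFit Trainer that builds a fresh model for every trial."""
    # Imported lazily so the rest of this sketch runs without setfit.
    from setfit import Trainer

    return Trainer(
        train_dataset=train_dataset,
        eval_dataset=test_dataset,
        model_init=model_init,
        column_mapping={"text": "text", "label": "label"},
    )


def tune_train_evaluate(
    trainer: Any, hp_space: Callable, n_trials: int = 10
) -> Tuple[Dict[str, Any], Dict[str, float]]:
    """Search for good hyperparameters, retrain with them, and evaluate."""
    # Run the Optuna-backed search, maximizing the evaluation metric.
    best_run = trainer.hyperparameter_search(
        direction="maximize", hp_space=hp_space, n_trials=n_trials
    )
    # Rebuild the final model with the best hyperparameters and train it.
    trainer.apply_hyperparameters(best_run.hyperparameters, final_model=True)
    trainer.train()
    # Report the winning configuration alongside the test metrics.
    return best_run.hyperparameters, trainer.evaluate()
```

Keeping the Trainer construction separate from the tuning loop makes it easy to reuse the same loop with a different dataset or search space.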
SetFit represents a groundbreaking approach to text classification and fine-tuning. By seamlessly integrating sentence transformers and classification tasks into a unified process, it simplifies and accelerates model development. With efficient training and diverse pre-trained models, SetFit offers a promising avenue for creating high-performance text classifiers while reducing complexity and resource demands. This innovative framework is poised to advance natural language processing tasks, making them more accessible and effective for a wide range of applications.
SetFit paper: https://arxiv.org/abs/2209.11055
Hugging Face blog post about SetFit: https://huggingface.co/blog/setfit
Thanks for reading! Feel free to reach out if you found this post interesting or if it helped you out in any way.