Best way to do Named Entity Recognition in 2024 with GliNER and spaCy - Zero Shot NER

Captions
Hi, and welcome back to the channel. In this video I'm going to be talking about the best way to do named entity recognition in 2024, especially if you don't have any training data. We're going to be using a method of named entity recognition called zero-shot learning. Zero-shot learning is a type of machine learning where we don't have any training data or named examples to train a machine learning model. Instead, we use an off-the-shelf model that has seen a lot of different examples and can generalize, or make predictions, on unseen data.

For this video we're going to be using GLiNER, a brand new BERT-based, or Transformer-based, named entity recognition model that can do zero-shot out of the box and do it remarkably well. Over the past week I've designed gliner-spacy, which we're going to see in this video, so that you can load up GLiNER in a spaCy pipeline and use it with just a couple of lines of code.

So what does this mean for you? Well, it means that if you have generic entities like person, organization, date, etc., you can automatically use spaCy to identify them with GLiNER. But more importantly, if you have domain-specific entities, so if your area of research is maybe Asian history or the Holocaust, you can use GLiNER to create your own labels and automatically identify those labels in an unseen text. This means that you have a lot of different applications of GLiNER in a spaCy pipeline. It's easy to install and it's easy to load up. Let's pop on over to VS Code and our terminal to see how to install it with pip and, most importantly, how to use it in a Jupyter notebook.

To get started with GLiNER and gliner-spacy, you can run the command pip install gliner-spacy. This will install not only gliner-spacy but also GLiNER itself, and after a few seconds of downloading all the dependencies you'll have everything that you need to get up and running with it. Now that we've got it installed, let's go ahead and jump on over into our Jupyter notebook and start to use it with some examples.

gliner-spacy is meant to work alongside spaCy, so that means that we need to import spacy. Next we need to write from gliner_spacy.pipeline import GlinerSpacy. Note the capital G and the capital S here, very important. This is going to import our custom GlinerSpacy component. Now we can go ahead and create a spaCy pipeline just like we normally would. In my case I'm going to be using the en_core_web_sm model. This is the small English model from spaCy. And now we're going to add in a special component, gliner_spacy. This has been imported just up above, when we imported our GlinerSpacy class. This component can take a config. Right now we're just going to pass in a couple of different things here, specifically the key of labels and the kind of labels that we want the zero-shot learning model to identify. In our case we want to identify person and organization.

Next we're going to create a doc container and pass our text through this nlp pipeline. We're using a very simple text: "Bill Gates founded Microsoft". Next we're going to iterate over the entities that are found in this pipeline by saying for ent in doc.ents and printing off ent.text and ent.label_. And this is in real time: we immediately have our results. Bill Gates is person and Microsoft is organization.

But what if we wanted to have something that's a bit more complex, something that's more domain specific? Let's use the example that I oftentimes have in my research: "Auschwitz was a concentration camp". With most models you're going to see something like this: organization. But what if we want to use a custom label? Let's go ahead and add in a custom label of concentration_camp. This is going to allow us to use the zero-shot model to automatically identify Auschwitz as a concentration camp, not as an organization. After just a few seconds of reloading the pipeline we can execute it, and we have the expected results: Auschwitz is correctly identified as a concentration camp.

Is this model perfect? Absolutely not. It's meant to be a first step in doing NER on much more complex data. Imagine that you don't have any training data and you want to start to cultivate it. You can use zero-shot models to help cultivate large quantities of training data relatively quickly and efficiently by loading it up in something like Prodigy and manually correcting the outputs. Not only is this approach very quick and effective, it's also very consistent, which means it's quite reliable to use in production.

gliner-spacy comes to us from TheirStory, which is currently testing its use on real-world data that sits in archives, all of which is meant to be manually validated, through a pipeline for doing things like named entity recognition, summarization, and also categorization, all with oral testimonies. If you test out this library and this approach, let me know in the comments down below what your success rate was and some of the challenges that you encountered. I'm particularly interested in examples where it didn't necessarily work well or where it worked particularly well.
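The notebook steps described in the captions can be sketched roughly as follows. This is a minimal reconstruction, not the code shown on screen: the helper names gliner_config, build_pipeline, and extract_entities are hypothetical, and the import path and the "gliner_spacy" factory name are assumed from the gliner-spacy package as described in the video. It also assumes that pip install gliner-spacy has been run and that the en_core_web_sm model is available locally.

```python
from typing import Callable, Iterable, List, Tuple


def gliner_config(labels: Iterable[str]) -> dict:
    """Build the config dict for the GLiNER component (only the "labels" key is set here)."""
    return {"labels": list(labels)}


def build_pipeline(labels: Iterable[str], model_name: str = "en_core_web_sm"):
    """Load a spaCy pipeline and append the zero-shot GLiNER component to it."""
    # Imported lazily so the other helpers work even without the packages installed.
    import spacy
    # Assumption: importing the class registers the "gliner_spacy" factory with spaCy.
    from gliner_spacy.pipeline import GlinerSpacy  # noqa: F401

    nlp = spacy.load(model_name)
    nlp.add_pipe("gliner_spacy", config=gliner_config(labels))
    return nlp


def extract_entities(nlp: Callable, text: str) -> List[Tuple[str, str]]:
    """Run the pipeline on one text and return (entity text, label) pairs."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


# Usage, mirroring the two examples from the video (the GLiNER weights are
# downloaded on first use, so this needs network access the first time):
#
#   nlp = build_pipeline(["person", "organization"])
#   print(extract_entities(nlp, "Bill Gates founded Microsoft"))
#
#   nlp = build_pipeline(["person", "organization", "concentration_camp"])
#   print(extract_entities(nlp, "Auschwitz was a concentration camp"))
```

Because the labels are passed in through the config, switching to a domain-specific label such as concentration_camp means rebuilding the pipeline with a new label list, which matches the "reloading the pipeline" step mentioned in the captions.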
Info
Channel: Python Tutorials for Digital Humanities
Views: 4,415
Keywords: GLiNER tutorial, spaCy NLP, named entity recognition, zero-shot learning, zero-shot NER, natural language processing, NLP tutorial, text analysis, data science, machine learning, SpaCy Python, GLiNER spaCy wrapper, advanced NER techniques, NER algorithms, text mining, NER SpaCy integration, entity extraction, python ner, ner spacy, easiest way to do ner, best way to do ner, zero shot ner, simple ner, ner with spacy, spacy ner
Id: kPOtaXk-K-0
Length: 5min 0sec (300 seconds)
Published: Tue Mar 19 2024