Natural Language Processing (NLP) is the process of taking unstructured text and discerning characteristics about it. On Apple platforms, natural language text can be analyzed with the ‘NSLinguisticTagger’ class. NLP is also widely used in machine learning.
It helps to understand text using the following techniques:
- Tokenization
- Lemmatization (identification of the root form of a word)
- Parts of speech identification
- Named entity recognition (proper names of people, places, and organizations)
To begin, create a new Xcode project and follow these steps:
First, declare a constant to hold the text to analyze.
    let quote = "Here's to the crazy ones. The misfits. The rebels. The troublemakers. The round pegs in the square holes. The ones who see things differently. They're not fond of rules. And they have no respect for the status quo. You can quote them, disagree with them, glorify or vilify them. About the only thing you can't do is ignore them. Because they change things. They push the human race forward. And while some may see them as the crazy ones, we see genius. Because the people who are crazy enough to think they can change the world, are the ones who do. - Steve Jobs (Founder of Apple Inc.)"
In Natural Language Processing, a tagger is a piece of software that reads text and “tags” it with information such as the part of speech of each word. It can also recognize names and languages, perform lemmatization, and so on. Here, that job is done by NSLinguisticTagger.
The next step is to declare the tagger and the options it will use.
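The tagger and options declarations might look like the following. This is a minimal sketch, not the original code: it assumes the tagger is created with the four schemes the methods in this article request, and it assumes options that skip punctuation and whitespace and join multi-word names. The exact values are illustrative.

```swift
import Foundation

// Assumption: one tagger configured with every scheme the later methods use.
let tagger = NSLinguisticTagger(
    tagSchemes: [.tokenType, .lemma, .lexicalClass, .nameType],
    options: 0
)

// Assumption: ignore punctuation and whitespace, and report a multi-word
// name such as "Steve Jobs" as a single token instead of two.
let options: NSLinguisticTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]
```

Both constants need to be declared in the same scope as the quote so that the methods in the following steps can reach them.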
The first step in parsing text is tokenization: the process of splitting a document into smaller units such as paragraphs, sentences, or words. Here, we’ll split the quote above into words.
Let us create a method for this.
    func tokenizeText(for text: String) {
        tagger.string = text
        // NSRange counts UTF-16 code units, so use utf16.count rather than count.
        let range = NSRange(location: 0, length: text.utf16.count)
        tagger.enumerateTags(in: range, unit: .word, scheme: .tokenType, options: options) { tag, tokenRange, stop in
            // Extract the token that the reported range covers.
            let word = (text as NSString).substring(with: tokenRange)
            print(word)
        }
    }
The output of the above method begins:
    Here
    's
    to
    the
    crazy
    ones
    The
    misfits
    The
    rebels
    The
    troublemakers
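If the tokens are needed as values rather than printed lines, the same enumeration can collect them into an array. The sketch below is illustrative only: `tokens(in:)` is a hypothetical helper that is not part of the article's code, and it creates its own tagger so that it can run on its own.

```swift
import Foundation

// Hypothetical helper: same enumeration as tokenizeText(for:), but it
// returns the tokens instead of printing them.
func tokens(in text: String) -> [String] {
    let tagger = NSLinguisticTagger(tagSchemes: [.tokenType], options: 0)
    tagger.string = text
    let range = NSRange(location: 0, length: text.utf16.count)
    let options: NSLinguisticTagger.Options = [.omitPunctuation, .omitWhitespace]

    var result: [String] = []
    tagger.enumerateTags(in: range, unit: .word, scheme: .tokenType, options: options) { _, tokenRange, _ in
        result.append((text as NSString).substring(with: tokenRange))
    }
    return result
}

let words = tokens(in: "The round pegs in the square holes.")
print(words)
```

Returning an array makes the result easier to reuse, for example to count words or to feed them into another step.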
Next is ‘Lemmatization’. It reduces each word to its most basic (root) form.
    func lemmatization(for text: String) {
        tagger.string = text
        let range = NSRange(location: 0, length: text.utf16.count)
        tagger.enumerateTags(in: range, unit: .word, scheme: .lemma, options: options) { tag, tokenRange, stop in
            // With the .lemma scheme, the tag's raw value is the lemma itself.
            if let lemma = tag?.rawValue {
                print(lemma)
            }
        }
    }
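The method above prints only the lemmas, so it can be hard to see which word each one came from. A small variation, sketched here under the assumption that pairing each word with its lemma is useful, keeps both. `lemmaPairs(in:)` is a hypothetical helper, and falling back to the word itself when the tagger reports no lemma is a choice made for this illustration. The lemmas returned depend on the tagger's language model, so no output is shown.

```swift
import Foundation

// Hypothetical helper: pairs every word with the lemma the tagger reports.
func lemmaPairs(in text: String) -> [(word: String, lemma: String)] {
    let tagger = NSLinguisticTagger(tagSchemes: [.lemma], options: 0)
    tagger.string = text
    let range = NSRange(location: 0, length: text.utf16.count)
    let options: NSLinguisticTagger.Options = [.omitPunctuation, .omitWhitespace]

    var pairs: [(word: String, lemma: String)] = []
    tagger.enumerateTags(in: range, unit: .word, scheme: .lemma, options: options) { tag, tokenRange, _ in
        let word = (text as NSString).substring(with: tokenRange)
        // Assumption: use the word unchanged when no lemma is available.
        pairs.append((word: word, lemma: tag?.rawValue ?? word))
    }
    return pairs
}

let pairs = lemmaPairs(in: "They push the human race forward.")
for pair in pairs {
    print("\(pair.word) -> \(pair.lemma)")
}
```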
Another technique is ‘Parts of Speech’ identification. It identifies the part of speech of each word in a sentence.
    func partsOfSpeech(for text: String) {
        tagger.string = text
        let range = NSRange(location: 0, length: text.utf16.count)
        tagger.enumerateTags(in: range, unit: .word, scheme: .lexicalClass, options: options) { tag, tokenRange, _ in
            if let tag = tag {
                let word = (text as NSString).substring(with: tokenRange)
                print("\(word): \(tag.rawValue)")
            }
        }
    }
An excerpt of the output of the above method:
    The: Determiner
    troublemakers: Noun
    The: Determiner
    round: Noun
    pegs: Noun
    in: Preposition
    the: Determiner
    square: Adjective
    holes: Noun
    The: Determiner
    ones: Noun
    who: Pronoun
    see: Verb
Now we look into Named Entity Recognition. It helps to identify the names of people, organizations, and places in the text.
    func namedEntityRecognition(for text: String) {
        tagger.string = text
        let range = NSRange(location: 0, length: text.utf16.count)
        // Only these tag types count as named entities.
        let tags: [NSLinguisticTag] = [.personalName, .placeName, .organizationName]
        tagger.enumerateTags(in: range, unit: .word, scheme: .nameType, options: options) { tag, tokenRange, stop in
            if let tag = tag, tags.contains(tag) {
                let name = (text as NSString).substring(with: tokenRange)
                print("\(name): \(tag.rawValue)")
            }
        }
    }
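To try named entity recognition on a single sentence, the same enumeration can return the matches instead of printing them. This is a sketch only: `namedEntities(in:)` and `wanted` are hypothetical names, and the `.joinNames` option is assumed so that a multi-word name is reported as one token. Which entities are found depends on the tagger's language model, so no output is shown.

```swift
import Foundation

// The tag types treated as named entities in this sketch.
let wanted: [NSLinguisticTag] = [.personalName, .placeName, .organizationName]

// Hypothetical helper: returns each recognized name with its tag.
func namedEntities(in text: String) -> [(name: String, tag: NSLinguisticTag)] {
    let tagger = NSLinguisticTagger(tagSchemes: [.nameType], options: 0)
    tagger.string = text
    let range = NSRange(location: 0, length: text.utf16.count)
    // Assumption: .joinNames merges a multi-word name into one token.
    let options: NSLinguisticTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]

    var entities: [(name: String, tag: NSLinguisticTag)] = []
    tagger.enumerateTags(in: range, unit: .word, scheme: .nameType, options: options) { tag, tokenRange, _ in
        if let tag = tag, wanted.contains(tag) {
            entities.append((name: (text as NSString).substring(with: tokenRange), tag: tag))
        }
    }
    return entities
}

let entities = namedEntities(in: "Steve Jobs founded Apple Inc. in California.")
for entity in entities {
    print("\(entity.name): \(entity.tag.rawValue)")
}
```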
Conclusion
Follow the steps above to integrate Natural Language Processing into your app. If you have any issues or suggestions, you can leave them in the comment section.