Hello, you will now learn about two major concepts of preprocessing. The first thing you'll learn about is called stemming and the second thing you will learn about is called stop words, and specifically you will learn how to use stemming and stop words to preprocess your texts. Let's take a look at how you can do this. Let's process this tweet. First, I remove all the words that don't add significant meaning to the tweets, aka stop words and punctuation marks. In practice, you would have to compare your tweet against two lists. One with stop words in English and another with punctuation. These lists are usually much larger, but for the purpose of this example, they will do just fine. Every word from the tweet that also appears on the list of stop words should be eliminated. So you'd have to eliminate the word and, the word are, the word a, and the word at. The tweet without stop words looks like this. Note that the overall meaning of the sentence could be inferred without any effort. Now, let's eliminate every punctuation mark. In this example, there are only exclamation points. The tweet without stop words and punctuation looks like this. However, note that in some contexts you won't have to eliminate punctuation. So you should think carefully about whether punctuation adds important information to your specific NLP task or not. Tweets and other types of texts often have handles and URLs, but these don't add any value for the task of sentiment analysis. Let's eliminate these two handles and this URL. At the end of this process, the resulting tweets contains all the important information related to its sentiment. Tuning GREAT AI model is clearly a positive tweet and a sufficiently good model should be able to classify it. Now that the tweet from the example has only the necessary information, I will perform stemming for every word. Stemming in NLP is simply transforming any word to its base stem, which you could define as the set of characters that are used to construct the word and its derivatives. Let's take the first word from the example. Its stem is done, because adding the letter e, it forms the word tune. Adding the suffix ed, forms the word tuned, and adding the suffix ing, it forms the word tuning. After you perform stemming on your corpus, the word tune, tuned, and tuning will be reduced to the stem tun. So your vocabulary would be significantly reduced when you perform this process for every word in the corpus. To reduce your vocabulary even further without losing valuable information, you'd have to lowercase every one of your words. So the word GREAT, Great and great would be treated as the same exact word. This is the final preprocess tweet as a list of words. Now that you're familiar with stemming and stop words, you know the basics of texts processing. In the next video, you can use your process suite function to extract a matrix x, which represents all the tweets in your data set.