Is white space tokenization enough?

Let’s analyze the tokenization tool.

When we enter the sentence “My name is Mckinzie Dotson and I am a freshman at Mount Saint Joseph University where I play soccer. But recently I haven’t been able to play soccer” into the tokenization tool, we can observe several things.

Are spaces sufficient to tokenize this English-language text? No. Whitespace alone cannot handle things like contractions. For example, the tool could not figure out what to do with the word “haven’t” in the sentence above; in one case it was tokenized as “have”, “n”, “ ’ ”, “t”.
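To make the problem concrete, here is a small sketch (my own illustration, not the tool’s actual code) comparing plain whitespace splitting with a naive regex tokenizer that separates punctuation. Whitespace splitting keeps the contraction intact but also glues the period onto “soccer.”, while the regex approach breaks “haven’t” into pieces, much like the behavior described above.

```python
import re

sentence = ("My name is Mckinzie Dotson and I am a freshman at Mount Saint "
            "Joseph University where I play soccer. But recently I haven't "
            "been able to play soccer")

# Plain whitespace tokenization: split on spaces only.
# Punctuation stays attached, so "soccer." and "soccer" become different tokens.
ws_tokens = sentence.split()
print("soccer." in ws_tokens)        # the period is stuck to the word

# A naive regex tokenizer: runs of word characters, or single punctuation marks.
# This separates punctuation, but it also tears the contraction apart.
regex_tokens = re.findall(r"\w+|[^\w\s]", sentence)
start = regex_tokens.index("haven")
print(regex_tokens[start:start + 3])  # the contraction is split into pieces
```

Neither strategy is satisfying on its own, which is why real tokenizers use language-aware rules (or learned subword vocabularies) rather than whitespace alone.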

 
