Improve the accuracy of the word breaking process for Japanese users in SharePoint People Search
Several Japanese users (ja) are being detected as Chinese (zh-tw) because of their user properties, and as a result the word breaker indexes them incorrectly.
Japanese writing frequently uses Chinese characters (kanji).
For example, Japanese names are often written using only Chinese characters.
When a user profile is created in SharePoint Online and the user's properties contain many Chinese characters, the user is detected as Chinese instead of Japanese.
As a result, the word breaker tokenizes the user's name and other properties as Chinese.
So when the user is searched for by the whole name (the Japanese way), the user does not appear in the People Search results.
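The detection ambiguity described above can be illustrated with a minimal sketch. This is not SharePoint's actual language detector, whose internals are not described here. The function `guess_language` and its rules are hypothetical, and assume a detector that looks only at which Unicode scripts appear in the text.

```python
# Hypothetical script-based language guess; NOT SharePoint's real detector.

def has_kana(text: str) -> bool:
    # Hiragana (U+3040-U+309F) and Katakana (U+30A0-U+30FF) are specific to Japanese.
    return any("\u3040" <= ch <= "\u30ff" for ch in text)

def has_han(text: str) -> bool:
    # CJK Unified Ideographs (U+4E00-U+9FFF) are shared by Chinese and Japanese.
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)

def guess_language(text: str) -> str:
    if has_kana(text):
        return "ja"       # kana is a strong signal for Japanese
    if has_han(text):
        return "zh"       # assumption: Han-only text defaults to Chinese
    return "unknown"

print(guess_language("小野"))         # zh (Han only, so the Japanese name is misdetected)
print(guess_language("小野 たろう"))  # ja (kana present)
```

Under these assumptions, a profile whose properties contain no kana gives the detector nothing to distinguish Japanese from Chinese, which matches the behavior reported here.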
For example, a user profile is created with a name such as "小野".
Assuming the user profile's properties contain only Chinese characters, the user is detected and indexed as Chinese instead of Japanese.
Actual result: The name is indexed character by character "小" and "野" (Chinese)
Expected result: The name is indexed as a whole "小野" (Japanese)
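The following sketch shows one plausible way the actual and expected results above lead to a missed search hit. It is a toy inverted index, not the SharePoint search engine. The tokenizers, the `build_index` and `search` helpers, and the user ID `user1` are all hypothetical, and the sketch assumes the query is tokenized as a whole word while the profile was indexed character by character.

```python
from collections import defaultdict

def tokenize_per_character(text):
    # Stand-in for the "Chinese" behavior reported above: one token per character.
    return [ch for ch in text if not ch.isspace()]

def tokenize_whole(text):
    # Stand-in for the expected "Japanese" behavior: the name stays one token.
    return text.split()

def build_index(profiles, tokenizer):
    # Map each token to the set of user IDs whose name produced it.
    index = defaultdict(set)
    for user_id, name in profiles.items():
        for token in tokenizer(name):
            index[token].add(user_id)
    return index

def search(index, query, tokenizer):
    # A user matches only if every query token is found in the index.
    tokens = tokenizer(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results

profiles = {"user1": "小野"}  # hypothetical profile
char_index = build_index(profiles, tokenize_per_character)  # actual result
word_index = build_index(profiles, tokenize_whole)          # expected result

print(search(char_index, "小野", tokenize_whole))  # set()
print(search(word_index, "小野", tokenize_whole))  # {'user1'}
```

In this toy model the whole-name query finds nothing in the character-by-character index because the token "小野" was never stored, only "小" and "野".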
A few Japanese users have confirmed this issue, and they would like the word breaking process to detect Japanese more accurately.