With the answers in
- Javascript unicode string, chinese character but no punctuation and
- How can I strip all punctuation from a string in JavaScript using regex?
I have gotten close to what I need: getting all the Chinese punctuation marks in a string.
Intl.Segmenter is also much better than String.prototype.split(" ") for this.
But there is one problem: /\p{P}/u.test(segment.segment)
matches all punctuation, not just Chinese punctuation, so I also get English punctuation such as the apostrophe, comma, question mark and period.
I hope I don't have to resort to the answer in "Chinese punctuation Unicode range?"; it is too complicated. According to this wiki about Chinese punctuation, there are only about 20 marks.
So is there an easy way to do that?
const str = "你好,让我们试试这个分词效果,你说怎么样?Let's try Intl.Segmenter, should we ?"
let segmenterZH = new Intl.Segmenter('zh', { granularity: 'grapheme' })
let segments = segmenterZH.segment(str)
for (let segment of segments) {
  // \p{P} matches any Unicode punctuation, including ASCII punctuation
  if (/\p{P}/u.test(segment.segment)) {
    console.log(`${segment.index}:${segment.segment}`)
  }
}