You will often have strings that contain symbols when all you want is the text component of the string.
Using a regular expression (regex) in PHP, you can do this quite easily:
$string = "12345-hello, 汉语!!!";
$string = preg_replace('/[^\p{L}\p{N}\s]/u', '', $string);
print($string); // output is: 12345hello 汉语
Each part of the preg_replace here can be explained without much effort.
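/regex/u - The two slashes delimit the pattern to match against. The trailing lowercase u modifier tells PHP to treat both the pattern and the subject string as UTF-8.
[class] - The square brackets define a single character class: a set of characters, any one of which can match at that position.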
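^values - A caret at the start of a class negates it, so we are looking for characters that are not among the values. Note that parentheses do not group anything inside a character class; they are literal characters, so writing [^(values)] would also leave ( and ) in the string.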
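\p{L} - Any Unicode letter, in any script.
\p{N} - Any Unicode number character (decimal digits, plus other numeric characters such as Roman numerals and fractions).
\s - Any whitespace character (spaces, tabs, newlines).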
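To see what each of these classes matches on its own, here is a small sketch using preg_match_all. The sample string and the variable names are made up for illustration and are not part of the original snippet.

```php
<?php
// Hypothetical sample string: one Latin letter, one digit, one space,
// one CJK letter, and one symbol.
$sample = "a1 汉!";

// Collect every match of each class separately.
preg_match_all('/\p{L}/u', $sample, $letters); // Unicode letters
preg_match_all('/\p{N}/u', $sample, $numbers); // Unicode numbers
preg_match_all('/\s/u', $sample, $spaces);     // whitespace

print(implode(',', $letters[0]) . "\n"); // a,汉
print(implode(',', $numbers[0]) . "\n"); // 1
print(count($spaces[0]) . "\n");         // 1
```

The "!" is matched by none of the three classes, which is why a negated class built from them picks it out for removal.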
So, anything that isn't a letter, a number, or whitespace (i.e., anything matching /[^\p{L}\p{N}\s]/u) is replaced with '', the empty string.
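If you need this in more than one place, it may be worth wrapping it in a function. The following is only a sketch: the name strip_symbols is hypothetical, and throwing an exception is just one possible way to handle the case where preg_replace returns null (which can happen with the u modifier when the input is not valid UTF-8).

```php
<?php
// Hypothetical helper (not from the original snippet): keep only
// Unicode letters, numbers, and whitespace.
function strip_symbols(string $text): string
{
    $clean = preg_replace('/[^\p{L}\p{N}\s]/u', '', $text);

    // preg_replace() returns null on error, e.g. invalid UTF-8 input.
    if ($clean === null) {
        throw new InvalidArgumentException(
            'preg_replace failed with error code ' . preg_last_error()
        );
    }

    return $clean;
}

print(strip_symbols("12345-hello, 汉语!!!") . "\n"); // 12345hello 汉语
print(strip_symbols("(a) b") . "\n");               // a b
```

The second call shows that parentheses are stripped like any other symbol, because the character class contains only the three classes and no literal ( or ).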