Scenario:
The data contains Address column that has Name, City, Email, Contact Number, etc. separated with special characters.
Requirement:
Split the column using Regex to generate different columns for Name, City, Contact Number, etc.
Sample Data
In the below given dataset, the Address column values contain Name, City, State, etc. in a single line with different separators such as ‘,’, ‘;’, ‘-’, etc.
For example, values like “Name: Mike Pelletier, City- Laguna Niguel State; California Contact No: +1 9410004015”.
The different parts of this value are;
Name: Mike Pelletier,
City- Laguna Niguel
State; California
Contact No: +1 9410004015
The separators used to split the values are “:, ,, -, :”.
Split the Name from the Address column
Name is given with different separators like {;,:,-,,} as shown below. We will be splitting the name and creating a column with a single regex expression here.
1. To split the Name column, right click on the Address column - >Split ->Split to Column
Split Column panel will be displayed as below.
SPLIT COLUMN PANEL
2. Select Regex option. Give the Expression as “[\,\;\-]”. Provide new column name and Click APPLY.
The best fit expression that is used to fulfil our requirement is [\;\,\-].
\; - This will match “;” character.
\, - This will match “,” character.
\- - This will match “-” character.
The result will be split to a new Name column from where the regex expression has matched. As per the regex expression given, the first matching character was a Comma (,) which occurred after the Name field in the string. So the name column was extracted from the whole line;
Name: Mike Pelletier.
RESULT WITH COLUMN NAME VALUES
To remove the characters “Name:” from the column value of NAME_H, use the same regex expression (“[\,\;\-]”) again to split.
The final result will have extracted name values as below. Do the same steps for extracting other columns like City, State, Contact number, etc.
RESULT WITH NAME VALUES EXTRACTED
Similarly for extracting Contact number follow the below steps.
Split the Contact number from the Address column
ADDRESS COLUMN HAVING CONTACT NO VALUES
Address values given are like “Name: Mike Pelletier, City- Laguna Niguel State; California Contact No: +1 9410004015”.
We will be splitting the contact number (+1 9410004015) and creating a column with a single regex expression - [+][0-9]{1} ?[0-9].
SPLITTING BY REGEX
The result would split the contact number to a different column.
DATASET AFTER SPLITTING MOBILE NUMBER COLUMN
Basics of Regex expression engine and examples:
Basically a regex-directed engine processes regular expressions, trying to match the pattern to the given string. If a match is found, it advances through the regex and the subject string. If the characters fail to match, it picks up the next position in the regex and the subject string where it can try a different path through the regex.
Some examples are given as below.
String example | Regex Expression | Result | Logic |
Name: Mike Pelletier, City- Laguna Niguel State; California Contact No: + 9410004015 | [\,\;\-] (From Left) | Name: Mike Pelletier | The string will be split from the Comma (,) as per the regex rule. |
Name: Mike Pelletier, City- Laguna Niguel State; California Contact No: + 9410004015 | [+][0-9]{1} ?[0-9] (From Right) | 9410004015 | Matches zero or more characters from the set 0123456789 |
[email protected], City: Laguna Niguel, Contact No: +91 9410000157 | [A-Za-z0-9]{1,}@[a-zA-Z]{5,}.[a-zA-Z]{2,} | 1. Matches one character from the English alphabet (both cases), digits, “+”, “_”, “.” and, “-” , 2. @ symbol, 3. Matches one character from the English alphabet (both cases), digits, “.” and “–” after the @ symbol. |
Note: This article is based on Smarten Version 5.0. This may or may not be relevant to the Smarten version you may be using.
Was this article helpful?
That’s Great!
Thank you for your feedback
Sorry! We couldn't be helpful
Thank you for your feedback
Feedback sent
We appreciate your effort and will try to fix the article