- Download OpenRefine and Sublime Text
- Learn RegEx
- Use Sublime
- Use RegEx in Sublime
- Use OpenRefine
- Use RegEx in OpenRefine
Download these in the background while you are completing the next tasks
- Download OpenRefine for Windows here
  - Unzip, and double-click on openrefine.exe. If you’re having issues with the above, try double-clicking on refine.bat instead.
- Download OpenRefine for macOS here
  - Open, drag the icon into the Applications folder, and double-click on it.
- Download Sublime for Windows here
  - Run the installer
- Download Sublime for macOS here
  - Drag Sublime into your Applications folder
Complete the RegEx tutorial here: https://regexone.com/
You can refer to http://www.regular-expressions.info/tutorial.html for more detailed information about regular expressions
You can practice your regular expressions here: https://regexr.com
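
If you would rather try a few patterns outside the browser, the sketch below uses Python's built-in `re` module. The sample text and the patterns are made up for illustration and are not part of the workshop files.

```python
import re

# Made-up sample text, not taken from the workshop files.
text = "Order 66 shipped on 2016-03-01; order 7 is pending."

# \d+ matches one or more digits in a row.
numbers = re.findall(r"\d+", text)

# Parentheses create capture groups: here year, month, and day.
date = re.search(r"(\d{4})-(\d{2})-(\d{2})", text)

# re.IGNORECASE makes the match case-insensitive.
orders = re.findall(r"order", text, flags=re.IGNORECASE)

print(numbers)        # every run of digits
print(date.groups())  # the three captured groups
print(orders)         # both spellings of "order"
```

Changing one character of a pattern at a time and re-running is a quick way to see what each piece does.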
- Download the zip file here
- Extract it into a new folder on your Desktop
- Open the folder in Sublime
After completing each task, check that your solution is correct by comparing it with the answer file.
- In `file1.csv`, remove all instances of the character `.`
- In `file2.csv`, replace all instances of the character `.` with `-`
- In `file3.csv`, find all the upper-case text and make it lower case
  - Make sure your search is case-sensitive
  - Try looking in the command palette
    - macOS: ⌘ + ⇧ + P
    - Windows: Ctrl + ⇧ + P
- In `file4.csv`, replace all instances of the words `one`, `two`, and `three` with the numbers `1`, `2`, and `3` respectively
- In `file5.csv`, replace all the repeating instances of `a` with a single `a`
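
The same kinds of find-and-replace can be sketched with Python's `re` module, which can be handy for checking a pattern before trying it in Sublime. The sample strings below are invented rather than taken from the workshop files, and Sublime's regex engine is not identical to Python's, so treat this as a rough guide and not as the answer file.

```python
import re

# Invented sample strings, not the contents of the workshop files.
dotted = "a.b.c"
shouting = "Hello WORLD and GOODBYE"
words = "one, two, and three"
stretched = "baaad daaata"

# An unescaped . matches any character, so escape it to match a literal dot.
no_dots = re.sub(r"\.", "", dotted)
dashed = re.sub(r"\.", "-", dotted)

# Lower-case only whole words written in capitals (matching is case-sensitive by default).
lowered = re.sub(r"\b[A-Z]{2,}\b", lambda m: m.group(0).lower(), shouting)

# Map whole words to digits; \b stops "one" from matching inside "money".
digits = {"one": "1", "two": "2", "three": "3"}
numbered = re.sub(r"\b(one|two|three)\b", lambda m: digits[m.group(1)], words)

# a+ matches a run of one or more "a" characters; each run becomes a single "a".
collapsed = re.sub(r"a+", "a", stretched)

print(no_dots, dashed, lowered, numbered, collapsed, sep="\n")
```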
- Watch the introduction video here: http://www.youtube.com/watch?v=B70J_H_zAWM
- Download some data from Reaper. If you are having issues, download the sample file here
- Create a new project and import the file from Reaper
- Make some changes to the file
- Export the file as a csv
- In `file6.csv`, trim the whitespace in the first column and capitalise the second
  - Look in the Common Transformations menu
- In `file7.csv`, split the created date into `created_year`, `created_month`, `created_day`, and `created_time`
  - Look in the Edit Column menu
- In `file8.csv`, extract all the URLs into a new column using this code:

      import re
      # Join every URL found in the cell with commas; skip empty cells.
      if value is not None:
          return ",".join(re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', value))

  - Use Add column based on this column
  - Change the language to Python / Jython
- In `file9.csv`, extract all the hashtags into a new column using this code:

      import re
      # Join every hashtag found in the cell with commas; skip empty cells.
      if value is not None:
          return ",".join(re.findall(r"#(?:\[[^\]]+\]|\S+)", value))

  - Use Add column based on this column
  - Change the language to Python / Jython
- In `file10.csv`, fix the formatting issues in Sublime Text, then import it into OpenRefine
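
To test the URL and hashtag patterns before pasting them into OpenRefine, a small standalone script along these lines may help. It is only a sketch: the `extract` helper and the sample tweet are hypothetical, and in OpenRefine the cell content is supplied through the built-in `value` variable instead of a function argument.

```python
import re

# The two patterns from the tasks above.
URL_PATTERN = r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
HASHTAG_PATTERN = r"#(?:\[[^\]]+\]|\S+)"


def extract(pattern, value):
    """Hypothetical helper: join all matches with commas, as the OpenRefine code does."""
    if value is None:
        return None  # empty cells produce no result
    return ",".join(re.findall(pattern, value))


# Invented sample cell, not a row from the workshop files.
tweet = "Cleaning data with #OpenRefine and #regex https://regexone.com/ today"

urls = extract(URL_PATTERN, tweet)
hashtags = extract(HASHTAG_PATTERN, tweet)

print(urls)
print(hashtags)
```

Note that `\S+` in the hashtag pattern keeps going until the next whitespace, so punctuation directly after a hashtag would be captured along with it.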