Questions tagged [openrefine]

OpenRefine is the current name of the data cleaning tool formerly called Google Refine (and originally released as Freebase Gridworks).

337 questions
9
votes
2 answers

How to perform approximate (fuzzy) name matching in R

I have a large data set of biological journals that was compiled over a long time by different people, so the data are not in a single format. For example, in the column "AUTHOR" I can find John Smith, Smith John, Smith J and so…
group413
  • 119
  • 1
  • 4
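
The name variants listed in this excerpt can be grouped with a key-collision approach. The following is a minimal sketch in Python (the question itself asks about R), loosely modeled on the "fingerprint" idea used for clustering in OpenRefine; the function names `fingerprint` and `group_by_fingerprint` are hypothetical and not taken from the question or any answer.

```python
import string


def fingerprint(name: str) -> str:
    """Build an order-insensitive key: lowercase, drop punctuation, sort unique tokens."""
    cleaned = name.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def group_by_fingerprint(names):
    """Group raw name strings that share the same fingerprint key."""
    groups = {}
    for raw in names:
        groups.setdefault(fingerprint(raw), []).append(raw)
    return groups


groups = group_by_fingerprint(["John Smith", "Smith John", "Smith J"])
```

Here "John Smith" and "Smith John" end up under the same key, while the abbreviated "Smith J" does not; matching initials would need an extra step such as an edit-distance comparison.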
9
votes
2 answers

GREL to apply to ALL columns or current column

I have a transposition that I'd like to apply to multiple columns. The generated GREL shows the columnName or base name, but that means I have to edit the code for each column. Thought there was a way to find the column index and have code that…
Sonicthoughts
  • 550
  • 1
  • 3
  • 16
8
votes
1 answer

Use POST method with URL and Google Refine / OpenRefine

OpenRefine http://openrefine.org/ allows URL generation using GREL as tokens. I want to connect to an API which only supports the POST method. Can I format the URL so it calls the REST API using POST? Ref:…
Sonicthoughts
  • 550
  • 1
  • 3
  • 16
7
votes
1 answer

Value.match() Regex in Google Refine

I am trying to extract a sequence of numbers from a column in Google Refine. Here is my code for doing it: value.match(/[\d]+/)[0] The data in my column is in the format abcababcabc 1234566 abcabcbacdf The result is "null". I have no idea…
mchangun
  • 8,274
  • 18
  • 64
  • 94
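
One likely explanation, assuming GREL's `match` only succeeds when the pattern covers the entire cell value, is that `[\d]+` alone cannot match a string that also contains letters and spaces. The sketch below is a Python analogy rather than GREL: `re.fullmatch` stands in for whole-string matching and `re.search` for substring matching.

```python
import re

cell = "abcababcabc 1234566 abcabcbacdf"

# Whole-string match with the bare pattern fails, like the "null" in the question.
whole = re.fullmatch(r"[\d]+", cell)

# Wrapping the digits in a capture group and allowing any text around them succeeds.
wrapped = re.fullmatch(r".*?(\d+).*", cell)
digits = wrapped.group(1) if wrapped else None

# A substring search finds the same run of digits without the wrapper.
searched = re.search(r"\d+", cell)
```

Under that assumption, the GREL counterpart of the wrapped pattern would be along the lines of `value.match(/.*?(\d+).*/)[0]`.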
6
votes
2 answers

Split multi-valued cells in more than one column into rows (OpenRefine)

I have been cleaning a table in OpenRefine. I now have it like this:
REF               Handle    Size     Price
2002, 2003        t-shirt1  M, L     23
3001, 3002, 3003  t-shirt2  S, M, L  24
I need to split those multivalued…
AnaRita
  • 127
  • 1
  • 11
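
The excerpt is cut off before the desired output, so the following Python sketch assumes the values should be paired by position (first REF with first Size, and so on) while the single-valued columns are repeated. The helper name `explode` and its parameters are hypothetical.

```python
from itertools import zip_longest


def explode(row, multi_cols, sep=","):
    """Split the listed multi-valued columns and emit one row per position."""
    parts = {col: [p.strip() for p in str(row[col]).split(sep)] for col in multi_cols}
    out = []
    # zip_longest pads shorter lists with "" if the columns hold unequal counts.
    for combo in zip_longest(*(parts[col] for col in multi_cols), fillvalue=""):
        new_row = dict(row)                      # copy the single-valued columns
        new_row.update(zip(multi_cols, combo))   # overwrite the split columns
        out.append(new_row)
    return out


rows = explode(
    {"REF": "2002, 2003", "Handle": "t-shirt1", "Size": "M, L", "Price": 23},
    ["REF", "Size"],
)
```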
6
votes
1 answer

OpenRefine: flag changed rows

I'm using OpenRefine to clean up an Excel data set. I have about 70 operations and I've been cutting and pasting on different data sets. I maintain a record id and export to a new Excel sheet. Then I reload the sheet using the record id. It works…
Sonicthoughts
  • 550
  • 1
  • 3
  • 16
5
votes
1 answer

Replace null values in cell

I am unable to replace null values in cells. I have created a facet to display only cells that have null values. I then went to Edit cells > Transform and tried to use the replace function, but it does not seem to be working. Different…
Chris Smith
  • 359
  • 1
  • 16
5
votes
2 answers

Extract an HTML tag that contains a string in OpenRefine?

There is not much to add to the title. It's what I'm trying to do. Any suggestions? I reviewed the docs on GitHub and googled extensively. The best I got is: value.parseHtml().select('p[contains('xyz')]') It results in a syntax error.
treakec
  • 129
  • 1
  • 9
5
votes
2 answers

Searching and replacing multiple values in Google Refine

I'd like to search and replace multiple values in a column with a single function in GREL (or anything else) in Google Refine. For example: 1. replace(value, "Buch", "bibo:Book") 2. replace(value, "Zeitschrift", "bibo:Journal") 3. replace(value,…
CH_
  • 625
  • 1
  • 7
  • 18
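
The idea of applying several replacements in one pass can be sketched as a lookup table. This is a Python illustration, not GREL; it uses only the two pairs visible in the excerpt, and the names `MAPPING` and `replace_all` are hypothetical.

```python
# Only the two pairs visible in the excerpt; the third is truncated in the source.
MAPPING = {
    "Buch": "bibo:Book",
    "Zeitschrift": "bibo:Journal",
}


def replace_all(value, mapping):
    """Apply each search/replace pair in turn to a single cell value."""
    for old, new in mapping.items():
        value = value.replace(old, new)
    return value


converted = [replace_all(v, MAPPING) for v in ["Buch", "Zeitschrift", "Other"]]
```

Because the pairs are applied sequentially, order matters if one replacement's output could contain another pair's search string.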
5
votes
1 answer

Google Refine: iterate over a JSON dictionary

I've got some JSON within Google Refine (see http://mapit.mysociety.org/point/4326/0.1293497,51.5464828 for the full version), but abbreviated it's like this: {1234: {'name': 'Barking', 'type': 'WMC'}, 5678: {'name': 'England', 'type': 'EUR'} } I only…
Dragon
  • 1,743
  • 1
  • 16
  • 29
4
votes
2 answers

How to facet multiple columns in Google Refine

I have a data set with 30 columns and multiple rows (some cells have no data). I would like to be able to facet the columns in groups.
      1  2  3  4...
Row1  A  B  C  D
Row2  E  A  D  F
Row3  Q  A  B  H
Given the above data I would like the facet to retun…
banjanxed
  • 73
  • 2
  • 6
4
votes
2 answers

OpenRefine: Split multi-valued cells by token/word count?

I have a large corpus of text data that I'm pre-processing with OpenRefine for document classification with MALLET. Some of the cells are long (>150,000 characters) and I'm trying to split them into <1,000 word/token segments. I'm able to split…
DFM
  • 43
  • 4
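
Splitting a long text into segments of fewer than 1,000 words can be sketched as follows in Python. This assumes that a "token" is simply a whitespace-separated word and that collapsing the original whitespace is acceptable; the function name `chunk_words` is hypothetical.

```python
def chunk_words(text, max_words=999):
    """Split text into consecutive segments of at most max_words words each."""
    words = text.split()  # whitespace tokenization; runs of whitespace collapse
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


segments = chunk_words("a b c d e", max_words=2)
```

The default of 999 keeps every segment strictly under 1,000 words; a tokenizer that matches MALLET's own rules would give different counts.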
4
votes
1 answer

Parse multivalued JSON in GREL (OpenRefine)

I have a column with the following content: 7. {"resource":"abc"} 8. [{"resource":"def"},{"resource":"ghi"}] I try to get the content of "resource": value.parseJson().resource This works. If I try to get the content of multivalued cells, I can't get it…
CH_
  • 625
  • 1
  • 7
  • 18
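
The two cell shapes in this excerpt (a single object and an array of objects) can be handled uniformly by wrapping the single object in a list first. The sketch below shows the idea in Python rather than GREL; the function name `resources` is hypothetical.

```python
import json


def resources(cell):
    """Return every "resource" value, whether the cell holds one object or a list."""
    data = json.loads(cell)
    items = data if isinstance(data, list) else [data]
    return [item["resource"] for item in items]


single = resources('{"resource":"abc"}')
multi = resources('[{"resource":"def"},{"resource":"ghi"}]')
```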
4
votes
2 answers

Best way to parse a big and intricate JSON file with OpenRefine (or R)

I know how to parse JSON cells in OpenRefine, but this one is too tricky for me. I've used an API to extract the calendar of 4730 Airbnb rooms, identified by their IDs. Here is an example of one JSON file:…
Ettore Rizza
  • 2,670
  • 2
  • 8
  • 22
4
votes
1 answer

Simple OpenRefine IF to create a new column

I'm trying to create a new column which contains true or false. Basically, column A has a number in it, between 1 and 6; if it's higher than 3, I want the new column 'match' to contain true, otherwise false. Using the add column based on…
Paul M
  • 3,475
  • 8
  • 38
  • 52