UPDATE: Pass this:
#",(?=([^\"]*\"[^\"]*\")*[^\"]*$)"
to clojure.string/split to parse CSV.
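As a minimal sketch of how that regex might be used, the snippet below wraps it in a small helper. The names unquoted-comma and split-csv-line are hypothetical, introduced only for illustration. The lookahead accepts a comma only when an even number of double quotes follows it, so this assumes quotes are balanced and that a record does not span several lines.

```clojure
(require '[clojure.string :as str])

;; Regex from the update: a comma followed by an even number of double
;; quotes up to the end of the string, i.e. a comma outside a quoted field.
(def unquoted-comma #",(?=([^\"]*\"[^\"]*\")*[^\"]*$)")

(defn split-csv-line
  "Hypothetical helper: splits one CSV line on commas outside double quotes.
  The surrounding quotes are kept in the resulting fields."
  [line]
  (str/split line unquoted-comma))

(split-csv-line "Hello,\"Newell, Gabe\",world")
;; => ["Hello" "\"Newell, Gabe\"" "world"]
```

Note that clojure.string/split drops trailing empty strings by default, so a line ending in a comma loses its last empty field; passing a negative limit as a third argument, as in (str/split line unquoted-comma -1), keeps them.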
UPDATE: I need a regex that matches all commas that are not in quotes, in a form that can be used by clojure.string/split.
I have written a CSV parse function in Clojure:
(defn parse-csv [data schema]
  (let [split-data (clojure.string/split data #",")]
    (loop [rm-data split-data
           rm-keys (:keys schema)
           rm-trans (:trans schema)
           final {}]
      (if (empty? rm-keys)
        final
        (recur (rest rm-data)
               (rest rm-keys)
               (rest rm-trans)
               (into final
                     {(first rm-keys)
                      ((first rm-trans) (first rm-data))}))))))
schema is simply a hash map containing a vector of keywords and a vector of functions (each applied to its corresponding value). It defines how the output hash map will look.
Here's an example:
(def schema {:keys [:foo :bar :baz] :trans [identity read-string identity]})
(parse-csv "Hello,42,world" schema) ;; returns {:foo "Hello", :bar 42, :baz "world"}
However, if we do this:
(def schema {:keys [:foo :bar :baz] :trans [identity identity identity]})
(parse-csv "Hello,\"Newell, Gabe\",world" schema) ;; returns {:foo "Hello" :bar "\"Newell" :baz " Gabe\""}
The quoted field is split at its comma, and the word "world" is dropped. The result should look like:
{:foo "Hello" :bar "\"Newell, Gabe\"" :baz "world"}
The above data, in a file, would actually look like Hello,"Newell, Gabe",world, so we need to avoid triggering the split function when it comes across the comma in "Newell, Gabe".
We need a function that splits a string on a given character unless that character is inside quotes.
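One way to picture such a function, without a regex, is a single pass over the characters that tracks whether we are currently inside quotes. The sketch below is only an illustration: split-unquoted is a hypothetical name, and it assumes double quotes are the only quoting character and that they are balanced.

```clojure
(defn split-unquoted
  "Hypothetical helper: splits s on the character sep, ignoring any sep
  that appears between double quotes. Quotes are kept in the output,
  matching the desired result shown above."
  [s sep]
  (loop [chars (seq s)
         in-quotes? false
         field []      ; characters of the field being built
         fields []]    ; completed fields so far
    (if-let [c (first chars)]
      (cond
        ;; A double quote toggles the quoted state and stays in the field.
        (= c \")
        (recur (rest chars) (not in-quotes?) (conj field c) fields)

        ;; A separator outside quotes closes the current field.
        (and (= c sep) (not in-quotes?))
        (recur (rest chars) in-quotes? [] (conj fields (apply str field)))

        ;; Anything else, including a quoted separator, joins the field.
        :else
        (recur (rest chars) in-quotes? (conj field c) fields))
      ;; End of input: emit the last field.
      (conj fields (apply str field)))))

(split-unquoted "Hello,\"Newell, Gabe\",world" \,)
;; => ["Hello" "\"Newell, Gabe\"" "world"]
```

Unlike a plain two-argument clojure.string/split, this version keeps empty fields, including trailing ones, which may or may not be what a given schema expects.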