
I have a Node.js application using Node REST Client to make an HTTP GET request to a server, targeting a file in JSON format. Everything goes well when this file is encoded in UTF-8 without BOM.

However, the app crashes during the client.get call when the target file encoding is UTF-8 with BOM. Even when I wrap that call in a try / catch in an attempt to prevent the crash and get the error, I get this stacktrace:

events.js:188
      throw err;
      ^

Error: Unhandled "error" event. (Error parsing response. response: [{}], error: [SyntaxError: Unexpected token  in JSON at position 0])
    at exports.Client.emit (events.js:186:19)
    at C:\PFD\workspace\web_adherent\dev\eamnh-front\node_modules\node-rest-client\lib\node-rest-client.js:457:57
    at Object.parse (C:\PFD\workspace\web_adherent\dev\eamnh-front\node_modules\node-rest-client\lib\nrc-parser-manager.js:140:17)
    at ConnectManager.handleResponse (C:\PFD\workspace\web_adherent\dev\eamnh-front\node_modules\node-rest-client\lib\node-rest-client.js:538:32)
    at ConnectManager.handleEnd (C:\PFD\workspace\web_adherent\dev\eamnh-front\node_modules\node-rest-client\lib\node-rest-client.js:531:18)
    at IncomingMessage.<anonymous> (C:\PFD\workspace\web_adherent\dev\eamnh-front\node_modules\node-rest-client\lib\node-rest-client.js:678:34)
    at emitNone (events.js:111:20)
    at IncomingMessage.emit (events.js:208:7)
    at endReadableNT (_stream_readable.js:1064:12)
    at _combinedTickCallback (internal/process/next_tick.js:139:11)

What the code block doesn't show here, but IntelliJ does, is the U+FEFF zero width no-break space Unicode code point, marked by < X > in the following stack trace line: Error: Unhandled "error" event. (Error parsing response. response: [< X >{}], error: [SyntaxError: Unexpected token < X > in JSON at position 0]). So what seems to happen is that the Client reads the file content as a Unicode string and keeps the BOM as the U+FEFF character, instead of treating the content as UTF-8 JSON without a BOM. The JSON parser then fails on that first character.
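The behaviour described above can be reproduced with Node core alone. This is a minimal sketch, assuming the library decodes the response body with Buffer's toString before handing it to JSON.parse (the library internals are not shown in this question, so that part is an assumption). The names bodyBytes, decoded and parseError are made up for the illustration.

```javascript
// UTF-8 BOM bytes (EF BB BF) followed by the JSON text "{}".
const bodyBytes = Buffer.concat([
    Buffer.from([0xEF, 0xBB, 0xBF]),
    Buffer.from('{}', 'utf8')
]);

// Buffer#toString does not strip the BOM: it decodes to the character U+FEFF.
const decoded = bodyBytes.toString('utf8');
console.log(decoded.charCodeAt(0).toString(16)); // feff

// JSON.parse rejects U+FEFF, hence the "Unexpected token ... at position 0".
let parseError = null;
try {
    JSON.parse(decoded);
} catch (err) {
    parseError = err;
}
console.log(parseError instanceof SyntaxError);
```

The mimetypes option only decides which responses are handed to the JSON parser, so it would not be expected to change how the first character is decoded.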

I have scoured SO and found quite a few questions about setting mimetypes for the Client but I still get the error. I have also read the node-rest-client docs and it seems that setting a response parser would be the way to go but scrolling to JSON parser shows that it is the same thing as setting mimetypes.

So I ended up with this:

const options ={
    mimetypes:{
        json:["application/json","application/json; charset=utf-8","application/json;charset=utf-8"]
    }
};
const client = new Client(options);

I tried to set the charset to UTF-8 this way, but the error is the same.

Does someone know what I am doing wrong or is this an issue with Node REST Client?

Thank you for your help.

-- Edit: This is my code for the GET request function:

let Client = require('node-rest-client').Client;

const options ={
    mimetypes:{
        json:["application/json","application/json; charset=utf-8","application/json;charset=utf-8"]
    }
};
const client = new Client(options);

// Reads file contents and calls callback function with data
exports.readFromUrl = (req, fileUrl, callback) => {

    client.get(fileUrl, (data, resp) => {

        if (resp.statusCode === 200) {

            callback(data);

        } else {

            callback("");
        }
    }).on('error', (err) => {

        callback("");
    });
};
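A note on why neither the try / catch nor the .on('error') handler above prevents the crash: the stack trace shows the error being emitted by exports.Client.emit, that is, on the client object, while the handler above is attached to the request returned by client.get. An EventEmitter that emits "error" with no listener throws. The sketch below shows that rule with a plain Node EventEmitter. The fakeClient object and the emitParseError helper are hypothetical stand-ins, not part of node-rest-client.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for the library's client object.
const fakeClient = new EventEmitter();

// Emits an 'error' event and reports whether emit() threw.
// In the real library the emit happens in a later tick, after the caller's
// try / catch has already exited, so the throw reaches the process instead.
function emitParseError(emitter) {
    try {
        emitter.emit('error', new Error('Error parsing response'));
        return 'handled';
    } catch (err) {
        return 'thrown';
    }
}

// No 'error' listener registered yet: emit() throws.
const withoutListener = emitParseError(fakeClient);

// With a listener on the emitter itself, the error is delivered to it.
const seen = [];
fakeClient.on('error', (err) => seen.push(err.message));
const withListener = emitParseError(fakeClient);

console.log(withoutListener, withListener, seen);
```

If the same rule applies here, registering an "error" listener on the client instance (client.on('error', ...)) would at least turn the crash into a handled error, though the response would still fail to parse.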

Final solution:

Just in case someone stumbles here because of a similar issue, I ended up replacing the Node REST Client JSON parser with a custom one which filters out invalid characters to pass a valid JSON to the callback.

Here's how I did it (using the docs mentioned previously).

const Client = require('node-rest-client').Client;
const client = new Client();

// Remove existing regular parsers (otherwise JSON parser still gets called first)
client.parsers.clean();

client.parsers.add({
    "name": "cleanInput",
    "isDefault": false,
    "match": function (response) {

        // Match every response to replace default parser
        return true;
    },
    "parse": function (byteBuffer, nrcEventEmitter, parsedCallback) {

        let parsedData = null;

        try {

            const cleanData = cleanString(byteBuffer.toString());

            parsedData = JSON.parse(cleanData);
            parsedData.parsed = true;

            // Emit custom event
            nrcEventEmitter('parsed', 'Data has been parsed ' + parsedData);

            // Pass parsed data to client request method callback
            parsedCallback(parsedData);

        } catch(err) {

            nrcEventEmitter('error', err);
        }
    }
});

// Keeps only characters with a code below 127. This drops the BOM (U+FEFF), but also every other non-ASCII character (accented letters, for example)
function cleanString(input) {

    let output = "";

    for (let i=0; i < input.length; i++) {

        if (input.charCodeAt(i) < 127) {

            output += input.charAt(i);
        }
    }
    return output;
}
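Since cleanString above also removes every legitimate non-ASCII character, a narrower variant may be preferable when the JSON can contain such text. This is a minimal sketch, not part of the original solution: stripBom and parseJsonBuffer are hypothetical helper names, and the sample data is invented for the illustration.

```javascript
// Removes a single leading U+FEFF (the decoded UTF-8 BOM) and nothing else.
function stripBom(text) {
    return text.charCodeAt(0) === 0xFEFF ? text.slice(1) : text;
}

// Decodes a response buffer as UTF-8, strips the BOM, then parses the JSON.
function parseJsonBuffer(byteBuffer) {
    return JSON.parse(stripBom(byteBuffer.toString('utf8')));
}

// Sample body: BOM bytes followed by JSON containing a non-ASCII character.
const sample = Buffer.concat([
    Buffer.from([0xEF, 0xBB, 0xBF]),
    Buffer.from('{"city":"Orléans"}', 'utf8')
]);

const parsed = parseJsonBuffer(sample);
console.log(parsed.city);
```

Inside the custom parser, the call to cleanString(byteBuffer.toString()) could be swapped for stripBom(byteBuffer.toString('utf8')), which keeps the rest of the payload untouched.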
Batman
  • Can't you just strip the BOM from received strings before parsing? There are libraries on NPM for this purpose – Loveen Dyall May 31 '19 at 11:23
  • I should indeed be able to create a custom parser that will use a library to replace the BOM, I'll try that right now. But with the client's default JSON parser showing "application/json; charset=utf-8" in the mimetype, I don't understand why that doesn't work out of the box (which is why I haven't tried a custom parser yet). Thanks for the suggestion! – Batman May 31 '19 at 11:31
  • https://stackoverflow.com/a/38036753/7316335 According to this, JSON parsers do not accept a BOM, so the API is crashing due to malformed requests from the client – Loveen Dyall May 31 '19 at 11:33

1 Answer


https://stackoverflow.com/a/38036753/7316335

JSON parsers are specified to NOT accept Byte-Order marks.

Hence, your server is crashing due to a 'malformed' client GET request.

The issue should be resolved in your server's processing of the GET request, and not through a change in the JSON parser specification.

I would advise filtering Byte Order marks in all GET requests before parsing at the server.

See "in express how multiple callback works in app.get".

This shows you how a single middleware can perform your pre-filtering of GET bodies before passing over to the actual callback for that GET path.

Loveen Dyall
  • Nice find, thanks. I'm not sure how I would filter the GET response without changing the parser though. Because as far as I'm aware, with Node REST Client's get method, I can't alter the response before it's parsed by the library and passed back to me in the callback. I'll investigate, thanks – Batman May 31 '19 at 11:46
  • Can you show the parser library used and the GET request function code? – Loveen Dyall May 31 '19 at 11:48
  • If you're using something like body-parser, just receive RAW text, filter the BOM then pass to JSON.parse(). Maybe abstracting to a helper function is best here – Loveen Dyall May 31 '19 at 11:52
  • If you mean the parser used by Node REST Client's get function, it's defined in the nrc-parser-manager.js file we can see in the stack trace. I'll edit the question with my GET request function – Batman May 31 '19 at 11:58
  • Try looking at the serializer API to chain your GET request processing. The node-rest-client docs aren't too clear if this will work but it seems promising – Loveen Dyall May 31 '19 at 12:06