
We have discovered a problem whereby a UTF-8 include file is causing the ANSI file it is included in to become UTF-8.

Is there any reason for the include file to be UTF-8 or can it be safely changed to ANSI?

When I tried changing it, nothing obvious broke; however, the include file contains functions that interact with a web service.

The code of the include file is below:

<%   
Function GetQuotedUrl(ByVal value)
    GetQuotedUrl = Chr(34) & value & Chr(34)
End Function

Function GetServiceResponse(ByVal paramArr,ByVal methodName)

    dim soapMessage
    dim responseMessage
    dim xmlhttp

    soapMessage=CreateSOAPMessage(paramArr,methodName)      
    soapMessage = Replace(soapMessage, "'", chr(34))

    Set xmlhttp = CreateObject("MSXML2.ServerXMLHTTP")                                  
    xmlhttp.open "POST", SERVICE_URL , False
    xmlhttp.setTimeouts 30000, 60000, 30000, 120000
    xmlhttp.setRequestHeader "Man", "POST " & SERVICE_URL & " HTTP/1.1"
    xmlhttp.setRequestHeader "SOAPAction", "http://tempuri.org/" &  SERVICE_CONTRACT & "/" & methodName
    xmlhttp.setRequestHeader "Content-Type", "text/xml; charset=utf-8"

    xmlhttp.send(soapMessage)
    responseMessage=xmlhttp.responseText        
    GetServiceResponse=responseMessage

End Function

Function CreateSOAPMessage(ByVal paramArr,ByVal methodName)

    dim soapMessage
    dim param
    dim paramName,paramValue
    dim paramNameValue
    dim count

    For count=0 to UBound(paramArr)-1
      paramNameValue=Split(paramArr(count),"=")
      param = param & "<" & paramNameValue(0) & ">" & paramNameValue(1) & "</" & paramNameValue(0) & ">"
    Next

    soapMessage = "<s:Envelope xmlns:s=" & GetQuotedUrl("http://schemas.xmlsoap.org/soap/envelope/") & ">" & _ 
                    "<s:Body>" & _ 
                        "<" & methodName & " xmlns=" & GetQuotedUrl("http://tempuri.org/") & ">" & param & "</" & methodName & ">" & _
                    "</s:Body>" & _
                "</s:Envelope>"

   CreateSOAPMessage=soapMessage          

End Function       
%>
user692942
Dennis
  • I don't see any UTF-8 symbols in this file, which means it uses only the subset that is the same in both (UTF-8/ASCII), so there is no difference. In any case, I suggest you use an encoding declaration in the header `` – Iłya Bursov Mar 18 '16 at 14:27
  • You shouldn't mix encodings; it just leads to problems that you spend a lot of time trying to track down. If the source `asp` file is `UTF-8` then any `#include` files should also be `UTF-8`, and vice versa. – user692942 Mar 18 '16 at 14:29
  • See http://stackoverflow.com/a/21914278/692942 – user692942 Mar 18 '16 at 15:22

1 Answer


There don't appear to be any characters outside the 7-bit ASCII range in the file, and since ANSI and UTF-8 encode that range identically, it shouldn't be a problem. Where you would see some difficulty is if your file contained characters as innocent as a Euro symbol, "curly" quotes, or an em dash. These characters are stored as different byte sequences in ANSI and UTF-8, so you couldn't simply say "the file is ANSI" and get away with it.
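To make the difference concrete, here is a minimal sketch in Python (not the language of the include file, just a convenient way to look at bytes). It assumes "ANSI" means Windows-1252, which is the usual case on Western European Windows installs; the sample strings are made up for illustration.

```python
# Compare how the same text is stored under Windows-1252 ("ANSI") and UTF-8.
# Assumption: "ANSI" here means code page 1252.

def encodings_agree(text: str) -> bool:
    """Return True when text produces identical bytes in cp1252 and UTF-8."""
    return text.encode("cp1252") == text.encode("utf-8")

# Hypothetical samples: one pure ASCII line, one containing a Euro sign.
ascii_only = 'xmlhttp.open "POST", SERVICE_URL, False'
with_euro = "Price: \u20ac10"

print(encodings_agree(ascii_only))   # True: 7-bit ASCII is byte-identical
print(encodings_agree(with_euro))    # False: the Euro sign is stored differently

print("\u20ac".encode("cp1252"))     # b'\x80' (one byte)
print("\u20ac".encode("utf-8"))      # b'\xe2\x82\xac' (three bytes)
```

So a file made up only of 7-bit ASCII characters is the same sequence of bytes whichever label you put on it, while a single Euro sign is enough to make the two encodings diverge.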

Craig
  • It's very unlikely the file is actual 7-bit ASCII; it's more likely an extended 8-bit character set like Windows-1252, so not all the characters will map like for like. Simply don't mix encodings, it never ends well. See [What is ANSI Format?](http://stackoverflow.com/a/701920/692942) – user692942 Mar 18 '16 at 22:50
  • But the question was about this particular file, and my answer was about this particular file. Files don't have inherent encodings. If the characters in a particular file happen to fall into two encodings, then you can claim it is either one. The OP won't be "mixing encodings" because, again, files don't have inherent encodings. – Craig Mar 25 '16 at 13:00
  • Of course a file has an inherent encoding; it determines how the binary is structured in the file. If you try to open a file without the correct encoding you will end up with an encoding mismatch, so I'm not sure I understand your reasoning. ASCII and UTF-8 do share *some* of the same range, so you can in theory safely show ASCII in UTF-8, but expect a world of pain if you try it the other way around. – user692942 Mar 25 '16 at 13:34
  • My point is this: If a file (such as that quoted by the OP) contains only those characters defined by "7-bit ASCII" (which his does), then it is safe to say that the file uses ANSI encoding (which is what he wants to do). The reason this works is that, despite your pleas to the contrary, files DO NOT have some kind of hidden encoding property that will jump up and surprise him. He can call this file Windows 1252, ANSI, UTF-8, or ASCII (7- or 8-bit) and the results will be the same. This of course is not true of EVERY file, just this one. And that was the question, so the answer is YES. – Craig Mar 25 '16 at 14:50
  • Wow, interesting use of UPPER CASE @craig. Your point is flawed because you are still making assumptions about the dynamically generated file. You can argue the point all you want, but nothing good comes from mismatching encodings. What happens if one of the method names contains a character that doesn't match the ASCII encoding? You'll end up with corrupted data in the response XML. – user692942 Mar 25 '16 at 20:05
  • Again, read the original question and look at the file the OP is asking about. Point out the character in a method name that is different between UTF-8 and ANSI. If you can't find one, then my answer is correct. And as far as your point about the general idea of arbitrarily switching between ANSI and UTF-8 encoding without considering the contents of the file, you and I are and have always been in agreement. That, however, was not the question, hence my (correct) answer. – Craig Mar 28 '16 at 14:43
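For anyone wanting to settle this for their own file rather than by eye, the check can be sketched roughly as below. This is Python, not ASP, and `describe_bytes` is a hypothetical helper name. It reports whether the data starts with the UTF-8 byte order mark (`EF BB BF`) and where any bytes above `0x7F` occur. The sample data is made up; for a real file you would pass in the result of `open(path, "rb").read()`.

```python
# Sketch: inspect raw bytes for a UTF-8 BOM and for anything outside 7-bit ASCII.
UTF8_BOM = b"\xef\xbb\xbf"

def describe_bytes(data: bytes) -> dict:
    """Report a leading UTF-8 BOM and the offsets of non-ASCII bytes after it."""
    has_bom = data.startswith(UTF8_BOM)
    body = data[len(UTF8_BOM):] if has_bom else data
    non_ascii = [i for i, b in enumerate(body) if b > 0x7F]
    return {"has_utf8_bom": has_bom, "non_ascii_offsets": non_ascii}

# A line taken from the include file in the question, as plain ASCII bytes.
sample = b"GetQuotedUrl = Chr(34) & value & Chr(34)"

print(describe_bytes(sample))                       # no BOM, no non-ASCII bytes
print(describe_bytes(UTF8_BOM + sample))            # BOM present, body still ASCII
print(describe_bytes("caf\u00e9".encode("utf-8")))  # non-ASCII bytes at offsets 3 and 4
```

If the list of offsets is empty and there is no BOM, the bytes are the same under ASCII, Windows-1252 and UTF-8. A BOM, on the other hand, is a difference between a "UTF-8" and an "ANSI" save of the same text that is invisible in most editors.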