
How would you extract the Server Name Indication (SNI) from a TLS Client Hello message? I'm currently struggling to understand this very cryptic RFC 3546 on TLS Extensions, in which SNI is defined.

Things I've understood so far:

  • The host is UTF-8 encoded and readable when you decode the buffer as UTF-8.
  • There's one byte before the host that determines its length.

If I could find out the exact position of that length byte, extracting the SNI would be pretty simple. But how do I get to that byte in the first place?
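There is no fixed offset for that length byte, because every field in front of the extensions (session ID, cipher suites, compression methods) is a length-prefixed vector; you reach the server name by reading each length and skipping that many bytes. Note also that, per RFC 6066 (which supersedes RFC 3546), the host name's length prefix is two bytes (big-endian), not one. A minimal C sketch of the bounds-checked reading helpers such a walk needs; the cursor struct and the function names are hypothetical, not from any library:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bounds-checked cursor over a byte buffer. */
struct cursor {
    const uint8_t *p;   /* next unread byte */
    size_t left;        /* bytes remaining */
};

static bool read_u8(struct cursor *c, uint8_t *out)
{
    if (c->left < 1) return false;
    *out = c->p[0];
    c->p += 1; c->left -= 1;
    return true;
}

/* TLS integers are big-endian (network byte order). */
static bool read_u16(struct cursor *c, uint16_t *out)
{
    if (c->left < 2) return false;
    *out = (uint16_t)((c->p[0] << 8) | c->p[1]);
    c->p += 2; c->left -= 2;
    return true;
}

static bool skip(struct cursor *c, size_t n)
{
    if (c->left < n) return false;
    c->p += n; c->left -= n;
    return true;
}

/* Skip a vector with a 1-byte length prefix (e.g. session_id). */
static bool skip_vec8(struct cursor *c)
{
    uint8_t n;
    return read_u8(c, &n) && skip(c, n);
}

/* Skip a vector with a 2-byte length prefix (e.g. cipher_suites). */
static bool skip_vec16(struct cursor *c)
{
    uint16_t n;
    return read_u16(c, &n) && skip(c, n);
}
```

For a Client Hello body (the bytes after the 4-byte handshake header), the walk would be skip(&c, 2 + 32) for the version and random, skip_vec8 for the session ID, skip_vec16 for the cipher suites, skip_vec8 for the compression methods, and then read_u16 for the total extensions length.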

buschtoens
  • The straightforward approach you attempt to take is wrong. You need to parse the request including its extensions and then you get the data from the corresponding extension. – Eugene Mayevski 'Callback Jul 24 '13 at 15:20
  • Yeah, I'm certain about that, but I don't actually know how to parse it. Do you understand how the TLS handshake works? – buschtoens Jul 25 '13 at 12:49
  • Sure, I do, as we offer a security library as one of our main products. You need to open the RFC ( http://tools.ietf.org/html/rfc5246 ) and implement it. – Eugene Mayevski 'Callback Jul 25 '13 at 13:16
  • Haha, well, thank you, that's like 100 pages of pure tech. I guess things start getting interesting on [page 41](http://tools.ietf.org/html/rfc5246#page-41). This is where the extensions are mentioned, which in turn are described in RFC 3546. Oh my, Oh my. :D – buschtoens Jul 25 '13 at 14:13
  • Hey look at that, another answer where Eugene is selling a 20 thousand dollar product as an answer. I guess pointing people to a huge RFC and making the task feel overwhelmingly daunting is a sales tactic? –  Oct 24 '14 at 03:17
  • Related: https://serverfault.com/questions/574405/tcpdump-server-hello-certificate-filter – sanmai May 04 '18 at 00:15
  • https://stackoverflow.com/questions/39624745/capture-only-ssl-handshake-with-tcpdump – sanmai May 04 '18 at 00:16
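In concrete terms, "parse the extensions and take the data from the corresponding extension" means: each extension is a 2-byte type, a 2-byte length and that many bytes of data, repeated until the extensions block ends. A C sketch, assuming ext points just past the 2-byte total extensions length and ext_len is that total; find_extension is a hypothetical name. The server_name extension has type 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns a pointer to the data of the first extension of the given type
 * and stores its length in *data_len, or NULL if it is absent or the list
 * is malformed. */
static const uint8_t *find_extension(const uint8_t *ext, size_t ext_len,
                                     uint16_t type, size_t *data_len)
{
    size_t pos = 0;
    while (ext_len - pos >= 4) {
        uint16_t t = (uint16_t)((ext[pos] << 8) | ext[pos + 1]);
        size_t n = ((size_t)ext[pos + 2] << 8) | ext[pos + 3];
        pos += 4;
        if (n > ext_len - pos)
            return NULL;                /* extension overruns the list */
        if (t == type) {
            *data_len = n;
            return ext + pos;
        }
        pos += n;                       /* wrong type: skip its data */
    }
    return NULL;
}
```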

4 Answers


I did this in sniproxy. Examining a TLS Client Hello packet in Wireshark while reading that RFC is a pretty good way to go. It's not too hard; there are just lots of variable-length fields you have to skip past, and you have to check that each element has the type you expect.

I'm working on my tests right now, and have this annotated sample packet that might help:

const unsigned char good_data_2[] = {
    // TLS record
    0x16, // Content Type: Handshake
    0x03, 0x01, // Version: TLS 1.0
    0x00, 0x6c, // Length (use for bounds checking)
        // Handshake
        0x01, // Handshake Type: Client Hello
        0x00, 0x00, 0x68, // Length (use for bounds checking)
        0x03, 0x03, // Version: TLS 1.2
        // Random (32 bytes fixed length)
        0xb6, 0xb2, 0x6a, 0xfb, 0x55, 0x5e, 0x03, 0xd5,
        0x65, 0xa3, 0x6a, 0xf0, 0x5e, 0xa5, 0x43, 0x02,
        0x93, 0xb9, 0x59, 0xa7, 0x54, 0xc3, 0xdd, 0x78,
        0x57, 0x58, 0x34, 0xc5, 0x82, 0xfd, 0x53, 0xd1,
        0x00, // Session ID Length (skip past this much)
        0x00, 0x04, // Cipher Suites Length (skip past this much)
            0x00, 0x01, // NULL-MD5
            0x00, 0xff, // RENEGOTIATION INFO SCSV
        0x01, // Compression Methods Length (skip past this much)
            0x00, // NULL
        0x00, 0x3b, // Extensions Length (use for bounds checking)
            // Extension
            0x00, 0x00, // Extension Type: Server Name (check extension type)
            0x00, 0x0e, // Length (use for bounds checking)
            0x00, 0x0c, // Server Name List Length
                0x00, // Server Name Type: host_name (check server name type)
                0x00, 0x09, // Length (length of your data)
                // "localhost" (the data you're after)
                0x6c, 0x6f, 0x63, 0x61, 0x6c, 0x68, 0x6f, 0x73, 0x74,
            // Extension
            0x00, 0x0d, // Extension Type: Signature Algorithms (check extension type)
            0x00, 0x20, // Length (skip past since this is the wrong extension)
            // Data
            0x00, 0x1e, 0x06, 0x01, 0x06, 0x02, 0x06, 0x03,
            0x05, 0x01, 0x05, 0x02, 0x05, 0x03, 0x04, 0x01,
            0x04, 0x02, 0x04, 0x03, 0x03, 0x01, 0x03, 0x02,
            0x03, 0x03, 0x02, 0x01, 0x02, 0x02, 0x02, 0x03,
            // Extension
            0x00, 0x0f, // Extension Type: Heartbeat (check extension type)
            0x00, 0x01, // Length (skip past since this is the wrong extension)
            0x01 // Mode: peer_allowed_to_send
};
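Following the annotations in the packet above (check the types, use the record and handshake lengths for bounds checking, skip past the variable-length fields), the walk can be sketched in C as below. This is not sniproxy's code; extract_sni and its return convention are hypothetical, and it assumes the buffer holds one complete TLS record carrying the whole Client Hello (a hello split across records or TCP segments is not handled). It only looks at the first entry of the server name list.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static size_t be16(const uint8_t *p) { return ((size_t)p[0] << 8) | p[1]; }

/* Copies the host_name from a TLS Client Hello record into out (NUL-terminated).
 * Returns the name length, 0 if there is no SNI, or -1 on malformed input. */
static int extract_sni(const uint8_t *data, size_t len, char *out, size_t out_size)
{
    if (len < 5 || data[0] != 0x16)               /* Content Type: Handshake */
        return -1;
    size_t end = 5 + be16(data + 3);              /* record length bounds everything */
    if (end > len)
        return -1;
    size_t pos = 5;
    if (end - pos < 4 || data[pos] != 0x01)       /* Handshake Type: Client Hello */
        return -1;
    size_t hs_len = ((size_t)data[pos + 1] << 16) | be16(data + pos + 2);
    pos += 4;
    if (hs_len > end - pos)
        return -1;
    end = pos + hs_len;                           /* tighten bound to the handshake */

    if (end - pos < 2 + 32 + 1) return -1;
    pos += 2 + 32;                                /* version + random */
    pos += 1 + data[pos];                         /* session ID */
    if (pos + 2 > end) return -1;
    pos += 2 + be16(data + pos);                  /* cipher suites */
    if (pos + 1 > end) return -1;
    pos += 1 + data[pos];                         /* compression methods */
    if (pos == end) return 0;                     /* no extensions at all */
    if (pos + 2 > end) return -1;
    size_t ext_end = pos + 2 + be16(data + pos);
    pos += 2;
    if (ext_end > end) return -1;

    while (pos + 4 <= ext_end) {
        size_t type = be16(data + pos), n = be16(data + pos + 2);
        pos += 4;
        if (pos + n > ext_end) return -1;
        /* server_name extension whose first entry is a host_name (type 0) */
        if (type == 0x0000 && n >= 5 && data[pos + 2] == 0x00) {
            size_t name_len = be16(data + pos + 3);
            if (5 + name_len > n || name_len + 1 > out_size) return -1;
            memcpy(out, data + pos + 5, name_len);
            out[name_len] = '\0';
            return (int)name_len;
        }
        pos += n;                                 /* wrong extension: skip it */
    }
    return 0;
}
```

Called as extract_sni(good_data_2, sizeof good_data_2, name, sizeof name) with char name[256], it should return 9 and leave "localhost" in name.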
dlundquist
  • This is clearly more elaborate than my original half-assed answer. Have a tick. :D – buschtoens Mar 29 '15 at 01:14
  • Great, I came here because I wanted to have a non-decrypting simple TLS forwarder based on the SNI. So with sniproxy that is already done. – JanKanis Jul 12 '16 at 16:22

Use Wireshark and capture only TLS (SSL) packets by adding the capture filter "tcp port 443". Then find a "Client Hello" message. You can see its raw data below.

Expand Secure Socket Layer -> TLSv1.2 Record Layer: Handshake Protocol: Client Hello -> ... and you will see Extension: server_name -> Server Name Indication extension. The server name in the handshake message is not encrypted.

http://i.stack.imgur.com/qt0gu.png
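The fields Wireshark shows at the top of that tree can also be checked programmatically from the first six bytes of the TCP payload: record type 0x16 (handshake), record major version 3, and handshake type 0x01 (Client Hello). A rough C filter for picking candidate packets before a full parse; looks_like_client_hello is a hypothetical name, and this is a heuristic, not a dissector.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Heuristic: a TLS record of type handshake (0x16), major version 3,
 * whose first handshake message is a ClientHello (type 0x01). */
static bool looks_like_client_hello(const uint8_t *payload, size_t len)
{
    return len >= 6
        && payload[0] == 0x16   /* Content Type: Handshake */
        && payload[1] == 0x03   /* SSL 3.0 / TLS 1.x record version */
        && payload[5] == 0x01;  /* Handshake Type: Client Hello */
}
```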

  • We're looking for a programmatic way to determine the SNI. However, this might be interesting for some folks nevertheless, so please don't delete this. – buschtoens Mar 29 '15 at 01:13

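For anyone interested, here is a tentative version of the C/C++ code. It has worked so far. The function returns a pointer to the server name within the byte array containing the Client Hello, and stores the length of the name in the len parameter. The buffer must start at the TLS record header, which is why the session ID length is read at offset 43 (5-byte record header + 4-byte handshake header + 2-byte version + 32-byte random).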

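#include <cstddef>
#include <stdexcept>

// Read a big-endian (network byte order) 16-bit value; avoids the unaligned
// reads and aliasing problems of casting to unsigned short*.
static size_t read_u16(const unsigned char *p)
{
    return ((size_t)p[0] << 8) | p[1];
}

static void require(bool ok)
{
    // std::runtime_error, since std::exception(const char*) is MSVC-only
    if (!ok) throw std::runtime_error("incomplete SSL Client Hello");
}

// bytes points at the TLS record header; size is the number of bytes available.
// Returns a pointer to the (not NUL-terminated) server name and stores its
// length in *len, or returns nullptr if no SNI extension is present.
const char *get_TLS_SNI(const unsigned char *bytes, size_t size, int *len)
{
    size_t pos = 43;                        // session ID length
    require(pos < size);
    pos += 1 + bytes[pos];                  // skip session ID
    require(pos + 2 <= size);
    pos += 2 + read_u16(bytes + pos);       // skip cipher suites
    require(pos + 1 <= size);
    pos += 1 + bytes[pos];                  // skip compression methods
    if (pos == size) return nullptr;        // no extensions block at all
    require(pos + 2 <= size);
    size_t maxpos = pos + 2 + read_u16(bytes + pos);
    pos += 2;
    require(maxpos <= size);
    while (pos + 4 <= maxpos)
    {
        size_t ext_type = read_u16(bytes + pos);
        size_t ext_len = read_u16(bytes + pos + 2);
        pos += 4;
        require(pos + ext_len <= maxpos);
        if (ext_type == 0)                  // server_name
        {
            // skip server_name_list length (2) and name_type (1)
            require(ext_len >= 5);
            size_t namelen = read_u16(bytes + pos + 3);
            require(5 + namelen <= ext_len);
            *len = (int)namelen;
            return (const char *)(bytes + pos + 5);
        }
        pos += ext_len;                     // wrong extension: skip it
    }
    require(pos == maxpos);
    return nullptr; // SNI was not present
}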
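The code above reads the name right after the name_type byte, i.e. it assumes the first entry in the server name list is a host_name. Per RFC 6066, the server_name extension data is a 2-byte list length followed by entries, each a 1-byte name_type and a name with a 2-byte length; host_name (0) is the only type defined so far, so the assumption normally holds. A C sketch that walks the list instead of assuming, given a pointer to the extension data and its length; find_host_name is a hypothetical name:

```c
#include <stddef.h>
#include <stdint.h>

/* Walks the server_name_list inside a server_name extension's data and
 * returns the first host_name (type 0), with its length in *name_len.
 * Returns NULL if there is none or the data is malformed. */
static const uint8_t *find_host_name(const uint8_t *ext, size_t ext_len,
                                     size_t *name_len)
{
    if (ext_len < 2)
        return NULL;
    size_t list_len = ((size_t)ext[0] << 8) | ext[1];
    if (list_len != ext_len - 2)
        return NULL;                      /* list must fill the extension */
    const uint8_t *p = ext + 2;
    size_t left = list_len;
    while (left >= 3) {
        uint8_t type = p[0];
        size_t n = ((size_t)p[1] << 8) | p[2];
        if (n > left - 3)
            return NULL;                  /* entry overruns the list */
        if (type == 0) {                  /* host_name */
            *name_len = n;
            return p + 3;
        }
        p += 3 + n;                       /* skip other name types */
        left -= 3 + n;
    }
    return NULL;
}
```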
Æðelstan

I noticed that the domain is always preceded by two zero bytes and one length byte. Maybe it's an unsigned 24-bit integer, but I can't test it, as my DNS server won't allow domain names longer than 77 characters.

Based on that, I came up with this (Node.js) code.

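// Heuristic: scan for two consecutive zero bytes, treat the next byte as a
// length and keep the last candidate string that looks like a domain name.
function getSNI(buf) {
  var sni = null
    , regex = /^(?:[a-z0-9-]+\.)+[a-z]+$/i;
  for(var b = 1, start, end, str; b < buf.length; b++) {
    if(buf[b - 1] === 0 && buf[b] === 0 && b + 1 < buf.length) {
      start = b + 2;                  // first byte after the length byte
      end   = start + buf[b + 1];     // length byte taken as the name length
      // end is exclusive, so end === buf.length is still inside the buffer
      if(start < end && end <= buf.length) {
        str = buf.toString("utf8", start, end);
        if(regex.test(str)) {
          sni = str;
        }
      }
    }
  }
  return sni;
}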

This code looks for a sequence of two zero bytes. If it finds one, it assumes the following byte is a length parameter. It checks whether that length stays within the bounds of the buffer and, if so, reads the byte sequence as UTF-8. Finally, it tests the resulting string against a regular expression and keeps it only if it looks like a domain name.
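If you keep a plausibility check on the extracted string, the regex /^(?:[a-z0-9-]+\.)+[a-z]+$/i can be expressed in a few lines without a regex engine. A C sketch of the same rule (one or more labels of ASCII letters, digits or hyphens, each followed by a dot, then a final label of letters only); looks_like_domain is a hypothetical name. Like the regex, it rejects single-label names such as localhost, and it is not full hostname validation.

```c
#include <stdbool.h>
#include <stddef.h>

static bool is_letter(unsigned char ch)
{
    return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
}

static bool is_label_char(unsigned char ch)
{
    return is_letter(ch) || (ch >= '0' && ch <= '9') || ch == '-';
}

/* Same rule as the regex above, applied to s[0..len). */
static bool looks_like_domain(const char *s, size_t len)
{
    size_t label_start = 0;   /* index where the current label begins */
    size_t dots = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char ch = (unsigned char)s[i];
        if (ch == '.') {
            if (i == label_start)
                return false;             /* empty label, e.g. "a..de" or ".de" */
            dots++;
            label_start = i + 1;
        } else if (!is_label_char(ch)) {
            return false;
        }
    }
    if (dots == 0 || label_start == len)
        return false;                     /* need a dot and a non-empty last label */
    for (size_t i = label_start; i < len; i++)
        if (!is_letter((unsigned char)s[i]))
            return false;                 /* last label: letters only */
    return true;
}
```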

Works amazingly well! Still, I noticed something odd.

'�\n�\u0014\u0000�\u0000�\u00009\u00008�\u000f�\u0005\u0000�\u00005�\u0007�\t�\u0011�\u0013\u0000E\u0000D\u0000f\u00003\u00002�\f�\u000e�\u0002�\u0004\u0000�\u0000A\u0000\u0005\u0000\u0004\u0000/�\b�\u0012\u0000\u0016\u0000\u0013�\r�\u0003��\u0000\n'
'\u0000\u0015\u0000\u0000\u0012test.cubixcraft.de'
'test.cubixcraft.de'
'\u0000\b\u0000\u0006\u0000\u0017\u0000\u0018\u0000\u0019'
'\u0000\u0005\u0001\u0000\u0000'

Every time, no matter which subdomain I choose, the domain is matched twice. It seems as if the SNI field is nested inside another field.
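The double match follows from the nesting in the server_name extension: a 2-byte extension length wraps a 2-byte server_name_list length, which wraps a 1-byte name_type and a 2-byte host_name length. The two-zero-bytes scan fires once where the extension type (0x0000) meets the extension length, taking the low byte of that length as the string length and producing the longer string that ends with the domain, and once at the name_type byte plus the high byte of the host_name length, producing the domain alone. A C sketch that builds such an extension makes the arithmetic visible; build_sni_extension is a hypothetical helper (handy for test fixtures), not part of any library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes a server_name extension carrying one host_name into out.
 * Returns the number of bytes written, or 0 if out is too small. */
static size_t build_sni_extension(const char *host, uint8_t *out, size_t out_size)
{
    size_t n = strlen(host);
    size_t total = 4 + 2 + 1 + 2 + n;   /* type+len, list len, name_type, name len, name */
    if (n > 0xffff - 5 || total > out_size)
        return 0;
    size_t list_len = 1 + 2 + n;        /* one entry: name_type + length + name */
    size_t ext_len = 2 + list_len;      /* list length prefix + list */
    out[0] = 0x00; out[1] = 0x00;       /* extension type: server_name (0) */
    out[2] = (uint8_t)(ext_len >> 8);  out[3] = (uint8_t)ext_len;
    out[4] = (uint8_t)(list_len >> 8); out[5] = (uint8_t)list_len;
    out[6] = 0x00;                      /* name_type: host_name */
    out[7] = (uint8_t)(n >> 8);        out[8] = (uint8_t)n;
    memcpy(out + 9, host, n);
    return total;
}
```

For "test.cubixcraft.de" the three lengths come out as 0x17, 0x15 and 0x12: 0x15 and 0x12 are the bytes visible in front of the domain in the second logged string, and 0x17 is the length the scan used to cut that string.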

I am open to suggestions and improvements! :)

I turned this into a Node module for anyone who's interested: sni.

buschtoens
  • I don't think regular expressions are the best way to extract data from a binary cryptographic protocol. The Client Hello message includes 32 bytes of random data that might match your regexp. – dlundquist Feb 21 '14 at 06:35
  • I don't know that it deserves a downvote; I mean, he found a solution. I have come across the same, but as dlundquist notes, I'm not going to rely on it being consistent or rule out the possibility of random bytes polluting the regex match. It does, however, work. –  Oct 24 '14 at 07:09