I am curious about the way VW appears to create interaction terms through the -q parameter.

For the purposes of this illustration I am using the following toy data, saved as cats.vm:

1 |a black |b small  green |c numvar1:1.62 numvar2:342 |d cat |e numvar3:554
1 |a white |b large yellow |c numvar1:1.212 numvar2:562 |d cat |e numvar3:632
-1 |a black |b small green |c numvar1:12.03 numvar2:321 |d hamster |e numvar3:754
1 |a white |b large green |c numvar1:5.8 numvar2:782 |d dog |e numvar3:234
-1 |a black |b small yellow |c numvar1:2.322 numvar2:488 |d dog |e numvar3:265
1 |a black |b large yellow |c numvar1:3.99 numvar2:882 |d hamster |e numvar3:543

There seems to be some inconsistency in the way VW creates interaction terms. Here are a couple of examples, where the command is always the following and only the -q argument changes:

vw -d cats.vm --loss_function logistic --invert_hash readable.cat.mod -q X

1. -q aa

Here we have an interaction within a namespace that holds only one feature per example, and we get only the quadratic terms for black and white (black^2 and white^2), as expected.

Constant:116060:0.082801
a^black:53863:-0.039097
a^black^a^black:247346:-0.039097
a^white:55134:0.223999
a^white^a^white:227140:0.223999
b^green:114666:0.027346
b^large:192199:0.330261
b^small:80587:-0.096200
b^yellow:255950:0.075754
c^numvar1:132428:0.004266
c^numvar2:30074:0.000211
d^cat:11261:0.188487
d^dog:173570:0.006734
d^hamster:247835:-0.085219
e^numvar3:12042:0.000115

2. -q ab

With an interaction between two namespaces (one of which has more than one feature per example), things are as expected, except that there are no quadratic terms for items in either a or b (e.g. black*black).

Question 1: Is there a way to force these 'across namespace' interactions to include polynomial terms such as black*black?

Constant:116060:0.079621
a^black:53863:-0.035646
a^black^b^green:46005:-0.017797
a^black^b^large:123538:0.137239
a^black^b^small:11926:-0.088733
a^black^b^yellow:187289:-0.053135
a^white:55134:0.206693
a^white^b^green:24528:0.127449
a^white^b^large:102061:0.206693
a^white^b^yellow:165812:0.114003
b^green:114666:0.025218
b^large:192199:0.302959
b^small:80587:-0.088733
b^yellow:255950:0.072339
c^numvar1:132428:0.004038
c^numvar2:30074:0.000199
d^cat:11261:0.176863
d^dog:173570:0.007334
d^hamster:247835:-0.080986
e^numvar3:12042:0.000109

3. -q bb

Here we have an interaction within a namespace that holds two features per example. There are duplicates (e.g. b^large^b^green:81557:0.112864 and b^green^b^large:110857:0.112864).

Question 2: Are these duplicated terms actually in the model, or is this an issue with --invert_hash? The weights are the same for all duplicates. Should we, for example, multiply the green*large weight by 2 in order to get the full effect of the green-large interaction?

Constant:116060:0.062784
a^black:53863:-0.043486
a^white:55134:0.182450
b^green:114666:0.023035
b^green^b^green:33324:0.023035
b^green^b^large:110857:0.112864
b^green^b^small:261389:-0.016840
b^large:192199:0.252576
b^large^b^green:81557:0.112864
b^large^b^large:159090:0.252576
b^large^b^yellow:222841:0.187498
b^small:80587:-0.079945
b^small^b^green:249481:-0.016840
b^small^b^small:215402:-0.079945
b^small^b^yellow:128621:-0.123284
b^yellow:255950:0.051017
b^yellow^b^large:68957:0.187498
b^yellow^b^small:219489:-0.123284
b^yellow^b^yellow:132708:0.051017
c^numvar1:132428:0.003217
c^numvar2:30074:0.000164
d^cat:11261:0.158140
d^dog:173570:0.008735
d^hamster:247835:-0.085383
e^numvar3:12042:0.000086

1 Answer

First, the basics: when you cross features, vowpal wabbit uses:

  • For the crossed feature name/identity: the murmur32 hash (modulo weight-vector size) of the concatenated original feature names (strings)
  • For the crossed feature value: the crossing (multiplication) of the original feature values (weights)

So looking at your question #3 above: the concatenated names are b^green^b^large or b^large^b^green. They have the same value, 0.112864, since the product of the two feature values is the same either way. However, because there are two possible ways to concatenate the names, we get two different hash values and a 'split' feature. This redundant (transposed-order) feature-pair phenomenon seems to appear only in self-crosses. I'm not sure why; it may be a bug.
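
To make the mechanics concrete, here is a minimal Python sketch of that naming-and-hashing scheme. It is an illustration only: zlib.crc32 stands in for VW's murmur32, and the 18-bit table mirrors vw's default -b 18.

import zlib

NUM_BITS = 18                   # vw's default -b 18
TABLE_SIZE = 1 << NUM_BITS      # weight-vector size = 2^18

def feature_index(name):
    # Hash a (possibly crossed) feature name into the weight table.
    return zlib.crc32(name.encode()) % TABLE_SIZE

def cross(ns1, feat1, val1, ns2, feat2, val2):
    # Crossed feature: concatenate the names, multiply the values.
    name = "%s^%s^%s^%s" % (ns1, feat1, ns2, feat2)
    return name, feature_index(name), val1 * val2

# Both orderings of a self-cross carry the same value but different
# names, hence (in general) different hash slots -- a 'split' feature:
print(cross("b", "green", 1.0, "b", "large", 1.0))
print(cross("b", "large", 1.0, "b", "green", 1.0))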

To answer the other questions (1 and 2):

To force black^black (actually a^black^a^black) you need to pass -q aa, because black only appears in name-space a.

Note that you can pass multiple -q arguments to vw to achieve any crossing you want:

-q aa -q ab -q ...
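
For example, to get both the within-a squares and the a-b crosses from the toy data in a single run:

vw -d cats.vm --loss_function logistic --invert_hash readable.cat.mod -q aa -q ab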

You can use the wildcard : name-space to cross every name-space with every other:

-q ::

For more power:

There's also a --cubic option, which lets you fit cubic polynomials. --cubic takes the leading characters of three name-spaces as its argument, e.g. --cubic abc.
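
For instance, on the toy data above (an illustrative invocation, not from the original question):

vw -d cats.vm --loss_function logistic --invert_hash readable.cat.mod --cubic abd

This adds three-way crosses among the a, b and d name-spaces.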

Finally, you may also use --keep and --ignore to keep or ignore name-spaces starting with a certain character.
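
For example, to drop the e name-space entirely while crossing a with b (again just an illustration):

vw -d cats.vm --loss_function logistic -q ab --ignore e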

  • As usual, arielf, great answer; your expertise is much appreciated. I am curious: WHY would the design of the interaction not just limit to green*large and exclude large*green altogether (or vice versa)? – B_Miner Sep 24 '14 at 00:08
  • (+1) for pointing out -q can be repeated, I was wondering about that. – B_Miner Sep 24 '14 at 00:09
  • So I was wrong: I see the redundant transposed feature name appears in `--audit` as well (I missed it), but only when self crossing the same name-space. I don't see it when crossing different name-spaces. It may be a bug, in fact. Sorry for my confusion. – arielf Sep 24 '14 at 01:08
  • Do you think I should add it to the issue list? Specifically, just the redundancy of features with -q bb? – B_Miner Sep 24 '14 at 02:03
  • Split features are handled correctly by virtue of vw's SGD: each "half-feature" should get half the weight, so in the end the behaviour is correct, albeit not the most elegant/minimalistic (see the toy check after these comments). In that sense my calling this "a bug" may be too strong. Feel free to open an issue if that bothers you. Whether John will be open to fixing this is another question; there's a trade-off between aiming for perfection and increasing the complexity of the code. In any case, thanks for your questions. I'm learning from them too! – arielf Sep 24 '14 at 18:35
  • Yeah, I may mention it and judge the reaction :) It seems like for large text documents, this repetition could be very problematic from a computing time / storage perspective. – B_Miner Sep 24 '14 at 18:59
  • Note that "split" (i.e. duplicate) features can sometimes help because of hash collisions. If you have a big number of features and one of them is important, you can add additional copies of it (e.g. three) to make sure that at least one index will be collision-free. On the other hand, duplicating many (unimportant) features makes the collision problem even worse. – Martin Popel Sep 24 '14 at 21:33
  • Storage perspective: the size of the model is determined (and upper-bounded) by -b (the number of bits). Computing-time perspective: excluding duplicate features may even be a bit slower than not excluding them; benchmarking would be needed. – Martin Popel Sep 24 '14 at 21:35
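
As a toy check of the point about split features (plain linear-model arithmetic, not VW code), the hypothetical weights below show that a weight held in one slot and the same total spread across two transposed-order slots contribute identically to the prediction:

# Toy check: a 'split' crossed feature is harmless in a linear model
# as long as the slot weights sum to the same total. 0.112864 is the
# duplicated weight from the -q bb output above; 0.225728 is a
# hypothetical single-slot equivalent.
x = 1.0  # value of the crossed feature (1.0 * 1.0)

single = {"b^green^b^large": 0.225728}
split  = {"b^green^b^large": 0.112864,
          "b^large^b^green": 0.112864}

pred_single = sum(w * x for w in single.values())
pred_split  = sum(w * x for w in split.values())
assert abs(pred_single - pred_split) < 1e-12
print(pred_single, pred_split)  # identical contributions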