Sunday, February 01, 2015

Group Feature selection -a Binary Way


Suppose there are T distinct types and each of these types is labelled as 1,2,3,4,....,T. Let any of these types be represented as t.
And suppose we want to have various groups and each group is a combination of types. There are G groups in total and each of these groups can be labelled as g = 1,2,3,....G.

Each group can be composed of types as the following example:
g=1 has types  {2,3}
g=2 has types {1,2,3}
g=3 has types {1,2}

Then suppose that for each group g, it is required to randomly select a type which belong to the group.
Here is a method that is designed to be general and although may not be the the most efficient, should be efficient enough for small T.
The types are represented as binary as follows. Example with T=3
t=1 -> 001
t=2 -> 010
t=3 -> 100

The groups can be represented by combining the binary representation for the types. So using the example of the groups above, if g=1 has types {2,3}, then the group is the binary sum of the types:
   g=1 -> 010 +100 = 110
   g=2 -> 001 + 010 +100 = 111
   g=3 -> 001 + 010 = 011

Now for a particular group, it is required to select types which belong to that group. However for general efficiency, it is desired NOT to distinguish between any groups while selecting the random types. So there will be some wastage in this procedure, but it should be fast.

So here is the general random selection process to select any type t from 1....T. Let r be a random integer number from 1....T. Construct a binary representation of the random number, such that:  b=2^(r-1).
The table below shows the representation for various random numbers:
r=1: b=0 -> 001
r=2: b=1 -> 010
r=3: b=2 -> 100

To apply the random selection, simply apply the AND operator on r and g. So using the example of g=1->110, the combination with each of the random numbers would yield the following:
g->110 AND r->001 => 000 no type is selected
g->110 AND r->010 => 010 type 2 is selected
g->110 AND r->100 => 100 type 3 is selected

This algorithm can of course be made more efficient, but that is left for another bright person to accomplish. The present algorithm is efficient already due to use of simple random integers and binary operations and avoiding the use of decision logic like if-else.

No comments: