The magic numbers show up when you do a rational function approximation to the residual (log_2(1+z) - z).

Yes, the shift is to extract the mantissa: the exponent part of the representation is "easy to log".
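To make that concrete: write a positive float as x = 2^e (1+m) with m in [0, 1). Reading the raw bits as an integer and dividing by 2^23 gives e + 127 + m, so log_2(x) = e + m + (log_2(1+m) - m), and only the last term needs a rational approximation. The sketch below is named after the fastapprox fastlog2 and follows its structure, but it is a standalone version using memcpy instead of unions. The constants are as I recall them from that library, so treat them as an assumption and refit if in doubt. The code reads the mantissa back as mx = (1+m)/2 in [0.5, 1), and it only handles positive, normal floats.

```c
#include <stdint.h>
#include <string.h>
#include <assert.h>

/* Sketch of a fastapprox-style log2 for positive, normal floats.
 * The constants are assumed (recalled from fastapprox), not derived here. */
static inline float fastlog2(float x)
{
    assert(x > 0.0f);

    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);            /* reinterpret the float's bits */

    /* Keep the 23 mantissa bits and force the exponent to that of 0.5,
       so mx = (1 + m) / 2 lies in [0.5, 1). */
    uint32_t mbits = (bits & 0x007FFFFFu) | 0x3F000000u;
    float mx;
    memcpy(&mx, &mbits, sizeof mx);

    /* bits / 2^23 = biased exponent (e + 127) + mantissa fraction m:
       the "easy to log" part. */
    float y = (float)bits * (1.0f / (1 << 23));

    /* The constants fold in the exponent bias (127) together with a
       rational correction for the residual log2(1 + m) - m. */
    return y - 124.22551499f - 1.498030302f * mx
             - 1.72587999f / (0.3520887068f + mx);
}
```

For example, fastlog2(8.0f) comes out very close to 3, since only the exponent field differs from that of 1.0f.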

Could you please explain what the magic numbers ((1 << 23), 121.2740838f, 27.7280233f, 4.84252568f, 1.49012907f) in the fast exp approximation mean?

I guess (1 << 23) is a shift by the float mantissa length (23 bits). Am I right?
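The exponential runs the same trick in reverse. Write p = w + z with w an integer and z in [0, 1]; then 2^p = 2^w (1 + (2^z - 1)), so (w + 127 + 2^z - 1) multiplied by 2^23 = (1 << 23) is the bit pattern of the result: the integer part lands in the exponent field and the fraction lands in the 23 mantissa bits. In the constants, 121.2740838f + 27.7280233f / 4.84252568f is about 127 (the exponent bias), and after removing the bias, the rational term together with -1.49012907f * z approximates the residual 2^z - 1 - z. Below is a sketch in the fastapprox style using the constants from the question; the name fastpow2 and the clamp at -126 follow that style but are assumptions here. An exp would then be fastpow2(1.442695040f * x), using 1/ln 2.

```c
#include <stdint.h>
#include <string.h>
#include <assert.h>

/* Sketch of a fastapprox-style 2^p with the constants quoted in the
 * question. Assumes p < 128; inputs below -126 are clamped. */
static inline float fastpow2(float p)
{
    assert(p < 128.0f);
    float offset = (p < 0) ? 1.0f : 0.0f;     /* (int) truncates toward zero; fix up p < 0 */
    float clipp = (p < -126) ? -126.0f : p;   /* stay in the normal-float range */
    int w = (int)clipp;                       /* truncated integer part */
    float z = clipp - w + offset;             /* fractional part, within [0, 1] */

    /* clipp + 127 is the biased exponent plus fraction; the constants supply
       the bias and a rational correction approximating 2^z - 1 - z.
       Multiplying by 2^23 shifts the result into IEEE 754 bit positions. */
    float v = (float)(1 << 23) *
              (clipp + 121.2740838f + 27.7280233f / (4.84252568f - z) - 1.49012907f * z);

    uint32_t bits = (uint32_t)v;
    float result;
    memcpy(&result, &bits, sizeof result);
    return result;
}
```

For instance, fastpow2(0.5f) should land within about 1e-4 of sqrt(2).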

Thank you!

Wild! Meta-learning is certainly a blossoming field.

See Table 6 of
"Neural Optimizer Search with Reinforcement Learning"
https://arxiv.org/abs/1709.07417

Thanks for the heads up.

FYI: "WARNING: cdn.mathjax.org has been retired. Check https://www.mathjax.org/cdn-shutting-down/ for migration tips."

You might want to update your MathJax CDN, or even switch to KaTeX for faster performance! ;)

Well, I'd never heard of SIMPOL before!

Thanks for the topic. My C sucks (but I am a 55-year-old lawyer). I used this divide-by-2 idea to take a stab at writing a reasonably accurate, not-too-slow log function for SIMPOL, which doesn't have a built-in log function. Once the argument is in the range .5 to 1, I worked out some polynomial approximations.
It is written in SIMPOL, as a first stab, in a way that let me find my typing and logical errors.


constant log10_2 "0.30102999566398119521373889472449"

function main()
 number n
 number log
 log = 0
 integer e
 e = 0
 string message,title
 anyvalue prompt
 message = "Entry a number."
 title = "log test"
 prompt = ""
 n = .toval(getuserinput(message,title,prompt, error =e),"",10)
 log = jdk_log(n)
end function .tostr(log,10)

function jdk_log(number n)
 number log,log10,log3
 number eval, shift
 integer p
 p = 0
 eval = n 
 shift = 0
 if n > 0
 while eval < 1/2
 eval = eval * 10
 shift = shift + 1
 end while
 end if

 while eval > 1 
 eval = eval/2
 p = p +1
 end while
 number x, x2, x3,x4, x5, x6, x7, x8, x9, x10
 number test10, test3 
 x = eval
 x2 = x * x 
 x3 = x2 * x
 x4 = x3 * x
 x5 = x4 * x
 x6 = x5 * x
 x7 = x6 * x
 x8 = x7 * x
 x9 = x8 * x
 x10 = x9 * x

 test10 = -(1436/1000) 
 test10 = test10 + (5541/1000 * x)
 test10 = test10 - (12204/1000 * x2)
 test10 = test10 + (13987/1000 * x3)
 test10 = test10 + (0304/1000 * x4)
 test10 = test10 - (17075/1000 * x5)
 test10 = test10 + (4459/1000 * x6)
 test10 = test10 + (30204/1000 * x7)
 test10 = test10 - (42185/1000 * x8)
 test10 = test10 + (23255/1000 * x9)
 test10 = test10 - (4849/1000 * x10)

 test3 = -(944/1000) 
 test3 = test3 + (1814/1000 * x)
 test3 = test3 - (1241/1000 * x2)
 test3 = test3 + (371/1000 * x3)


 log10 = p * .toval(log10_2,"",10) + round(test10,1/1000000) - shift
 log3 = p * .toval(log10_2,"",10) + round(test3,1/1000000) - shift 
 log = round((log10 + log3 )/2,1/10000)
end function log
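For readers more comfortable in C, here is a rough translation of the same idea, keeping only the cubic (test3) fit from the SIMPOL code: halve (or double) the argument into [0.5, 1], add log10(2) per halving, and evaluate the polynomial with Horner's rule. The name approx_log10 is hypothetical, the sketch assumes x > 0, and it reduces purely by factors of 2 in both directions.

```c
#include <assert.h>

/* log10(2), as in the SIMPOL constant log10_2 */
static const double LOG10_2 = 0.30102999566398119521;

/* Sketch: range-reduce by powers of 2, then apply the cubic (test3)
 * coefficients from the SIMPOL code. Assumes x > 0. */
double approx_log10(double x)
{
    assert(x > 0.0);
    int p = 0;

    /* Bring x into [0.5, 1]; each halving contributes +log10(2),
       each doubling contributes -log10(2). */
    while (x > 1.0) { x /= 2.0; p += 1; }
    while (x < 0.5) { x *= 2.0; p -= 1; }

    /* Cubic fit of log10(x) on [0.5, 1], evaluated with Horner's rule. */
    double poly = -0.944 + x * (1.814 + x * (-1.241 + x * 0.371));
    return p * LOG10_2 + poly;
}
```

The cubic alone is good to roughly three to four decimal places; the degree-10 fit (test10) can be dropped into the same spot if more accuracy is needed.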

Great! Thanks
Hi Paul,
For your first point, i.e., multitask regularization, could you give some citations from both the RL world and the dialog world?
Great question!

The Decision Service (which is developed at MSR-NY; I'm just a user/contributor) has a problem that plagues many machine learning toolkit products: the customer is either so sophisticated that they want to write their own, or so unsophisticated that they are unable to operate the tool. Concentrating on a specific vertical and providing a simple interface is a way to bridge that gap, hence the DS has been focusing on news recommendations as the vertical scenario for go-to-market.

Internally, we are using the decision service for a variety of scenarios, and the open source version is capable of handling general use cases and is production ready.

Hi Paul,
I'm very interested in RL as a service. I had already read about the service you worked on; the paper is very interesting.

But what I found among the machine learning options in Azure is the Custom option, which seems to be for specific use cases, like news recommendations (taking the articles' content into consideration).

Is the service being fully used in more general use cases (as cited in the paper)? I mean, is it ready for production, or is it in a kind of beta?

Congratulations on your work!

It's been a while since I've thought about this, but the bit manipulations are essentially extracting a multiple of a power of two, so that might work better than explicit division and squaring.
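A sketch of the bit-manipulation route, assuming double precision: pick k as the integer nearest x / ln 2, so exp(x) = 2^k exp(r) with r = x - k ln 2 in [-ln2/2, ln2/2]. The power series only has to cover that small r, and 2^k is built by writing k + 1023 into the exponent bits, so no squaring is needed. The name exp_bits, the degree-10 series and the |x| < 700 restriction are illustrative choices, not tuned values; production versions usually also split ln 2 into two parts to keep r accurate, which this sketch does not.

```c
#include <stdint.h>
#include <string.h>
#include <assert.h>

/* Sketch: exp(x) = 2^k * exp(r), with 2^k written straight into the
 * exponent bits of a double. Assumes |x| < 700 so 2^k stays normal. */
double exp_bits(double x)
{
    const double LN2 = 0.69314718055994530942;
    assert(x > -700.0 && x < 700.0);

    /* k = nearest integer to x / ln 2, so |r| <= ln(2) / 2 */
    int k = (int)(x / LN2 + (x >= 0.0 ? 0.5 : -0.5));
    double r = x - k * LN2;

    /* Truncated Taylor series sum_{i=0}^{10} r^i / i!, in Horner form. */
    double sum = 1.0;
    for (int n = 10; n >= 1; --n)
        sum = 1.0 + sum * r / n;

    /* 2^k: biased exponent k + 1023 in bits 52..62, zero mantissa. */
    uint64_t bits = (uint64_t)(k + 1023) << 52;
    double two_k;
    memcpy(&two_k, &bits, sizeof two_k);
    return two_k * sum;
}
```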

The power series for the exponential converges more rapidly for arguments near zero. So divide your argument by some power of 2 to bring it close to zero, do the truncated power series, and then square the result the appropriate number of times. I don't know if this would beat the stock Exp function, but it is worth a look.
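The suggestion above, written out as a sketch of scaling and squaring: halve x until |x| <= 1/2, sum a truncated Taylor series, then square the result once per halving. The name exp_scale_square, the 1/2 threshold and the degree-8 series are illustrative assumptions. One thing to watch: each squaring roughly doubles the relative error, so large arguments lose some accuracy, which is part of why extracting the power of two directly from the bits can be attractive.

```c
#include <assert.h>

/* Sketch of scaling and squaring: exp(x) = (exp(x / 2^m))^(2^m).
 * Threshold and series length are illustrative, not tuned. */
double exp_scale_square(double x)
{
    assert(x > -700.0 && x < 700.0);
    int m = 0;

    /* Halve until |x| <= 1/2 so the power series converges quickly. */
    while (x > 0.5 || x < -0.5) { x /= 2.0; ++m; }

    /* Truncated Taylor series sum_{i=0}^{8} x^i / i!, in Horner form. */
    double sum = 1.0;
    for (int n = 8; n >= 1; --n)
        sum = 1.0 + sum * x / n;

    /* Undo the scaling: square once per halving. */
    while (m-- > 0)
        sum *= sum;
    return sum;
}
```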

Under these conditions, I would still train end-to-end, but with explicit regularization to control overfitting.

Sometimes an end-to-end system is not a good idea, since we found it tends to overfit the dataset. If we train each sub-system with different data, it tends to get better performance in unknown environments.

Not out yet (afaik). To be fair, it is just a workshop paper. If you review the conference version of the paper, demand a code release!

But where's the code to verify?

The Alexa Prize (https://developer.amazon.com/alexaprize) is pretty cool. They acquired mxnet, which will remain an open project afaik. Alex Smola's team has a mandate to do research, so given typical publication latencies I expect to see work from there hitting the major conferences. Charles Elkan and Ralf Herbrich are active NIPS contributors. Amazon had a paper at NIPS this year (main conference).

So, not as much as other companies with more history, but the direction is encouraging.

What research has Amazon "opened up"?

Yes, it is released under the New BSD License.

Can we use this for an open-source project?

Luc Steels did an interesting related experiment called "Talking Heads" where agents evolved a language on objects and properties by interacting:

- https://www.csl.sony.fr/downloads/papers/2003/steels-03c.pdf

- http://langsci-press.org/catalog/book/49