Machined Learnings — comments feed (retrieved 2017-10-19)
Paul Mineiro (http://www.blogger.com/profile/05439062526157173163)

---
Paul Mineiro (https://www.blogger.com/profile/05439062526157173163), 2017-09-14 08:37
Well, I'd never heard of SIMPOL before!

---
jdk (https://www.blogger.com/profile/17987574304860090197), 2017-09-14 07:56
Thanks for the topic. My C sucks (but I am a 55-year-old lawyer). I used this divide-by-2 idea to take a stab at writing a reasonably accurate and not-too-slow log function for SIMPOL, which doesn't have a built-in log function. Once in the range 0.5 to 1, I worked out some polynomial approximations.

Written in SIMPOL, and in a first-stab way so that I could find my typing and logical errors.

    constant log10_2 "0.30102999566398119521373889472449"

    function main()
      number n
      number log
      log = 0
      integer e
      e = 0
      string message, title
      anyvalue prompt
      message = "Enter a number."
      title = "log test"
      prompt = ""
      n = .toval(getuserinput(message, title, prompt, error = e), "", 10)
      log = jdk_log(n)
    end function .tostr(log, 10)

    function jdk_log(number n)
      number log, log10, log3
      number eval, shift
      integer p
      p = 0
      eval = n
      shift = 0
      if n < 1/2
        eval = eval * 100
        shift = 2
      end if

      while eval > 1
        eval = eval / 2
        p = p + 1
      end while
      number x, x2, x3, x4, x5, x6, x7, x8, x9, x10
      number test10, test3
      x = eval
      x2 = x * x
      x3 = x2 * x
      x4 = x3 * x
      x5 = x4 * x
      x6 = x5 * x
      x7 = x6 * x
      x8 = x7 * x
      x9 = x8 * x
      x10 = x9 * x

      test10 = -(1436/1000)
      test10 = test10 + (5541/1000 * x)
      test10 = test10 - (12204/1000 * x2)
      test10 = test10 + (13987/1000 * x3)
      test10 = test10 + (304/1000 * x4)
      test10 = test10 - (17075/1000 * x5)
      test10 = test10 + (4459/1000 * x6)
      test10 = test10 + (30204/1000 * x7)
      test10 = test10 - (42185/1000 * x8)
      test10 = test10 + (23255/1000 * x9)
      test10 = test10 - (4849/1000 * x10)

      test3 = -(944/1000)
      test3 = test3 + (1814/1000 * x)
      test3 = test3 - (1241/1000 * x2)
      test3 = test3 + (371/1000 * x3)

      log10 = p * .toval(log10_2, "", 10) + round(test10, 1/1000000) - shift
      log3 = p * .toval(log10_2, "", 10) + round(test3, 1/1000000) - shift
      log = round((log10 + log3) / 2, 1/10000)
    end function log

---
Ravi (https://www.blogger.com/profile/03453457907385341473), 2017-09-08 14:48
Great! Thanks

---
Paul Mineiro, 2017-09-08 14:01
Consider slide 84 of http://www.thespermwhale.com/jaseweston/icml2016/icml2016-memnn-tutorial.pdf ... the primary task is story comprehension, but adding the additional task of predicting the (exact) teacher response helps.

I also just noticed slide 27 ... but I'm not familiar with that part.
---
Ravi, 2017-09-08 10:47
Hi Paul,
For your first point, i.e. multitask regularization, could you give some citations, both from the RL world and the dialog world?
---
Paul Mineiro, 2017-08-30 12:50
Great question!

The decision service (which is done out of MSR-NY; I'm just a user/contributor) has a problem which plagues many machine learning toolkit products: the customer is either so sophisticated that they want to write their own, or so unsophisticated that they are unable to operate the tool. Concentrating on a specific vertical and providing a simple interface is a way to bridge the gap, hence the DS has been focusing on news recommendation as a vertical go-to-market scenario.

Internally, we are using the decision service for a variety of scenarios, and the open-source version is capable of handling general use cases and is production-ready.
---
Rubens Santos (https://www.blogger.com/profile/03307012403677741970), 2017-08-30 09:18
Hi Paul,

I'm very interested in RL as a service. I had already read about the service you worked on; the paper is very interesting.

But what I found among the machine learning options in Azure is the Custom option, which seems to be for specific use cases, like news recommendations (taking the article content into consideration).

Is the service being fully used in more general use cases (as cited in the paper)? I mean, is it production-ready, or is it in a kind of beta?

Congratulations on your work!

---
Paul Mineiro, 2017-07-10 11:40
It's been a while since I've thought about this, but the bit manipulations are essentially extracting a multiple of a power of two, so that might work better than explicit division and squaring.

---
richard nineteenfortyone (https://www.blogger.com/profile/17031413084599382303), 2017-07-08 09:39
The power series for the exponential converges more rapidly for arguments near zero. So divide your argument by some power of 2 to bring it close to zero, do the truncated power series, and then square the result the appropriate number of times. I don't know if this would beat the stock exp function, but it is worth a look.
---
Paul Mineiro, 2017-02-16 08:05
Under these conditions, I would still train end-to-end, but with explicit regularization to control overfitting.

---
cnx (https://www.blogger.com/profile/12614847138980399016), 2017-02-15 18:51
Sometimes an end-to-end system is not a good idea, since we found it tends to overfit the dataset. If we train each sub-system with different data, it tends to get better performance in unknown environments.

---
Paul Mineiro, 2017-01-18 08:37
Not out yet (afaik). To be fair, it is just a workshop paper. If you review the conference version of the paper, demand a code release!

---
Carlos Perez (https://www.blogger.com/profile/01488838149594154679), 2017-01-18 05:10
But where's the code to verify?

---
Paul Mineiro, 2016-12-12 22:51
The Alexa prize (https://developer.amazon.com/alexaprize) is pretty cool. They acquired mxnet, which will remain an open project afaik.
Alex Smola's team has a mandate to do research, so given typical latencies I expect to see things out of there hitting the major conferences. Charles Elkan and Ralf Herbrich are active NIPS contributors. Amazon had a paper at NIPS this year (main conference).

So, not as much as other companies with more history, but the direction is encouraging.

---
Paul Mineiro, 2016-12-12 22:18
This comment has been removed by the author.

---
Zeeshan Zia (https://www.blogger.com/profile/17867461482990681857), 2016-12-12 17:41
What research has Amazon "opened up"?

---
Paul Mineiro, 2016-07-20 16:33
Yes, it is released under the New BSD License.

---
Ricky Chan (https://www.blogger.com/profile/06011446406036414711), 2016-07-19 22:11
Can we use this for an open-source project?

---
Pi (https://www.blogger.com/profile/02430860120787364856), 2016-07-09 18:51
Luc Steels did an interesting related experiment called "Talking Heads", where agents evolved a language for objects and properties by interacting:

- https://www.csl.sony.fr/downloads/papers/2003/steels-03c.pdf
- http://langsci-press.org/catalog/book/49

---
Paul Mineiro, 2016-01-31 15:46
Doh! My bad.

---
Allen Knutson (https://www.blogger.com/profile/15616422252030334511), 2016-01-31 15:06
Don't count McCartney out yet!

---
Nikos (https://www.blogger.com/profile/11112961058824811801), 2016-01-08 10:06
We are aware of an encouraging result for the case of static attention where the "parts" are features and there is no competition among them (i.e., the attention vector z does not have to sum to 1). This is the same as learning a sparse model, and Andrew Ng's analysis of L1 regularization (http://ai.stanford.edu/~ang/papers/icml04-l1l2.pdf) shows that it can exponentially reduce sample complexity, from O(number of features) to O(log(number of features)). At the same time, rotationally invariant methods (cf. the paper above) have to use O(number of features) samples. When I read the paper, a long time ago, I did not find the analysis very enlightening, but perhaps the ideas in there are right-headed.

---
Goddard LeRoy (https://www.blogger.com/profile/02898381097003622186), 2015-11-26 14:37
This is also a pretty fast approximation for floats (also reasonable for double; change log_t to double), based on a power-series expansion of the artanh identity for ln.
The compiler can vectorize it, which speeds it up considerably.

    #include <stdio.h>
    /* Needed for M_LN2 */
    #define __USE_MISC 1
    #include <math.h>

    /* Predefined constants: reciprocals 1/3, 1/5, ..., 1/15 for the series */
    #define i3 0.33333333333333333
    #define i5 0.2
    #define i7 0.14285714285714285
    #define i9 0.11111111111111111
    #define i11 0.09090909090909091
    #define i13 0.07692307692307693
    #define i15 0.06666666666666667

    /* alias the type; use double here to make more precise approximations */
    typedef float log_t;

    /*
     This is a logarithmic approximation function. It works by splitting up
     the mantissa and the exponent, then using the identity
     ln z = 2 * artanh((z - 1)/(z + 1)).
    */

    /* Logarithm for small fractions: calculates ln x, where x <= 1.0 */
    static inline log_t log1ds(log_t z){
        log_t pz = (z - 1)/(z + 1);
        log_t pz2 = pz*pz;
        log_t pz3 = pz2*pz;
        log_t pz5 = pz3*pz2;
        log_t pz7 = pz5*pz2;
        log_t pz9 = pz7*pz2;
        log_t pz11 = pz9*pz2;
        log_t pz13 = pz11*pz2;
        log_t pz15 = pz13*pz2;
        log_t y = 2 * (pz + pz3*i3 + pz5*i5 + pz7*i7 + pz9*i9
                       + pz11*i11 + pz13*i13 + pz15*i15);
        return y;
    }

    #define CALLS 1
    // 0000
    #define TILL 1000000000

    /* The natural logarithm */
    log_t logd(log_t x){
        int exp;
        log_t fraction = frexpf((log_t)x, &exp);
        log_t res = log1ds(fraction);
        return res + exp * M_LN2;
    }

---
Soeren Sonnenburg (https://www.blogger.com/profile/00844611789125605702), 2015-11-06 07:51
Around 1.0, the approximations of fastlog and fasterlog are poor (relative error > 13 for fastlog, and much worse for fasterlog).

---
Anon (https://www.blogger.com/profile/01360755535659830635), 2015-10-14 04:51
As for automated training, ChaLearn is running a challenge on that: http://automl.chalearn.org/

I also liked a knowledge competition on Kaggle where the task was to figure out the rules of poker (the category rank of a 5-card hand). Many entrants incorporated domain knowledge, like the fact that the order in which the cards are dealt does not matter, so it really should have been an entirely new game.