Poof (A Netflix Prize Team)

I'm "competing" in the Netflix Prize (www.netflixprize.com) as a senior research project to find out how recommender systems work.

2nd Prize Cancelled (March 14, 2010)

Netflix has officially nixed Netflix Prize 2 over privacy concerns. So I guess that's it for this blog.
http://blog.netflix.com/2010/03/this-is-neil-hunt-chief-product-officer.html

Netflix Prize Part 2 (November 6, 2009)

So... Netflix is coming back with another supposedly harder, timed challenge. Stay tuned...
http://www.netflixprize.com//community/viewtopic.php?id=1520

Prize Over? (July 19, 2009)

I haven't posted in a while, but the last call for the Netflix Prize has been out for nearly a month. There may finally be a winner. I'm wondering whether the site will be shut down after that or whether it will keep going. They would of course have to come up with some new rationale for the project, since they've already got a 10% improvement over their algorithm.

Project Over (May 3, 2009)

I finished up my project last Monday with my senior seminar presentation. I wasn't able to investigate everything I had planned on, mostly because it took longer than I expected.
I may try to work on it after I get out of school if I have time. I did try a couple of modifications to BRISMF. The first was using a user-based neighborhood correction instead of an item-based one. This gave a little improvement,

NSVD1 Failure (April 7, 2009)

I went through my NSVD1 code many times, rewriting it completely four or five times and modifying it countless times, but I never beat that 1.013. I even redid the math in that paper to make sure there weren't typos, and I don't have time to try to figure that out anymore. With how far through the semester I am now, I don't have time to try any other algorithms, which is very disappointing but

NSVD1 Grr... (April 3, 2009)

So I tried to implement Gravity's training algorithm for Paterek's NSVD1 recommender. And utterly failed. I'm either reading it wrong, typing it wrong, or not understanding the notation. It's supposed to give me an RMSE of 0.9344, but mine has bounced between 1.013 and 2-something. Yeah... So I'm going to be redoing that one. Hopefully after getting that working, I can do the Hybrid and then

Progress Without Perfection (March 24, 2009)

Initially, I thought that, given the nice detail in the Gravity team's papers, I would be able to implement these and get very close scores. That has turned out not to be the case. As the models get bigger and more complex, my results have moved farther away from theirs: not terribly far, but far enough.
This could be a factor of details they didn't include in the papers, details I missed in the

Don't Rush Your Math (March 17, 2009)

I've been trying to implement Gravity's BRISMF#250 model but have been getting much higher RMSEs. I had no idea what was wrong, so I decided to move on and start implementing the code to retrain features after the initial training. To test it, I had to rerun my BRISMF#1 model, which resulted in an RMSE 0.01 higher, which made no sense. I looked through my code for changes I've made since I initially

Success (Relatively) (March 13, 2009)

So now that the data is sorted by date, as the Gravity team describes in their paper, everything worked according to plan. It ran in 13 epochs instead of 50-ish and got a 0.9217 RMSE instead of 1.0014. Now I can finally start playing around with the other things they've tried. Maybe at some point I can try something of my own.

Matrix Factorization (March 13, 2009)

The idea behind matrix factorization (which is similar, if not identical, to what the Netflix community is calling SVD, singular value decomposition) is that you can estimate an I x J matrix R (the ratings matrix) by multiplying two smaller matrices: an I x K and a K x J.
Each of those K rows/columns is known as a feature, and the matrix factorization (MF) algorithm will estimate the

Things I've Learned About Java (March 13, 2009)

Doing a project with data at this scale (100 million ratings by 480,189 users on 17,770 movies) shows a person just how much they don't know about programming yet. Here are a few things I've learned so far:

1. Objects take up a crap ton of memory. You don't realize this when you're just making your little Person objects in class, but when you try to fit 100 million ratings in memory, objects are

My First Attempts (March 13, 2009)

I'll preface this by saying I have no hopes, expectations, or anything else of actually competing for this prize. I came into this hoping to learn about recommender systems. So I started by finding some research papers on recommender systems and collaborative filtering. I read those and assumed I could start implementing a nearest-neighbor algorithm. So I pulled out my trusty old Java and wrote a
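The matrix factorization idea from the Matrix Factorization post above can be sketched in a few dozen lines of Java. This is only a toy illustration of SGD-trained MF in the spirit of the BRISMF approach, not Gravity's actual algorithm: the class name, the toy ratings, and every parameter value (learning rate, regularization, epoch count, the K=2 feature count) are made up for the example, and it omits the biases, date-ordered training, and early stopping the real model uses.

```java
import java.util.Random;

/** Toy SGD matrix factorization: factor an I x J ratings matrix into an
 *  I x K user-feature matrix p and a K x J feature-item matrix q.
 *  Illustrative only; parameters are not Gravity's. */
public class MFSketch {

    /** Trains the two factor matrices and returns the training RMSE
     *  after the last epoch. Each rating is a {user, item, value} triple. */
    public static double train(int[][] ratings, int users, int items,
                               int k, int epochs, double lrate, double reg) {
        Random rnd = new Random(42);
        double[][] p = new double[users][k];   // I x K
        double[][] q = new double[k][items];   // K x J
        double init = Math.sqrt(3.5 / k);      // start predictions near a mid rating
        for (int u = 0; u < users; u++)
            for (int f = 0; f < k; f++)
                p[u][f] = init + 0.01 * rnd.nextGaussian();
        for (int f = 0; f < k; f++)
            for (int i = 0; i < items; i++)
                q[f][i] = init + 0.01 * rnd.nextGaussian();

        double rmse = Double.MAX_VALUE;
        for (int epoch = 0; epoch < epochs; epoch++) {
            double sse = 0;
            for (int[] r : ratings) {
                int u = r[0], i = r[1];
                double pred = 0;
                for (int f = 0; f < k; f++) pred += p[u][f] * q[f][i];
                double err = r[2] - pred;
                sse += err * err;
                // update both factor matrices simultaneously, with regularization
                for (int f = 0; f < k; f++) {
                    double pu = p[u][f], qi = q[f][i];
                    p[u][f] += lrate * (err * qi - reg * pu);
                    q[f][i] += lrate * (err * pu - reg * qi);
                }
            }
            rmse = Math.sqrt(sse / ratings.length);
        }
        return rmse;
    }

    public static void main(String[] args) {
        // toy data: {user, item, rating} triples on a 4 x 4 matrix
        int[][] ratings = {{0,0,5},{0,1,3},{1,0,4},{1,2,1},
                           {2,1,4},{2,3,5},{3,2,2},{3,3,4}};
        System.out.println("final training RMSE: "
                + train(ratings, 4, 4, 2, 500, 0.02, 0.02));
    }
}
```

On tiny data like this the training RMSE drops close to zero; the whole difficulty of the Prize is that with 100 million ratings and held-out probe data you need the biases, ordering tricks, and early stopping that the sketch leaves out.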