Tags: html war methods ber algo bmc mod bit developer
MSLR-WEB10k and MSLR-WEB30k
You'll need a lot of patience to download them, since Microsoft's server delivers the files at about 1 Mbit/s or even slower.
The only difference between these two datasets is the number of queries (10,000 and 30,000 respectively). Each contains 136 feature columns, mostly term frequencies and similar statistics; the text of the queries and documents is not available.
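To get a feel for the data, here is a minimal sketch of a reader for these files. It assumes the SVMlight-style layout with a query id field (`label qid:<id> <index>:<value> ...`), optionally followed by a `#` comment. The helper names `parse_line` and `group_by_query` are made up for illustration, and the sample rows are invented and carry only three features instead of 136.

```python
from collections import defaultdict


def parse_line(line):
    """Parse one 'label qid:<id> <idx>:<value> ...' row, ignoring any '# comment' tail."""
    body = line.split("#", 1)[0].strip()
    tokens = body.split()
    label = int(tokens[0])                # relevance grade
    qid = tokens[1].split(":", 1)[1]      # query id, kept as a string
    features = {}
    for tok in tokens[2:]:
        idx, value = tok.split(":", 1)
        features[int(idx)] = float(value)
    return label, qid, features


def group_by_query(lines):
    """Collect (label, features) pairs per query id, skipping blank or comment-only lines."""
    groups = defaultdict(list)
    for line in lines:
        if not line.split("#", 1)[0].strip():
            continue
        label, qid, features = parse_line(line)
        groups[qid].append((label, features))
    return dict(groups)


# Invented sample rows (assumption: real rows have 136 features, not 3).
sample = [
    "2 qid:1 1:3 2:0.5 3:0",
    "0 qid:1 1:1 2:0.0 3:1",
    "1 qid:7 1:2 2:0.25 3:4 # docid = hypothetical",
]
groups = group_by_query(sample)
```

Grouping by query id matters because ranking quality is measured per query, so documents belonging to one query have to stay together when the data is split or evaluated.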
Apart from these datasets, LETOR 3.0 and LETOR 4.0 are available; they were published in 2008 and 2009 respectively. Those datasets are smaller. From LETOR 4.0, MQ2007 and MQ2008 are the interesting ones (46 features each). MQ stands for Million Query.
The wiki lists plenty of algorithms, along with modifications of them created specifically for LETOR (with papers).
Many algorithms have been developed, but checking most of them is a real problem, because there is no available implementation one can try. Yet new algorithms appear constantly, and their developers claim that the new algorithm gives the best results on all (or almost all) datasets.
This is of course hard to believe, especially given that most researchers don't publish the code of their algorithms. In theory, one should publish not only the code of the algorithm but the code of the whole experiment.
However, implementations of some algorithms are available (apart from regression, of course).
Learning to rank (software, datasets)
Original article: http://www.cnblogs.com/energy1010/p/7261851.html