Downloading the Kugou Top 500 with a Java crawler

Date: 2019-11-16 12:24:37

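I wanted to download some songs, but the apps and websites all require a paid membership, and I am broke. So I decided to write some code to download them myself.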

Open the Kugou Top 500: http://www.kugou.com/yy/rank/home/1-8888.html

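The page lists only 22 songs. Looking at the URL, I guessed the list was paginated, and sure enough, changing the URL to http://www.kugou.com/yy/rank/home/2-8888.html opens the second page.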
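The pagination pattern can be sketched as a small URL builder. This is only an illustration: the class and method names are hypothetical, and it assumes that every page holds 22 songs and that the ranking really has 500 entries, neither of which has been checked beyond the first two pages.

```java
import java.util.ArrayList;
import java.util.List;

public class RankPages {
    // Songs shown per ranking page, as observed on page 1 (assumed constant).
    static final int SONGS_PER_PAGE = 22;

    /** Builds the URL of one ranking page (1-based page number). */
    static String pageUrl(int page) {
        return "http://www.kugou.com/yy/rank/home/" + page + "-8888.html";
    }

    /** Number of pages needed to list totalSongs entries (ceiling division). */
    static int pageCount(int totalSongs) {
        return (totalSongs + SONGS_PER_PAGE - 1) / SONGS_PER_PAGE;
    }

    /** URLs of all pages needed to cover totalSongs entries. */
    static List<String> pageUrls(int totalSongs) {
        List<String> urls = new ArrayList<>();
        int pages = pageCount(totalSongs);
        for (int page = 1; page <= pages; page++) {
            urls.add(pageUrl(page));
        }
        return urls;
    }

    public static void main(String[] args) {
        List<String> urls = pageUrls(500);
        System.out.println(urls.size() + " pages, last: " + urls.get(urls.size() - 1));
    }
}
```

Under those assumptions, 500 songs at 22 per page come to 23 pages.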

Then click through to a song's play page: http://www.kugou.com/song/#hash=3337C18539D5BB00D8027D653D536A35&album_id=32440418

In the Chrome developer tools, search the Elements tab for "mp3".

 

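However, requesting the same page from Java returned no MP3 link, so I suspected it was loaded by a separate request or by JavaScript. I switched to the Network tab and searched for MP3 there.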

Eventually I found this request: https://wwwapi.kugou.com/yy/index.php?r=play/getdata&callback=jQuery19106214312060791911_1573866413695&hash=3337C18539D5BB00D8027D653D536A35&album_id=32440418&dfid=2AMaYT1qKRQF0US0Ii20bXyy&mid=98350860ab960276d13883f7d07d364d&platid=4&_=1573866413696

Then look at its parameters:

Testing with Postman showed that only these parameters are required:

callback=jQuery191027067069941080546_1546235744250

hash=_HASH_

album_id=_ALBUM_ID_

mid=98350860ab960276d13883f7d07d364d

Loading a different song showed that callback and mid stay the same, so only hash and album_id need to be found for each song.

The hash and album_id match the ones in the play-page URL, so the data must be passed along from the Top 500 page.
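To see that match in code, the two values can be read back out of the part of the play-page URL after the #. A minimal sketch using only the standard library; the class and method names are hypothetical.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

public class PlayPageParams {
    /** Parses the key=value pairs that follow '#' in a play-page URL. */
    static Map<String, String> fragmentParams(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        String fragment = URI.create(url).getFragment();
        if (fragment == null) {
            return params; // no '#' part, nothing to parse
        }
        for (String pair : fragment.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> p = fragmentParams(
                "http://www.kugou.com/song/#hash=3337C18539D5BB00D8027D653D536A35&album_id=32440418");
        System.out.println("hash=" + p.get("hash") + ", album_id=" + p.get("album_id"));
    }
}
```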

Fetching the Top 500 page from Java turns up the following string:

[{"Hash":"7707BE115CF9131E3AEF782D294155D4","FileName":"\u963f\u60a0\u60a0 - \u4e00\u751f\u4e0e\u4f60\u64e6\u80a9\u800c\u8fc7","timeLen":239.464,"privilege":10,"size":3832017,"album_id":32922861,"encrypt_id":"11cb4g06"},{"Hash":"3337C18539D5BB00D8027D653D536A35","FileName":"\u963f\u5197 - \u4f60\u7684\u7b54\u6848","timeLen":219,"privilege":10,"size":3519376,"album_id":32440418,"encrypt_id":"113a6795"},{"Hash":"FAEDD01C425118BA343648B5AF35861F","FileName":"en - \u56a3\u5f20","timeLen":254.04,"privilege":10,"size":4065240,"album_id":26433357,"encrypt_id":"ybj0x09"},{"Hash":"33EB8FE0DC9F70D9F7FE4CB77305D5A8","FileName":"\u6d77\u6765\u963f\u6728\u3001\u963f\u5477\u62c9\u53e4\u3001\u66f2\u6bd4\u963f\u4e14 - \u522b\u77e5\u5df1","timeLen":280.111,"privilege":10,"size":4482365,"album_id":16324799,"encrypt_id":"uajki71"},{"Hash":"1952AF6E49AF16B4C130282B8A40EEE7","FileName":"\u845b\u4e1c\u742a - \u60ac\u6eba","timeLen":197.12,"privilege":10,"size":3154481,"album_id":23773597,"encrypt_id":"10fdd762"},{"Hash":"E908C26B69C2873C75CFCEABB5B81F2B","FileName":"\u6768\u5c0f\u58ee - \u6211\u627f\u8ba4\u6211\u81ea\u5351","timeLen":269.348,"privilege":0,"size":4310039,"album_id":30562954,"encrypt_id":"10a05t84"},{"Hash":"ED8DBD8AE97359912A8AEC71C61758D2","FileName":"\u6768\u5c0f\u58ee - \u4e00\u4e2a\u4eba\u633a\u597d","timeLen":270.027,"privilege":10,"size":4320906,"album_id":26827357,"encrypt_id":"ygc0db4"},{"Hash":"B30DF1A53446B717C35629952E9B7156","FileName":"\u6768\u987a\u9ad8\u3001\u51ef\u5c0f\u6674 - \u4e0d\u7231\u6211\u5c31\u522b\u4f24\u5bb3\u6211","timeLen":253.44,"privilege":10,"size":4055502,"album_id":27400705,"encrypt_id":"ypwfo0a"},{"Hash":"247ED2BAA3AE6D3DBA48801F802459F3","FileName":"\u4e8e\u5609\u4e50 - \u9003\u7231","timeLen":242.018,"privilege":10,"size":3875491,"album_id":13882049,"encrypt_id":"sulfsb0"},{"Hash":"B221F5355343714DC6AA7647B62F9B38","FileName":"\u5468\u6770\u4f26 - 
\u65ad\u4e86\u7684\u5f26","timeLen":297,"privilege":10,"size":4761507,"album_id":973367,"encrypt_id":"70mo63"},{"Hash":"B6880A36D02561912C53831ACF305D0D","FileName":"\u5b9d\u77f3Gem\u3001\u9648\u4f1f\u9706 - \u91ce\u72fcDisco","timeLen":240.039,"privilege":10,"size":3841212,"album_id":30181741,"encrypt_id":"102ilgdd"},{"Hash":"04E4C1D0AFB9DEEF3B0834AA1F71B654","FileName":"\u5d14\u4f1f\u7acb - \u9152\u9189\u7684\u8774\u8776","timeLen":206.184,"privilege":10,"size":3299669,"album_id":26382011,"encrypt_id":"yavyv32"},{"Hash":"BE7C2E688ED155FA6A6B45392FD0FA50","FileName":"Tones And I - Dance Monkey","timeLen":209.815,"privilege":10,"size":3357510,"album_id":26441038,"encrypt_id":"xcw85d8"},{"Hash":"EBBDCAA92236F5877FAD2A91C91564F9","FileName":"\u5f20\u9753\u9896 - \u5e7b\u7eb1\u4e4b\u7075","timeLen":248.032,"privilege":0,"size":3968984,"album_id":33385138,"encrypt_id":"11i24me5"},{"Hash":"DB66B6258041852A7B856729FF3A4893","FileName":"\u738b\u4f73\u6768 - \u9057\u61be","timeLen":255.033,"privilege":10,"size":4081121,"album_id":27347801,"encrypt_id":"yp9nc64"},{"Hash":"517E05B8121A9E7D6E5629B41CCA61C4","FileName":"\u97f3\u9619\u8bd7\u542c\u3001\u8d75\u65b9\u5a67 - \u8292\u79cd","timeLen":216.032,"privilege":10,"size":3456984,"album_id":22135843,"encrypt_id":"x20jh9b"},{"Hash":"D46AC87092F83685353E564B1F5C9BD3","FileName":"\u963f\u60a0\u60a0 - \u5ff5\u65e7","timeLen":229.877,"privilege":10,"size":3678626,"album_id":27580208,"encrypt_id":"ys2nmca"},{"Hash":"5784F2CBA32B65D5E48E4C415F843342","FileName":"\u9ec4\u9704\u96f2 - \u5de6\u624b\u6307\u6708 (\u7eaf\u4eab\u7248)","timeLen":185.086,"privilege":10,"size":2974138,"album_id":12245979,"encrypt_id":"svuie28"},{"Hash":"07A55D0F8FF088BDCA36B836F7FE115F","FileName":"\u674e\u6615\u878d\u3001\u6a0a\u6850\u821f\u3001\u674e\u51ef\u7a20 - 
\u4f60\u7b11\u8d77\u6765\u771f\u597d\u770b","timeLen":172.068,"privilege":10,"size":2753687,"album_id":19428374,"encrypt_id":"w7pn29f"},{"Hash":"A017B769F3D653AC063AE20639C0020F","FileName":"\u8521\u5065\u96c5 - \u7ea2\u8272\u9ad8\u8ddf\u978b","timeLen":206.68,"privilege":10,"size":3321862,"album_id":978459,"encrypt_id":"39nred"},{"Hash":"6CB8F530E00F6CFEF4A3371FD3F0BEB8","FileName":"\u5c0f\u963f\u67ab - \u6700\u8fdc\u7684\u4f60\u662f\u6211\u6700\u8fd1\u7684\u7231","timeLen":136.15,"privilege":10,"size":2178865,"album_id":25406316,"encrypt_id":"xzj9add"},{"Hash":"BBDCDDCBD58FF85B60E728726A4B3C9A","FileName":"\u9708\u4e39(\u6d6a\u54e5) - \u9003\u7231","timeLen":242.233,"privilege":0,"size":3876263,"album_id":32104019,"encrypt_id":"10vik0d0"}]

It contains the hash and album_id, so the rest is just extracting and converting the string. It also holds a nice bonus, the song name, in FileName.
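The extraction step can be tried out without any JSON library. The sketch below uses a regular expression instead of a real JSON parser, so it assumes the field order seen in the string above (Hash, then FileName, with album_id later in the same object) and only decodes unicode escapes. The class names and the "var list = " prefix in the sample are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SongListExtractor {
    /** One entry of the embedded song list. */
    static final class Song {
        final String hash;
        final String albumId;
        final String fileName;

        Song(String hash, String albumId, String fileName) {
            this.hash = hash;
            this.albumId = albumId;
            this.fileName = fileName;
        }
    }

    // One array entry; assumes Hash comes first, then FileName, with album_id later in the same object.
    private static final Pattern ENTRY = Pattern.compile(
            "\\{\"Hash\":\"([0-9A-Fa-f]+)\",\"FileName\":\"((?:[^\"\\\\]|\\\\.)*)\".*?\"album_id\":(\\d+)");

    // A JSON unicode escape: a backslash, the letter u, then four hex digits.
    private static final Pattern UNICODE_ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    /** Replaces JSON unicode escapes with the characters they encode. */
    static String unescapeUnicode(String s) {
        Matcher m = UNICODE_ESCAPE.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Finds every song entry in the page source. */
    static List<Song> extract(String pageSource) {
        List<Song> songs = new ArrayList<>();
        Matcher m = ENTRY.matcher(pageSource);
        while (m.find()) {
            songs.add(new Song(m.group(1), m.group(3), unescapeUnicode(m.group(2))));
        }
        return songs;
    }

    public static void main(String[] args) {
        // Sample built from one entry of the string shown above.
        String sample = "var list = [{\"Hash\":\"3337C18539D5BB00D8027D653D536A35\","
                + "\"FileName\":\"\\u963f\\u5197 - \\u4f60\\u7684\\u7b54\\u6848\",\"timeLen\":219,"
                + "\"privilege\":10,\"size\":3519376,\"album_id\":32440418,\"encrypt_id\":\"113a6795\"}];";
        for (Song s : extract(sample)) {
            System.out.println(s.hash + " " + s.albumId + " " + s.fileName);
        }
    }
}
```

A JSON library is the more robust choice for real use, since it does not depend on field order.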

Then comes the code:

package com.kg;

import net.sf.json.JSONArray;
import net.sf.json.JSONObject;

import java.io.IOException;

/**
 * @ClassName: SpiderKugou.java
 * @Description: Downloads the songs listed on one page of the Kugou Top 500
 * @author 小猪聪聪
 * @version V1.0
 * @Date 2019-11-15 17:00:33
 */
public class SpiderKugou {
    // Directory the downloaded MP3 files are written to
    public static String filePath = "D:/music/";
    // play/getdata URL template; _HASH_ and _ALBUM_ID_ are filled in per song
    public static String mp3 = "https://wwwapi.kugou.com/yy/index.php?r=play/getdata&callback=jQuery191027067069941080546_1546235744250&"
            + "hash=_HASH_&album_id=_ALBUM_ID_&mid=98350860ab960276d13883f7d07d364d";

    public static String LINK = "http://www.kugou.com/song/#hash=_HASH_&album_id=_ALBUM_ID_";
    // "https://www.kugou.com/yy/rank/home/PAGE-23784.html?from=rank";

    public static void main(String[] args) throws IOException {
        int index = 4; // ranking page to download
        d("http://www.kugou.com/yy/rank/home/" + index + "-8888.html");
    }

    /** Fetches one ranking page and downloads every song in its embedded list. */
    public static void d(String url) throws IOException {
        String content = HttpGetConnect.connect(url, "utf-8");
        // The song list is embedded in the page source as a JSON array: [{...},{...}]
        String hashStr = content.substring(content.indexOf("[{"), content.indexOf("}]") + 2);
        JSONArray hashArr = JSONArray.fromObject(hashStr);
        System.out.println(hashArr);
        for (int i = 0; i < hashArr.size(); i++) {
            JSONObject item = JSONObject.fromObject(hashArr.get(i));
            String hash = item.get("Hash").toString();
            String albumId = item.get("album_id").toString();
            String fileName = item.get("FileName").toString();
            String itemUrl = mp3.replace("_HASH_", hash).replace("_ALBUM_ID_", albumId);
            download(itemUrl, fileName);
        }
    }

    /** Asks play/getdata for the song's real URL, then saves the MP3. */
    public static String download(String url, String name) throws IOException {
        String mp = HttpGetConnect.connect(url, "utf-8");
        // The response is JSONP, callback({...}); so keep only the JSON between the parentheses
        mp = mp.substring(mp.indexOf("(") + 1, mp.length() - 2);
        JSONObject data = JSONObject.fromObject(mp).getJSONObject("data");
        String playUrl = data.getString("play_url");
        // Name the file after song_name from this response rather than FileName from the ranking page
        String song_name = data.getString("song_name");
        boolean ok = FileDownload.download(playUrl, filePath + song_name + ".mp3");
        System.out.println(name + " ---- >>> " + (ok ? "download complete" : "download failed"));
        return playUrl;
    }

}

 

package com.kg;

import org.apache.http.HttpEntity;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.BasicHttpClientConnectionManager;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

/**
 * @ClassName: HttpGetConnect.java
 * @Description: Sends an HTTP GET request and returns the response body as text
 * @author 小猪聪聪
 * @version V1.0
 * @Date 2019-11-15 17:01:28
 */
public class HttpGetConnect {
    /**
     * Fetches the body of a URL as a string.
     *
     * @param url         address to request
     * @param charsetName charset used to decode the response, e.g. UTF-8 or GB2312
     * @return the response body with line breaks removed, or "" on a non-2xx status or an error
     * @throws IOException if closing the client fails
     */
    public static String connect(String url, String charsetName) throws IOException {
        BasicHttpClientConnectionManager connManager = new BasicHttpClientConnectionManager();

        CloseableHttpClient httpclient = HttpClients.custom().setConnectionManager(connManager).build();
        String content = "";

        try {
            HttpGet httpget = new HttpGet(url);

            RequestConfig requestConfig = RequestConfig.custom().setSocketTimeout(5000).setConnectTimeout(50000)
                    .setConnectionRequestTimeout(50000).build();
            httpget.setConfig(requestConfig);
            // Browser-like request headers
            httpget.setHeader("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
            httpget.setHeader("Accept-Encoding", "gzip,deflate,sdch");
            httpget.setHeader("Accept-Language", "zh-CN,zh;q=0.8");
            httpget.setHeader("Connection", "keep-alive");
            httpget.setHeader("Upgrade-Insecure-Requests", "1");
            httpget.setHeader("User-Agent",
                    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36");
            httpget.setHeader("cache-control", "max-age=0");

            // try-with-resources closes the response and the reader even when reading fails
            try (CloseableHttpResponse response = httpclient.execute(httpget)) {
                int status = response.getStatusLine().getStatusCode();
                if (status >= 200 && status < 300) {
                    HttpEntity entity = response.getEntity();
                    try (BufferedReader br = new BufferedReader(
                            new InputStreamReader(entity.getContent(), charsetName))) {
                        StringBuilder sb = new StringBuilder();
                        String line;
                        while ((line = br.readLine()) != null) {
                            sb.append(line); // readLine strips the line break, so lines are joined directly
                        }
                        content = sb.toString();
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            httpclient.close();
        }
        return content;
    }
}

 

 

package com.kg;

import java.io.*;

import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

/**
 * @ClassName: FileDownload.java
 * @Description: Downloads a file over HTTP and saves it to disk
 * @author 小猪聪聪
 * @version V1.0
 * @Date 2019-11-15 17:02:22
 */
public class FileDownload {
    /**
     * Downloads a file.
     *
     * @param url  link address
     * @param path destination path, including the file name
     * @return true if the file was saved, false otherwise
     */
    public static boolean download(String url, String path) throws IOException {
        // Record the URL and the target path in music_info.txt
        new WriteText().writeToText(url);
        new WriteText().writeToText(path);

        RequestConfig requestConfig = RequestConfig.custom().setSocketTimeout(2000).setConnectTimeout(2000).build();

        // try-with-resources closes the client, each response and both streams
        try (CloseableHttpClient httpclient = HttpClients.createDefault()) {
            // Up to three attempts; any status other than 200 leads to another attempt
            for (int i = 0; i < 3; i++) {
                HttpGet get = new HttpGet(url);
                get.setConfig(requestConfig);
                try (CloseableHttpResponse result = httpclient.execute(get)) {
                    System.out.println(result.getStatusLine());
                    if (result.getStatusLine().getStatusCode() != 200) {
                        continue;
                    }
                    try (InputStream in = new BufferedInputStream(result.getEntity().getContent());
                         OutputStream out = new BufferedOutputStream(new FileOutputStream(new File(path)))) {
                        byte[] buffer = new byte[1024];
                        int len;
                        while ((len = in.read(buffer, 0, 1024)) > -1) {
                            out.write(buffer, 0, len);
                        }
                    }
                    return true;
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return false;
    }
}

 

package com.kg;

import java.io.*;

/**
 * @ClassName: WriteText.java
 * @Description: Appends a line of text to the music info file
 * @author 小猪聪聪
 * @version V1.0
 * @Date 2019-11-15 17:02:46
 */
public class WriteText {
    public void writeToText(String musicInfo) throws IOException {

        String path = "D:\\music\\music_info\\music_info.txt";
        File file = new File(path);
        // Create the parent directories on first use
        if (!file.exists()) {
            file.getParentFile().mkdirs();
        }

        // Open in append mode (this creates the file if needed) and write the text followed by a line break
        try (BufferedWriter bw = new BufferedWriter(new FileWriter(file, true))) {
            bw.write(musicInfo + "\r\n");
        }
    }

}

 

package com.kg;

import java.io.IOException;

/**
 * @ClassName: Test.java
 * @Description: TODO
 * @author 小猪聪聪
 * @version V1.0
 * @Date 2019-11-15 17:03:13
 */
public class Test {
    public static void main(String[] args) throws IOException {
        System.out.println("aa");
        new WriteText().writeToText("12376\r\n");
    }
}

 

 

The problem:

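While downloading, I found that only 73 songs came through. Digging into it, I saw that something was wrong with the response from https://wwwapi.kugou.com/yy/index.php?r=play/getdata&callback=jQuery191027067069941080546_1546235744250&hash=7A5C31C00EB66499DE0F72BB67097111&album_id=19615336&mid=98350860ab960276d13883f7d07d364d

It no longer returned play_url. Postman gave the same result. I kept looking but could not find the cause, and nothing more would download.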

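Then I happened to refresh a page in the browser and noticed something: verification. That was it. The site now demanded verification, and an image CAPTCHA at that. My guess is that the server counts requests from the same machine and asks for verification once the count passes some threshold.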

So I changed the code to:

        int index = 3;
        d("http://www.kugou.com/yy/rank/home/" + index + "-8888.html");

 

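Problems encountered:

Only one page is loaded per run. When an exception occurs, refresh the page in the browser, then continue.
To load everything in one run:
Option 1:
  Pass the verification in code. It is an image CAPTCHA, though, which is hard to automate, so I have shelved this for now.
Option 2:
  Work out how the server counts requests: by IP, by mid, or by something else. And what is mid actually for?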
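Until the verification question is settled, the crawler can at least notice that a response carries no play_url and stop cleanly, instead of failing inside the JSON lookup. A minimal sketch using only the standard library: the class and method names are hypothetical, the sample bodies in main are made up for illustration, and the real shape of a blocked response was not captured in this post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlayUrlCheck {
    // Captures the string value of the play_url field, including escaped characters.
    private static final Pattern PLAY_URL =
            Pattern.compile("\"play_url\":\"((?:[^\"\\\\]|\\\\.)*)\"");

    /** Strips a JSONP wrapper such as callback({...}); and returns the JSON inside. */
    static String unwrapJsonp(String body) {
        int open = body.indexOf('(');
        int close = body.lastIndexOf(')');
        if (open < 0 || close <= open) {
            return body; // not wrapped, return unchanged
        }
        return body.substring(open + 1, close);
    }

    /** Returns the play_url value, or null when the field is missing or empty. */
    static String playUrlOrNull(String body) {
        Matcher m = PLAY_URL.matcher(unwrapJsonp(body));
        if (!m.find() || m.group(1).isEmpty()) {
            return null;
        }
        // JSON may escape '/' with a backslash; undo that so the URL is usable.
        return m.group(1).replace("\\/", "/");
    }

    public static void main(String[] args) {
        // Both bodies are invented samples, not captured responses.
        String ok = "cb({\"data\":{\"song_name\":\"x\",\"play_url\":\"https:\\/\\/example.com\\/a.mp3\"}});";
        String blocked = "cb({\"status\":0,\"data\":[]});";
        System.out.println(playUrlOrNull(ok));
        System.out.println(playUrlOrNull(blocked) == null ? "no play_url, stop here" : "ok");
    }
}
```

Checking the result for null before calling the download step keeps a run from dying on the first blocked song and makes it clear which page to resume from.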

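I also found that some downloaded songs did not match their names, which meant the song name was being taken from the wrong place. The response that carries the download URL also contains the song name, so I tried using that one instead (the download method above reads it from song_name).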
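Since song_name ends up as part of a file path, one related detail is worth a sketch: Windows rejects certain characters in file names, and song titles can contain them. The helper below is a hypothetical addition rather than part of the original code; it replaces those characters before the path is built.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class SongFileName {
    /** Replaces characters that Windows does not allow in file names with '_'. */
    static String sanitize(String songName) {
        // The character class covers: \ / : * ? " < > |
        return songName.replaceAll("[\\\\/:*?\"<>|]", "_").trim();
    }

    /** Builds the target path for a song inside the given directory. */
    static Path target(String directory, String songName) {
        return Paths.get(directory, sanitize(songName) + ".mp3");
    }

    public static void main(String[] args) {
        System.out.println(target("music", "AC/DC: Back?").getFileName());
    }
}
```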

Comments and feedback are welcome.

 

Original article: https://www.cnblogs.com/2d7b44b8d/p/11871127.html
