Blog Migration: Batch File Processing with Shell Scripts

Time: 2017-08-17 14:38:04


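I recently redesigned the permalinks of all my posts. To keep the old links reachable, every old URL has to redirect to its new URL. Because this blog is served by GitHub Pages, neither the HTTP server nor the domain can be configured, so the only option is to redirect from an HTML page at the old address to the new HTML page.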

So I need to create one redirect HTML file for every post.

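HTML offers a way to emulate a 301 redirect (it is a client-side refresh performed by the browser, not a real HTTP 301 status sent by the server):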

<meta http-equiv="refresh" content="0; url=xxx">
<link rel="canonical" href="xxx" />

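The first line tells the browser to redirect immediately to the URL xxx. The number in content is how many seconds the current page is shown before the redirect happens.
The second line is for the major search engines; for details see: https://www.mattcutts.com/blog/canonical-link-tag/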
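
As a minimal sketch of these two tags in use, the shell function below prints a complete redirect page for whatever target URL it is given. The name make_redirect_html and the URL /new/path.html are made up for illustration; the script later in this post fills a template file with sed instead.

```bash
# Hypothetical helper: print a minimal redirect page for the URL passed as $1.
make_redirect_html() {
    cat <<EOF
<!DOCTYPE html>
<html>
<head lang="en">
  <meta charset="UTF-8">
  <meta http-equiv="refresh" content="0; url=$1">
  <link rel="canonical" href="$1" />
</head>
<body></body>
</html>
EOF
}

# Example call with a placeholder URL
make_redirect_html /new/path.html
```

Because the here-document delimiter is unquoted, the shell expands $1 inside the page, so both tags always point to the same URL.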

Take the post 2015-05-02-tex-note.md as an example. The front matter at the top of the file specifies its category:

---
layout: blog
categories: linux
...

The new URL of this post is /2015/05/02/tex-note.html and the old URL is /linux/tex-note.html. I need to generate a file /linux/tex-note.html for it to match the old URL, with the following content:

<html>
<head lang="en">
  <meta http-equiv="refresh" content="0; url=/2015/05/02/tex-note.html">
  <link rel="canonical" href="/2015/05/02/tex-note.html" />
</head>
</html>

This way, when a user visits /linux/tex-note.html, they are redirected to /2015/05/02/tex-note.html.
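
The mapping from a post's file name and category to its two URLs can be sketched with shell parameter expansion alone. This is only an illustration under the assumption that every file name has the form YYYY-MM-DD-title.md; the function names new_url and old_url are hypothetical and do not appear in the script below, which uses sed for the same job.

```bash
# Hypothetical helper: new URL from a post file name such as 2015-05-02-tex-note.md.
new_url() {
    base=${1%.md}                  # 2015-05-02-tex-note
    title=${base#????-??-??-}      # tex-note (date prefix removed)
    date=${base%"-$title"}         # 2015-05-02
    year=${date%%-*}               # 2015
    rest=${date#*-}                # 05-02
    month=${rest%%-*}              # 05
    day=${rest#*-}                 # 02
    printf '/%s/%s/%s/%s.html\n' "$year" "$month" "$day" "$title"
}

# Hypothetical helper: old URL from the file name ($1) and the category ($2).
old_url() {
    base=${1%.md}
    printf '/%s/%s.html\n' "$2" "${base#????-??-??-}"
}

new_url 2015-05-02-tex-note.md          # prints /2015/05/02/tex-note.html
old_url 2015-05-02-tex-note.md linux    # prints /linux/tex-note.html
```

Matching the date with the pattern ????-??-??- avoids hard-coding the length of the date prefix.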

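Bash provides a rich set of file and string builtins. Combined with string-processing tools available on Linux such as sed and awk, batch file processing could hardly be more convenient.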
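
To give a feel for those string builtins, here is a small sketch of prefix and suffix removal, showing how the doubled operators (## and %%) remove the longest match while the single ones (# and %) remove the shortest. The sample path is just an example value.

```bash
path=./_posts/2015-05-02-tex-note.md

fname=${path##*/}       # longest prefix matching */ removed
echo "$fname"           # 2015-05-02-tex-note.md
echo "${path#*/}"       # shortest prefix removed: _posts/2015-05-02-tex-note.md
echo "${fname%-*}"      # shortest suffix matching -* removed: 2015-05-02-tex
echo "${fname%%-*}"     # longest suffix matching -* removed: 2015
```

These four forms are plain POSIX shell. The substring form ${var:offset} is a Bash extension.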

For example, all thumbnails on this site are batch-updated by a Makefile; see: Batch-Updating Thumbnails with a Makefile.

Now let's write a script to take care of the problem described above.

for file in ./_posts/*
do
    # Loop over every blog post under ./_posts
    # e.g. file == ./_posts/2015-05-02-tex-note.md

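    fname=${file##*/}
    # Remove the longest prefix matching */ (the directory part)
    # fname == 2015-05-02-tex-note.md

    basename=${fname%.md}
    # Remove the .md suffix
    # basename == 2015-05-02-tex-note

    urlname=${basename:11}
    # Substring starting at index 11, which skips the 11-character date prefix "2015-05-02-"
    # urlname == tex-note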

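    layout=$(sed -n '2p' "$file" | awk -F : '{print $2}')
    # Read the second line of the post (layout: blog) and keep the part after the colon

    layout=$(echo "$layout" | sed -e 's/\s//g')
    # Strip all whitespace from layout (\s is a GNU sed extension)
    # layout == blog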

    if [ "$layout" != "blog" ]; then
        continue
        # Skip to the next post when layout is not blog
    fi

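    line=$(sed -n '3p' "$file")
    # Read the third line of the post
    # line == categories: linux

    category=$(echo "$line" | sed 's/.*:\s*//g' | sed 's/\s//g')
    # The first sed keeps only what follows the colon and its trailing whitespace;
    # the second removes any remaining whitespace (very important, since category becomes part of a path)
    # category == linux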

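    targetfile=./${category}/${urlname}.html
    # Build the target file name
    # targetfile == ./linux/tex-note.html

    targeturl=/$(echo "${basename}" | sed 's/-/\//g' | sed 's/\//-/g4').html
    # Build the target URL: first replace every - with /, then turn the 4th and all later / back into -
    # (combining a number with the g flag is a GNU sed extension)
    # targeturl == /2015/05/02/tex-note.html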

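    targeturl=$(echo "${targeturl}" | sed 's/\//\\\//g')
    # Escape every / as \/ so the URL can be used inside the s/// expression below
    # targeturl == \/2015\/05\/02\/tex-note.html

    mkdir -p "./${category}"
    # Make sure the category directory exists before writing into it
    sed "s/xxx/$targeturl/g" migrate_permalink_tpl.html > "$targetfile"
    # Replace xxx in the template migrate_permalink_tpl.html with targeturl and save the result as targetfile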
done
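
The script reads layout and categories from fixed line numbers (2 and 3), which only works while every post keeps its front matter in that order. As a possible alternative, the sketch below looks a key up by name. The function name front_matter_value is hypothetical, and it assumes the front matter sits between two --- lines and consists of simple "key: value" pairs.

```bash
# Hypothetical helper: print the value of front-matter key $2 in post file $1.
front_matter_value() {
    awk -v key="$2" '
        # Count the --- fence lines; stop at the second one (end of front matter)
        /^---[[:space:]]*$/ { if (++fences == 2) exit; next }
        fences == 1 {
            split($0, kv, ":")
            if (kv[1] == key) {
                value = substr($0, length(kv[1]) + 2)    # text after "key:"
                gsub(/^[[:space:]]+|[[:space:]]+$/, "", value)
                print value
                exit
            }
        }
    ' "$1"
}

# Usage inside the loop, for example:
#   layout=$(front_matter_value "$file" layout)
#   category=$(front_matter_value "$file" categories)
```

The value comes back already trimmed, so the separate whitespace-stripping sed calls would no longer be needed with this approach.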

Here is migrate_permalink_tpl.html:

<!DOCTYPE html>
<html>
<head lang="en">
  <meta charset="UTF-8">
  <meta http-equiv="refresh" content="0; url=xxx">
  <link rel="canonical" href="xxx" />
</head>
<body></body>
</html>
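
The script escapes the slashes in the URL only because / is also the delimiter of the s/xxx/.../ command. sed accepts other delimiter characters, so a variant like the following sketch can fill the template without the escaping step. The function name fill_template is hypothetical, and the sketch assumes the URL contains no |, & or backslash, which would still be special here.

```bash
# Hypothetical helper: replace every xxx in template file $1 with the URL $2.
# Uses | as the delimiter of the s command, so slashes need no escaping.
fill_template() {
    sed "s|xxx|$2|g" "$1"
}

# Usage, for example:
#   fill_template migrate_permalink_tpl.html /2015/05/02/tex-note.html > ./linux/tex-note.html
```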

The resulting /linux/tex-note.html:

<!DOCTYPE html>
<html>
<head lang="en">
  <meta charset="UTF-8">
  <meta http-equiv="refresh" content="0; url=/2015/05/02/tex-note.html">
  <link rel="canonical" href="/2015/05/02/tex-note.html" />
</head>
<body></body>
</html>
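
To spot-check the generated pages, one option is to pull the redirect target back out of a file and compare it by eye or in a loop. This is only a sketch: the function name redirect_target is hypothetical, and it assumes the meta refresh tag is written on one line exactly as in the template above.

```bash
# Hypothetical helper: print the URL that the redirect page $1 points to.
redirect_target() {
    sed -n 's/.*http-equiv="refresh" content="[0-9]*; url=\([^"]*\)".*/\1/p' "$1"
}

# Usage, for example:
#   redirect_target ./linux/tex-note.html
```

The -n option together with the p flag makes sed print only the line where the substitution matched, so the output is just the value after url=.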

Unless otherwise noted, all posts on this blog are original. When reposting, please link back to this post: http://harttle.com/2015/07/25/bash-file-batch.html


Source: http://www.cnblogs.com/ljbguanli/p/7381095.html
