论野生技术&二次元 (On Wild Tech & Anime)

Wrote an SNI proxy with OpenResty

on June 6, 2016


OpenResty built with the stream module and ngx_stream_lua_module is recommended. Tested on 1.9.15.1.

SNI Proxy based on stream-lua-nginx-module
https://github.com/fffonion/lua-resty-sniproxy

Example configuration:

stream {
    lua_resolver 8.8.8.8;
    init_worker_by_lua_block {
        sni_rules = { 
            ["www.google.com"] = {"www.google.com", 443},
            ["www.facebook.com"] = {"9.8.7.6", 443},
            ["twitter.com"] = {"1.2.3.4"},
            [".+.twitter.com"] = {nil, 443}
        }   
    }

    server {
            error_log /var/log/nginx/sniproxy-error.log error;
            listen 443;
            content_by_lua_block {
                    local sni = require("resty.sniproxy")
                    local sp = sni:new()
                    sp:run()
            }   
    }
}

A Lua table sni_rules should be defined in the init_worker_by_lua_block directive.

The key can be either whole host name or regular expression. Use . for a default host name. If no entry is matched, connection will be closed.

The value is a table containing host name and port. If host is set to nil, the server_name in SNI will be used. If the port is not defined or set to nil, 443 will be used.

Rules are applied with the priority as its occurrence sequence in the table. In the example above, twitter.com will match the third rule rather than the fourth.

If the protocol version is less than TLSv1 (eg. SSLv3, SSLv2), connection will be closed, since SNI extension is not supported in these versions.
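The matching rules described above can be illustrated with a small Python sketch. This is not the module's actual code: the names SNI_RULES and resolve_upstream are made up for illustration, the rules are kept in an explicit ordered list, and every key is assumed to be matched as a regular expression against the whole server name.

```python
import re

# Hypothetical ordered rule list mirroring the example configuration.
# A list of pairs is used so that the priority order is explicit.
SNI_RULES = [
    ("www.google.com", ("www.google.com", 443)),
    ("www.facebook.com", ("9.8.7.6", 443)),
    ("twitter.com", ("1.2.3.4", None)),
    (".+.twitter.com", (None, 443)),
]

DEFAULT_PORT = 443


def resolve_upstream(server_name, rules=SNI_RULES):
    """Return (host, port) for the first matching rule, or None to close the connection."""
    for pattern, (host, port) in rules:
        # Assumption of this sketch: the key must match the entire server name.
        if re.fullmatch(pattern, server_name):
            # A nil host falls back to the SNI name, a nil port to 443.
            return (host or server_name, port or DEFAULT_PORT)
    return None
```

Under these assumptions, resolve_upstream("twitter.com") hits the third rule, a subdomain such as api.twitter.com falls through to the fourth and is forwarded to its own name on port 443, and an unknown name yields None.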

So here comes the question

on May 16, 2016

The blogger doesn't feel like categorizing

Why does someone as sentimental as I am only manage one blog post a month?

Upgrading to Ubuntu 16.04 and getting disciplined by systemd

on April 22, 2016

Linux


sudo do-release-upgrade -d

Then sit back and watch the show.

OpenVZ

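The (heavily patched) OpenVZ kernels before 2.6.32-042stab111.X do not support systemd versions above 220, while 16.04 ships 229, so after the upgrade you get a wonderful world in which systemd does not exist.

The catch is that since systemd cannot start, none of the boot-time services start either, and you have to set the IP and routes by hand from the serial console. So just open a ticket and ask support to upgrade the host node's kernel, lol.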

udev

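systemd-udev automatically renames network interfaces to odd names such as em0 or ens0; see here for details.

Whatever systemd says is right anyway, so bottoms up on this bowl of hot chocolate, brother.

You can edit GRUB_CMDLINE_LINUX in /etc/default/grub and set it to: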

biosdevname=0

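and you can keep using eth0 naming (run sudo update-grub and reboot for the change to take effect; if the interface still gets a systemd-style name such as ens0, adding net.ifnames=0 as well is the usual fix).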

mysql-apt-config

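MySQL's official apt repository does not support 16.04 yet, while update-manager tries to switch every source under sources.list.d to xenial for the upgrade, so it may fail because the MySQL source has no candidate.

The workaround is to change the extension of /etc/apt/sources.list.d/mysql.list first, change it back after the upgrade, and then replace trusty with xenial inside it. That way (once MySQL supports it) you will receive updates for 16.04.

update-manager's encoding problem

Hahahahahahahaha, let me laugh for a while first.

The problem already existed with add-apt-repository: if a PPA's title contains unusual characters, it errors out, because Python 3 picks the codec automatically from the current LC_ALL.

update-manager dies in the same place, so remember to export LC_ALL=posix first, then sudo do-release-upgrade -d, then go have some tea.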

Centralized management for Let’s Encrypt

on April 7, 2016


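Let’s Encrypt issues certificates on the same principle as traditional PKI; it simply automates generating the CSR and private key, submitting the CSR, and retrieving the certificate.

On top of that, domain ownership has to be verified. This part is the same as with traditional CAs, except that traditional CAs also let us verify through the email address listed in the domain's whois record, whereas Let’s Encrypt seems to support verification only via the HTTP challenge: you agree on a URI and a random string with the validation server, the validation server requests that URI, and if the content it gets back equals the agreed random string, validation passes. As shown in the figure: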

letsencrypt_howitworks

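(copied from the official site)

This means we would have to install the letsencrypt tool on every front-end load balancer that serves HTTPS. Is there a way to manage this centrally?

In fact, because the challenge URIs follow a fixed pattern, we can proxy such requests arriving at the front-end servers to a single server dedicated to issuing and renewing certificates. As shown in the figure: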

letsencrypt_howitworks_proxypass

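1. After server B initiates a new issuance request for a.example.com, the Let’s Encrypt issuing server returns a challenge URI (8303) and a response (ed98).
2. Server B uses the webroot plugin to write this URI and response to the corresponding file on its local disk.
3. To verify ownership of a.example.com, the Let’s Encrypt issuing server looks the name up, finds that it points to front-end server A, and sends an HTTP request for /.well-known/acme-challenge/8303 to server A.
4. Server A reverse-proxies the request to server B.
5. B reads the response written in step 2 and returns it to A; A returns it to the Let’s Encrypt issuing server.
6. Validation succeeds and the certificate is issued!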
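The flow above can be sketched in Python as a local simulation. It is a simplified model, not the real ACME client or nginx: the function names are invented for illustration, the proxying on server A is reduced to a function call, and the response is treated as an opaque string as in the description above.

```python
import tempfile
from pathlib import Path

CHALLENGE_PREFIX = "/.well-known/acme-challenge/"


def write_challenge(webroot, token, response):
    """Server B: store the response under the token's path, as the webroot plugin conceptually does."""
    target = Path(webroot) / ".well-known" / "acme-challenge" / token
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(response)


def serve_from_b(webroot, uri):
    """Server B: return the stored response for a challenge URI, or None for anything else."""
    if not uri.startswith(CHALLENGE_PREFIX):
        return None
    token = uri[len(CHALLENGE_PREFIX):]
    # Reject empty tokens and anything that could escape the challenge directory.
    if "/" in token or token in ("", ".", ".."):
        return None
    path = Path(webroot) / ".well-known" / "acme-challenge" / token
    return path.read_text() if path.is_file() else None


def serve_from_a(webroot_on_b, uri):
    """Server A: hand challenge requests to B (stands in for proxy_pass), serve the site otherwise."""
    if uri.startswith("/.well-known"):
        return serve_from_b(webroot_on_b, uri)
    return "regular site content"


def validate(webroot_on_b, token, expected):
    """Issuing server: request the URI from A and compare the body with the agreed response."""
    return serve_from_a(webroot_on_b, CHALLENGE_PREFIX + token) == expected


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as webroot:
        write_challenge(webroot, "8303", "ed98")
        print(validate(webroot, "8303", "ed98"))
```

The point of the sketch is that server A never needs the challenge files on its own disk; it only needs to recognize the URI prefix and forward the request.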

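After that, all server A has to do is fetch the certificate stored on B. A RESTful API on B would do the job; remember to configure allow and deny.

The nginx configuration on server A (the front end) is as follows: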

server {
    # other locations
    # location { ..... }

    location ~ /.well-known {
        proxy_pass http://B.example.com:23333;
    }
}

The nginx configuration on server B is as follows:

server {
    listen 23333;
    server_name B.example.com;
    location ~ /.well-known {
        root /tmp/letsencrypt;
        allow IP-OF-A;
        deny all;
    }
}

Then run this on B:

./letsencrypt-auto --webroot -w /tmp/letsencrypt -d example.com;


Comments now support multiple emoticon packs

on March 30, 2016

The blogger doesn't feel like categorizing


Screen Shot 2016-03-30 at 7.31.00 AM

And of course the huaji (滑稽) face is in there too.

Making Coreseek index Japanese kana

on March 15, 2016

C/C++


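Coreseek is a modified Sphinx that uses mmseg for Chinese word segmentation. I ran into a problem, though: Japanese search always performed poorly, and a keyword made up entirely of kana returned an empty result.

At first I guessed that the dictionary simply did not contain Japanese. On second thought, mmseg should fall back to unigram (character-by-character) segmentation for words that are not in its dictionary, so in principle it should not fail to index Japanese. The mmseg command-line tool confirms this: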

$ /usr/local/mmseg/bin/mmseg -d /usr/local/mmseg/etc/ 1.txt
ヨ/x ス/x ガ/x ノ/x ソ/x ラ/x


This shows that mmseg did perform unigram segmentation.
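For readers unfamiliar with the term, the unigram fallback can be pictured with a few lines of Python. This only mimics the output format shown above for text whose characters are all missing from the dictionary; unigram_fallback is a made-up name and does not reflect how mmseg is implemented.

```python
def unigram_fallback(text):
    """Emit every non-whitespace character as its own token in the "char/x" form."""
    return " ".join(f"{ch}/x" for ch in text if not ch.isspace())


if __name__ == "__main__":
    # Katakana string written with escapes so the code points are unambiguous.
    print(unigram_fallback("\u30e8\u30b9\u30ac\u30ce\u30bd\u30e9"))
```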

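So why can't Coreseek find kana? After a lot of digging I finally found that, on the path where Coreseek calls mmseg for segmentation, the input characters are filtered, with this comment: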

// BEGIN CJK There is no case folding, should do this in remote tokenizer.
// Here just make CJK Charactor will remain. --coreseek
dRemaps.Add ( CSphRemapRange ( 0x4e00, 0x9FFF, 0x4e00 ) );
dRemaps.Add ( CSphRemapRange ( 0xFF00, 0xFFFF, 0xFF00 ) );
dRemaps.Add ( CSphRemapRange ( 0x3040, 0x303F, 0x3040 ) );

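So although Coreseek put all the CJK (Chinese, Japanese, Korean) ideographs, fullwidth characters and punctuation into the ranges, it left out hiragana and katakana. Changing the third range to 0x3000, 0x30FF, 0x3000 fixes the problem.

Where: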

// 4e00 - 9fff CJK unified ideographs
// 3000 - 303f CJK symbols and punctuation
// 3040 - 30ff Hiragana/Katakana
// ff00 - ffff half/fullwidth forms
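The effect of the change can be checked with a short Python sketch. It models the remap table as plain inclusive code point ranges, which is an assumption: how Sphinx itself handles a range whose start is greater than its end (0x3040 to 0x303F) is not covered here, and the names OLD_RANGES, NEW_RANGES, is_kept and dropped are illustrative only.

```python
# Ranges from the original Coreseek source and from the fix described above.
OLD_RANGES = [(0x4E00, 0x9FFF), (0xFF00, 0xFFFF), (0x3040, 0x303F)]
NEW_RANGES = [(0x4E00, 0x9FFF), (0xFF00, 0xFFFF), (0x3000, 0x30FF)]


def is_kept(ch, ranges):
    """True if the character's code point falls inside any inclusive range."""
    cp = ord(ch)
    return any(start <= cp <= end for start, end in ranges)


def dropped(text, ranges):
    """List the characters of text that the given ranges would filter out."""
    return [ch for ch in text if not is_kept(ch, ranges)]


if __name__ == "__main__":
    kana = "\u30e8\u30b9\u30ac\u30ce\u30bd\u30e9"  # the katakana sample from the mmseg test
    print(dropped(kana, OLD_RANGES))
    print(dropped(kana, NEW_RANGES))
```

With the old ranges every kana in the sample is filtered out, which matches the empty search results; with the widened third range none of them are.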

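I've put the modified version on GitHub.

Also, here you can look up which characters a given Unicode code range covers; unicode.org has a database, but it is one big PDF listing every character, and I could not seem to find a comparable breakdown by category there.

For Ubuntu/Debian, prebuilt Coreseek deb packages are available here: i386 amd64; they depend on mmseg: i386 amd64; plus the dictionary bundled with mmseg.

For versions >2.2.10, I provide a complete patch in this blog post that can be applied to the Sphinx source before compiling.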