Cocoa Oikawa

Rewrite the styled code in HTML generated by Apple to WordPress compatible HTML

I wrote my first blog post in 2013, and at that time WordPress handled styled code correctly, i.e., the code preserved its syntax highlighting when I copied it from Xcode / CodeRunner and pasted it into the WordPress editor. The editor was capable of converting, or rather preserving, the colour info, and it did a great job of formatting the styled code into HTML.

Take this post, for example: https://blog.0xbbc.com/2013/08/assertmacros-problem/. The code shown below

typedef int (*PYStdWriter)(void *, const char *, int);
static PYStdWriter _oldStdWrite;

could be nicely formatted into the corresponding HTML code

<span style="color: #bb2ca2;">typedef</span> <span style="color: #bb2ca2;">int</span> (*PYStdWriter)(<span style="color: #bb2ca2;">void</span> *, <span style="color: #bb2ca2;">const</span> <span style="color: #bb2ca2;">char</span> *, <span style="color: #bb2ca2;">int</span>);
<span style="color: #bb2ca2;">static</span> <span style="color: #4f8187;">PYStdWriter</span> _oldStdWrite;

However, around the time WordPress upgraded to 3.9, the aforementioned functionality was removed. Although there are dozens of syntax highlighting plugins, I don't really like the colour schemes they offer. Besides, sometimes I need to highlight only a small portion of the code, as in this post: https://blog.0xbbc.com/2017/05/the-reason-that-codesign-remove-signature-generates-malformed-macho-still-remains-mystery/

/*
 * If this has a code signature load command reuse it and just change
 * the size of that data.  But do not use the old data.
 */
if(object->code_sig_cmd != NULL){
    if(object->seg_linkedit != NULL){
        object->seg_linkedit->filesize += arch_signs[i].datasize - object->code_sig_cmd->datasize; 
        if(object->seg_linkedit->filesize > object->seg_linkedit->vmsize)

As you can see, using native HTML enables extra control and functionality.
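As a rough illustration of the kind of rewrite involved, the sketch below inlines a stylesheet's classes into style attributes, which is what survives editors that strip <style> blocks. The class names and the colour map here are hypothetical placeholders, not what Apple's editors actually emit.

#include <iostream>
#include <map>
#include <regex>
#include <string>

// Hypothetical mapping from copied-HTML classes to inline styles.
static const std::map<std::string, std::string> kClassToStyle = {
    {"s1", "color: #bb2ca2;"},  // keywords
    {"s4", "color: #4f8187;"},  // type names
};

// Rewrite <span class="sN"> into <span style="...">, leaving unknown
// classes untouched, so the colours survive the sanitiser.
std::string inline_styles(std::string html) {
    const std::regex span_class{R"(<span class="(\w+)">)"};
    std::smatch m;
    std::string out;
    while (std::regex_search(html, m, span_class)) {
        out += m.prefix().str();
        auto it = kClassToStyle.find(m[1].str());
        out += (it != kClassToStyle.end())
                   ? "<span style=\"" + it->second + "\">"
                   : m[0].str();
        html = m.suffix().str();
    }
    return out + html;
}

int main() {
    std::cout << inline_styles(
        "<span class=\"s1\">static</span> <span class=\"s4\">PYStdWriter</span> _oldStdWrite;")
              << std::endl;
}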

Continue reading Rewrite the styled code in HTML generated by Apple to WordPress compatible HTML

Brainfuck Interpreter in C++17——A Modern Approach to Kill Your Brain

It has been a long time since I last wrote C++ code. Meanwhile, the new C++ standard, C++17, has been out for a few months, and it introduces or changes a lot of features.

One out of many is pattern matching using std::variant, std::monostate and std::visit. This feature makes data-oriented design easier in modern C++, and it's not that hard to write a (possibly buggy) brainfuck interpreter with it.

That said, we still need some extra 'hacking' to achieve a more elegant code style. There is a blog post by Marius Elvert explaining a two-line visitor in C++, with which you can write a few lambda expressions in one visitor, and the std::variant is then dispatched to the lambda whose parameter type matches.

auto my_visitor = visitor{
    [&](int value) { /* ... */ },
    [&](std::string const & value) { /* ... */ }
};

To accomplish this, Marius found that lager uses just two lines of code. The details are explained in the aforementioned blog post; for readability, I have simply renamed the struct in today's code.

And if you're in a hurry, the full code is on my GitHub, https://github.com/BlueCocoa/brainfuck-cpp17

// Inherit operator() from every lambda, forming a single overload set.
template<class... Ts> struct brainfuck_vm : Ts... { using Ts::operator()...; };
// Deduction guide: let CTAD deduce Ts... from the constructor arguments.
template<class... Ts> brainfuck_vm(Ts...) -> brainfuck_vm<Ts...>;
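
Combined with those two lines, an overload set built from lambdas can be handed straight to std::visit. A toy dispatch of my own for illustration, not code from the repo:

#include <iostream>
#include <string>
#include <variant>

// Assumes the two-line brainfuck_vm definition above is in scope.
int main() {
    std::variant<int, std::string> v{std::string{"hello"}};
    // std::visit picks the lambda whose parameter type matches the
    // currently held alternative.
    std::visit(brainfuck_vm{
        [](int value) { std::cout << "int: " << value << '\n'; },
        [](std::string const& value) { std::cout << "string: " << value << '\n'; }
    }, v);  // prints "string: hello"
}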

And the next step is to define all 8 valid ops of brainfuck.
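
As a sketch of one possible representation (my own guess, not necessarily what the repo does), each op can be an empty tag struct, with std::monostate standing in for non-op characters such as comments:

#include <variant>

// One empty tag type per brainfuck op.
struct IncrementPtr {};   // >
struct DecrementPtr {};   // <
struct IncrementByte {};  // +
struct DecrementByte {};  // -
struct OutputByte {};     // .
struct InputByte {};      // ,
struct LoopBegin {};      // [
struct LoopEnd {};        // ]

// The instruction type: std::monostate covers "not an op".
using op_t = std::variant<std::monostate,
                          IncrementPtr, DecrementPtr,
                          IncrementByte, DecrementByte,
                          OutputByte, InputByte,
                          LoopBegin, LoopEnd>;

Each lambda in the brainfuck_vm visitor then handles exactly one of these types.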

Continue reading Brainfuck Interpreter in C++17——A Modern Approach to Kill Your Brain

On second thought, let's Dockerize!

I had been running my mentor's AI Lab for quite a few months, and now it's about time to hand the docs for the internal servers over to the graduates. One of the servers tends to lose its internet connection from time to time due to its location, though I heard it was moved back to the university in the middle of July.

Originally I used Microsoft Word to keep records of almost everything, but that obviously causes some issues.

For example, one person's copy of the docs may differ from another's. Yes, I even thought about using cloud storage with version control. The problem is that we cannot afford a cloud drive, and we could not find anyone willing to take charge of reimbursement; the bills have already piled up on my mentor's desk.

Besides, keeping the docs as files inevitably introduces ugly naming such as docs-20190807, docs-20190607 or whatever. And using git for version control on those files would be a total disaster: apart from the unreadable commits, the filename would have to stay the same, which makes it extremely likely that some people would forget to pull updates from the git repo.

Luckily, there's an instance on AliCloud (personally I don't really like AliCloud, but that's another story; let's save it for next time), and plenty of packages that generate static HTML from markdown have been developed in recent years.

It would be easy for everyone to access the docs online, and because markdown files are plain text, git gives us a very good and, most importantly, readable history of changes.

The final decision was to use VuePress as the static HTML generator. To keep the installation process simple, dockerization is the best shot at the moment. Furthermore, basic HTTP auth is needed to keep unwanted visitors out, leaving the docs accessible only to the lab.

For your convenience, this project is located at my GitHub, https://github.com/BlueCocoa/docs. It's fully prepared and dockerized with docker-compose support.

Continue reading On second thought, let's Dockerize!

Just for fun: Compile time fibonacci

所以继续摸个鱼,用 C++ 模版编程写个 fibonacci 计算

Of course, it started as just casual fun, since template matching in C++ actually feels a lot like functional programming (it seems someone once even built a SAT solver out of C++ template matching).

The compile-time C++ version comes first, below. Later I'll add a proper functional-programming version, then a more elegant functional implementation (in Elixir), and finally imitate the functional style with Python and C++.
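
(The excerpt cuts off before the code, so here is a minimal sketch of a compile-time Fibonacci via template specialisation; my reconstruction, not necessarily the version in the full post.)

#include <cstdio>

// Recursive case: fib(N) = fib(N-1) + fib(N-2), evaluated by the compiler.
template <unsigned N>
struct fibonacci {
    static constexpr unsigned long long value =
        fibonacci<N - 1>::value + fibonacci<N - 2>::value;
};

// Base cases as full specialisations; this is the part that resembles
// pattern matching in a functional language.
template <> struct fibonacci<0> { static constexpr unsigned long long value = 0; };
template <> struct fibonacci<1> { static constexpr unsigned long long value = 1; };

int main() {
    // N must be a compile-time constant; the result is baked into the binary.
    printf("fib(40) = %llu\n", fibonacci<40>::value);
}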

Continue reading Just for fun: Compile time fibonacci

Have some fun with C++ template programming and compile time string obfuscation

Well, while slacking off I came across a C++ compile-time string obfuscation implementation, urShadow/StringObfuscator (and played around with C++ template programming again along the way).

How to put it: compile-time obfuscation implemented via C++ templates like this still leaves a fairly recognisable signature. There is another approach that also obfuscates at compile time, but is implemented by the compiler or a compiler plugin instead.

As for performance, intuitively, for the vast majority of everyday applications neither approach shows any observable difference compared with no obfuscation at all. I haven't benchmarked it, though; feel free to give it a try if you're interested.

urShadow/StringObfuscator is fairly simple to use, but compared with the compiler-plugin approach it still requires some modifications to your code.

#include <iostream>
#include "str_obfuscator.hpp"

int main(int argc, const char * argv[]) {
    std::cout << cryptor::create("Hello, World!").decrypt() << std::endl;
    return 0;
}

Overall the implementation is simple and direct: it uses C++ template parameters to capture the length S of the string to be obfuscated and the string itself, str.
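
In rough outline, the idea looks like the sketch below: XOR the characters with a key in a constexpr constructor so that only obfuscated bytes land in the binary, then undo the XOR at run time in decrypt(). This is a simplified sketch of the technique, not the actual code of urShadow/StringObfuscator.

#include <cstddef>
#include <iostream>
#include <utility>

template <std::size_t S>
class obfuscated_string {
public:
    constexpr obfuscated_string(const char (&str)[S])
        : obfuscated_string(str, std::make_index_sequence<S>{}) {}

    // Undo the XOR in place at run time and return the plaintext.
    const char* decrypt() {
        for (std::size_t i = 0; i < S; ++i) data_[i] ^= kKey;
        return data_;
    }

private:
    // When evaluated at compile time, only the XOR-ed bytes are stored
    // in the binary, never the plaintext literal.
    template <std::size_t... I>
    constexpr obfuscated_string(const char (&str)[S], std::index_sequence<I...>)
        : data_{static_cast<char>(str[I] ^ kKey)...} {}

    static constexpr char kKey = 0x5A;  // fixed key, purely for illustration
    char data_[S];
};

int main() {
    auto s = obfuscated_string("Hello, World!");  // S deduced via CTAD, C++17
    std::cout << s.decrypt() << std::endl;
}

In practice the library forces the construction to happen at compile time (e.g. through a constexpr helper like cryptor::create), since a plain runtime construction could still embed the literal.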

Continue reading Have some fun with C++ template programming and compile time string obfuscation

On-demand PyPi mirror acceleration in a multi-user Docker environment

This post is more or less a follow-up to the previous one, Build a super fast on demand local PyPi mirror.

This post uses docker-compose as the example and goes into the details. If you are not using docker-compose, you merely need to manually attach pypicache and the containers that need this service to the same Docker network. That way there is no need to look up pypicache's IP address: the cache is transparent to end users, who get the local high-speed cache without any extra pip install arguments. The effect is especially noticeable for larger files.

jupyterhub-docker

Continue reading On-demand PyPi mirror acceleration in a multi-user Docker environment

Build a super fast on demand local PyPi mirror

  • When several people in a company/LAN develop in Python and almost all of them use pip to set up environments, downloads are still limited by the bandwidth to the outside world even with the various mirror sources available, and the same package may be downloaded many times by different people. For larger packages, the time spent on repeated downloads simply isn't worth it.
  • When you use Docker to build different Python applications/environments, testing a Dockerfile may require repeatedly deleting previously built versions and rebuilding from scratch, so pip downloads face the same problem as above: unnecessary time wasted on repetition.

One solution is to run a PyPi mirror inside the company/LAN. In practice, maintaining a full mirror is quite troublesome and takes far too much storage. Within a company/LAN, what people develop and the tech stacks they use are relatively fixed, so a full mirror would contain many packages that almost nobody uses.

A second solution is to pre-build one or more Docker images containing the packages everyone uses, with the remaining packages installed on demand by the few who need them. The drawback is that the current Docker service + multi-user setup loses the configured environment after a restart, after which packages still have to be downloaded from the mirror again.

The relatively once-and-for-all solution, then, is to set up a local on-demand PyPi mirror. The principle is to add a high-speed cache between the upstream mirror and the company/LAN; and since the whl or tar.gz files already published to PyPi never change, there is no need to worry about cache-expiry settings.

The result looks like this: 182 KB/s vs. 36.4 MB/s.
(The cache server is on a gigabit wired link, while the MacBook uses 802.11ac with a link speed of 585 Mbps during the test.)

It's apparently super fast after being cached!

Continue reading Build a super fast on demand local PyPi mirror

A brief tutorial on setting up an AI lab server for a small team

This comes from what I accumulated in my mentor's lab, and the setup suits a small team of roughly 2-8 people. We had two machines at the time. One was a server in the university machine room; I never paid attention to its CPU, but as I recall it had 64 GB of RAM and four P20s (24 GB of VRAM each, I think?). The other machine sat in the office, with an AMD Ryzen 2700X, 64 GB of RAM, plus two 11 GB 1080 Tis. The budget certainly couldn't stretch to one GPU per person, and some models aren't big enough to need a whole GPU to themselves anyway, but it was quite enough to build an AI Lab server for a small team.

The main architecture of the AI Lab server we built at the time is as follows.

AI Lab Platform Architecture

For the OS we chose Ubuntu 18.04 LTS: simple and convenient; after all, we were doing AI, not OS work, so there was no need whatsoever to introduce complexity elsewhere. On top of that sits the system-level GPU driver, which was version 396.26 at the time (400-series drivers are out by now). Next comes NVIDIA's runc, which hooks into Docker and provides GPU support inside containers. On top of that we ran JupyterHub for multi-user support; of course this could also be handled by giving out multiple accounts, and for this part and the parts that follow there are plenty of alternative solutions.

Continue reading A brief tutorial on setting up an AI lab server for a small team

Amazing! So much fun!