
Beware of sync option in mount


By default mount uses the async option, which means that write operations do not wait for final confirmation from the device – they are stored in the disk cache and written later, optimized by the disk firmware. However, you can set the sync option manually (-o sync); write operations are then synchronous, meaning each block write has to wait for confirmation that it is physically written to the disk, and no optimization is available. This can slow down write speed dramatically, as I recently discovered when I backed up some data to an external 2.5″ USB 3.0 hard drive – the slowdown in this case was almost 1000x (70 kB/s vs 60 MB/s, measured by rsync --progress). How did it happen that the disk was mounted with the sync option? I use usbmount to auto-mount disks, and it has sync as a default mount option (fortunately this can be changed in its configuration). So the conclusion is: don't use the sync option unless you know exactly what you are doing, and if write speed is suspiciously slow, check the mount options.
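
The difference is easy to demonstrate with a small experiment. The following Python sketch (the path is hypothetical – point it at the mounted drive, and keep COUNT small for slow drives) times buffered writes against writes opened with the O_SYNC flag, which is essentially what the sync mount option forces on every write:

import os
import time

PATH = "/mnt/usb/testfile"   # hypothetical path on the mounted drive - adjust
BLOCK = b"x" * 4096
COUNT = 500

def write_speed(extra_flags):
    # open, write COUNT blocks, report throughput in MB/s
    fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | extra_flags, 0o644)
    start = time.monotonic()
    for _ in range(COUNT):
        os.write(fd, BLOCK)
    os.close(fd)
    return COUNT * len(BLOCK) / (time.monotonic() - start) / 1e6

print("buffered (async): %.2f MB/s" % write_speed(0))
print("O_SYNC    (sync): %.2f MB/s" % write_speed(os.O_SYNC))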


Linux Desktop for 2017 and on


As Canonical has announced the end of the Unity desktop, I thought it was time to look around at Linux desktops again. In past years I have mainly used Gnome 2 (or Mate more recently), XFCE, Cinnamon and Unity (yes, I did, and the experience was after all rather positive). I tried Gnome 3 a few years ago, but never really gave it a longer try, and I never really found KDE attractive. So in this article I'll look a bit at those desktops again, and especially at the recent Gnome Shell and its customization to my needs (which is indeed based on very individual preferences).

For computers with limited resources I have been using XFCE for years and have to say I'm quite happy with it. It has quite a good look, good usability, and it's frugal with resources. XFCE will definitely remain one of my choices for the future.

But if we have more resources we can get fancier and use a more demanding desktop. I've used Unity recently, with some reservations, but it generally worked well. What I particularly liked were the search in the Dash (with the stupid commercial and media searches disabled, of course) and the Launcher panel on the left hand side. I have learned to prefer the keyboard for navigation – so effective search is a must – but I still liked the Launcher panel for when I get lost.

So I was looking for a desktop with good search capabilities and the possibility to have an intelligent 'dock' panel on the left.

I know that Mate and Cinnamon stick by default to a 'Windowish' layout, with the main panel at the bottom and a 'Start here' button with some type of main menu. That's fine, however I think we can move beyond that paradigm after all these years – it is neither mandatory nor particularly brilliant, it's just a habit. I quickly tested a recent Mint with Cinnamon and it did not especially impress me – the search in the main menu was rather poor (maybe it can be improved).

I also tried ElementaryOS with the default Pantheon desktop. It looks really good and I think it's an excellent choice for a basic user (this might be the distro I could install for my wife). I was a bit concerned by the lack of further customization and the somewhat too 'Macish' look.

I tried KDE Plasma quickly too – surely it's an advanced desktop – it looks good and is very customizable; however, since I have never used KDE systematically, it feels a bit strange – I have to admit I'm rather used to Gnome and its derivatives.

So the final try went to Gnome 3 (or rather, I should say, Gnome Shell), because I somehow expected I'd finally end up there. The key difference compared to my experience years ago is that there are now tons of extensions for every aspect of the desktop, so Gnome Shell can be customized to a very different look and functionality than it has out of the box. Search functionality is quite good and I read that it should be improved further in 3.26. Concerning look and feel, I've done the following customizations:

  • Changed the GTK3 and shell themes to Adapta and the icons to Moka icons (providing a nice clean modern look):
    Install:
    sudo apt-add-repository ppa:tista/adapta
    sudo apt-get update
    sudo apt-get install adapta-gtk-theme adapta-backgrounds 
    
    sudo add-apt-repository ppa:moka/daily
    sudo apt-get update
    sudo apt-get install moka-icon-theme faba-icon-theme

    Then set the appropriate themes in Gnome Tweak Tool (you need to enable the User Themes extension before changing the shell theme).
  • Change desktop fonts to Noto Sans Regular, as recommended by the Adapta theme.
  • In Gnome Tweak Tool disable paste by middle mouse button (as I'm using it to scroll) and change the acceleration profile to Adaptive.
  • Make the cursor bigger – in dconf editor go to /org/gnome/desktop/interface/cursor-size and increase it to 32.
  • Install and enable the following gnome extensions:
    • Dash to Dock – an excellent extension; displays the dash panel on the regular screen (much like the Unity launcher, but even more customizable).
    • Notifications Alert – blinks the date in the top panel if there are any new notifications. I set the blinking period to 0, so it just turns the date red, which is enough for me.
    • Places status indicator – quick navigation to key places/directories from the top panel.
    • TopIcons Plus – puts legacy tray icons onto the top panel (otherwise they sit in a tray in the bottom left corner, which is terrible).
    • Workspace Indicator – shows the current workspace in the top panel.
    • Refresh wifi connections – adds a refresh button to the wifi network selection.
  • And of course set up a favorite background.

With these few customizations desktop looks like this:
[screenshot: customized Gnome Shell desktop]

and provides nice looking and hopefully effective working environment.

 

What Is This Weird File Name in My Samba Share?


In IT there are big things and there are small things. Some small things can be pretty annoying and they seem to stay here forever. One of these annoying little things is the difference between the file name restrictions in Windows and in unix/linux (others are, for instance, legacy character encodings and HTTP proxy support – these things have teased me many times in the past). Have you ever seen a strange file name like W3NEM5~I on a shared disk instead of the meaningful file name you expected? If so, and you're interested in what's going on, continue reading.

So what is the issue, actually: while linux is pretty tolerant about file names – only two characters are not allowed, the forward slash / and the null byte \x00 – Windows is much more restrictive. It does not allow any control character below \x20 (32 decimal, the space character), and, what is worse, none of the rather common characters < > : " / \ ? | * is allowed; and if this is not enough, there is one more special rule: file names cannot end with a space or a dot. One of the places where this discrepancy between filesystems often becomes visible is Samba shares – sharing a linux based filesystem across a local network. Samba is one of the most common ways of running a small home NAS.

When Samba shares a linux filesystem, it has to cope somehow with this problem – that some valid local file names are potentially not valid for network clients. Samba handles it with so called file name mangling (here is a slightly outdated description; in Samba 3 it's slightly different). Mangling basically means that a Windows-illegal name is automatically transformed into a legal one, and the Samba server maintains a mapping between the real name and the mangled name, so the client can work with the file normally. The downside of this method is that mangled names are not very informative, matching the original file name only by the first letter.

So what can we do if we do not want to see those ugly file names? There are several possibilities, which I'll list later in the article. First let's look at a directory on the linux server – it contains files with problematic names (the last one in the first row has a space at the end):

wrong-<-test  wrong-|-test  wrong-?-test  wrong-*-test  wrong-test 
wrong->-test  wrong-:-test  wrong-"-test  wrong-\-test  wrong-test.

 

The first option, indeed, is to do nothing. The discrepancy between the file systems is real and this is how Samba deals with it. You'll see files like this (in the Nautilus file explorer):
[screenshot: mangled file names in Nautilus]

 

Another option is to switch off file name mangling – you can easily do this in the Samba configuration file /etc/samba/smb.conf (and reload smbd):

[global]
mangled names = no

The behaviour for problematic file names in this case is 'undefined' in linux – gnome vfs shows files with a dot or space at the end and they can be manipulated; other files are not visible in Nautilus, and this is what can be seen in the terminal:

$ ls -la /run/user/1000/gvfs/smb-share\:server\=nas\,share\=data_all/tmp/wrong_names/
ls: cannot access '/run/user/1000/gvfs/smb-share:server=nas,share=data_all/tmp/wrong_names/wrong-\-test': No such file or directory
ls: cannot access '/run/user/1000/gvfs/smb-share:server=nas,share=data_all/tmp/wrong_names/wrong-:-test': No such file or directory
ls: cannot access '/run/user/1000/gvfs/smb-share:server=nas,share=data_all/tmp/wrong_names/wrong-"-test': Invalid argument
ls: cannot access '/run/user/1000/gvfs/smb-share:server=nas,share=data_all/tmp/wrong_names/wrong-<-test': Invalid argument
ls: cannot access '/run/user/1000/gvfs/smb-share:server=nas,share=data_all/tmp/wrong_names/wrong->-test': Invalid argument
ls: cannot access '/run/user/1000/gvfs/smb-share:server=nas,share=data_all/tmp/wrong_names/wrong-|-test': Invalid argument
ls: cannot access '/run/user/1000/gvfs/smb-share:server=nas,share=data_all/tmp/wrong_names/wrong-*-test': Invalid argument
ls: cannot access '/run/user/1000/gvfs/smb-share:server=nas,share=data_all/tmp/wrong_names/wrong-?-test': Invalid argument
total 0
drwx------ 1 ivan ivan 0 Oct 15 20:09 .
drwx------ 1 ivan ivan 0 Oct 14 11:55 ..
?????????? ? ?    ?    ?            ? wrong-<-test
?????????? ? ?    ?    ?            ? wrong->-test
?????????? ? ?    ?    ?            ? wrong-|-test
?????????? ? ?    ?    ?            ? wrong-:-test
?????????? ? ?    ?    ?            ? wrong-?-test
?????????? ? ?    ?    ?            ? wrong-"-test
?????????? ? ?    ?    ?            ? wrong-*-test
?????????? ? ?    ?    ?            ? wrong-\-test
-rwx------ 1 ivan ivan 0 Oct 15 20:09 wrong-test 
-rwx------ 1 ivan ivan 0 Oct 14 12:02 wrong-test.

If the share is mounted directly with mount -t cifs, the directory listing looks normal, but files with special characters in their names are not accessible:

$ ls
wrong-<-test  wrong->-test  wrong-|-test  wrong-:-test  wrong-?-test  wrong-"-test  wrong-*-test  wrong-\-test  wrong-test   wrong-test.
$ cat 'wrong-<-test' 
cat: 'wrong-<-test': No such file or directory

And what is even worse – editing such a file appears to be possible, but actually a new file is created with a modified name. On the client the new file appears to have the same name in the terminal (and Nautilus displays only one of the duplicated files) – so it's a complete mess:

$ ls 
wrong-<-test  wrong-<-test  wrong->-test  wrong->-test  wrong-|-test  wrong-:-test  wrong-:-test  wrong-?-test  wrong-"-test  wrong-*-test  wrong-*-test  wrong-\-test  wrong-test   wrong-test.

What is actually happening behind the scenes is that the problematic character is replaced by a 3-byte code on the server (ef 80 a3 in one case, which is a UTF-8 code in the private use area, so officially it is not a known character):

$ ls  $'wrong-\xef\x80\xa3-test'
wrong--test

In Windows the files are visible, but cannot be accessed (the error says the path does not exist).

Another option is to use the catia vfs module, which is bundled with Samba by default – it can be activated with this addition to smb.conf:

[your-share]
       # share definition
        vfs objects = catia
        catia:mappings = 0x22:0xa8,0x2a:0xa4,0x2f:0xf8,0x3a:0xf7,0x3c:0xab,0x3e:0xbb,0x3f:0xbf,0x5c:0xff,0x7c:0xa6,0x20:0xb1

With this modification the Windows-'hostile' characters are mapped to other characters (the trick is that the mapping has to work in both directions – so the best strategy is to use rare characters to avoid naming conflicts). The names then look like this:
[screenshot: catia-mapped file names in Nautilus]

So now most of the file names are much more similar to the original ones (only the dot at the end of a file name is still mangled). The disadvantage is that such file names are harder to type manually.
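
To see what the mapping does, here is a small Python sketch that applies the catia table from the configuration above to a file name (as I understand it, the module presents server-side names to clients through this table and applies the reverse mapping on the way back):

# pairs from the catia:mappings line above: server char -> char shown to clients
MAPPINGS = ("0x22:0xa8,0x2a:0xa4,0x2f:0xf8,0x3a:0xf7,0x3c:0xab,"
            "0x3e:0xbb,0x3f:0xbf,0x5c:0xff,0x7c:0xa6,0x20:0xb1")

table = {chr(int(a, 16)): chr(int(b, 16))
         for a, b in (pair.split(":") for pair in MAPPINGS.split(","))}

def client_name(server_name):
    return "".join(table.get(c, c) for c in server_name)

print(client_name('wrong-<-test'))   # -> wrong-«-test
print(client_name('wrong-"-test'))   # -> wrong-¨-test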

When combined with the previous option (mangling switched off), it displays even the first problematic name correctly (but in Windows it still cannot be accessed).

The last option is to ensure that file names are consistent with both worlds (which basically means consistent with Windows, as it is more restrictive). I personally find this option best, because it assures trouble-free consistency across systems. Ideally the application should be aware of these restrictions and enforce them programmatically. If by accident we create files with wrong names, we can use a simple script to rename them (for instance the one-liner below, if we are not concerned about possible name collisions and eventually overwriting some files; otherwise a bit more sophisticated script is needed – a sketch of one follows):

find . -depth ! -path . -execdir bash -c "echo -n '{}' | sed 's/[\*>\?\"|\\\:<]/-/g' | sed -r 's/\.+$| +$//g'|  xargs -0  mv -b '{}' 2>&1 | grep -v 'same file'" \;
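
For the more careful variant, a Python sketch along these lines also handles collisions by adding a numeric suffix instead of silently overwriting (the replacement character and fallback name are just illustrative choices):

import os
import re

# characters Windows does not allow in file names (plus control characters)
ILLEGAL = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

def sanitize(name):
    clean = ILLEGAL.sub('-', name).rstrip('. ')  # also strip trailing dots/spaces
    return clean or '_'

# bottom-up walk, so directories are renamed after their contents
for root, dirs, files in os.walk('.', topdown=False):
    for name in files + dirs:
        clean = sanitize(name)
        if clean == name:
            continue
        target, n = clean, 1
        while os.path.exists(os.path.join(root, target)):
            target = '%s-%d' % (clean, n)  # suffix instead of overwriting
            n += 1
        os.rename(os.path.join(root, name), os.path.join(root, target))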


Splitting Large Audio Books

0
0

I'm a big fan of audiobooks. During past years I have been using the setup described in this article (libresonic server, android client, audio encoded with the opus codec) for audiobook listening. It works well for me, but it's best with audiobooks split into chapters, or into parts no longer than 1 hour. However, some audiobooks come in one large file (m4b format, or the proprietary aax file from Audible). To listen to such audiobooks conveniently I need to split them. Luckily, with the ffmpeg tool and a bit of bash scripting, it is not difficult.

Audiobook Formats

Basically audiobooks can use any available format for digital audio; however, the following formats are most common:

mp3 – good old MPEG layer 3 is still the predominant format for digital audio. An audiobook is usually a directory that contains mp3 files, usually one file per chapter, sometimes split arbitrarily into several pieces of the same duration. Metadata is in ID3 tags, and the cover image is either an image file in the directory or an ID3 tag in the files. As this is a very informal layout indeed, its usage differs from user to user and from company to company. Especially ID3 tags are a big mess (they were originally intended for music, so they have to be re-purposed for audiobooks).

m4b (or m4a) – This is an MPEG 4 container with audio encoded with the AAC codec. This format is used by iTunes (m4b is basically equivalent to m4a; the b is there just to stress that it is an audiobook). Often an m4a/m4b audiobook is one big file with chapter information in the metadata – chapter names, starts and ends – and supporting players (like VLC) can show the list of chapters and let you skip directly to a selected chapter. The file also contains metadata tags, and the cover is usually encoded as an additional video stream containing just a jpeg image.

aax – this is the proprietary format of Audible (an Amazon company and the biggest player in commercial audiobooks in the English language). Basically it is very similar to m4b – an MPEG 4 container with AAC LC encoded audio. The main difference is DRM protection – the audio stream is encrypted with a 4-byte key, specific to the customer who bought the file. This means that in a regular player like VLC you can see the metadata and even start playback, but you will not hear anything (and will see lots of decoding errors in the terminal output). I would say this DRM protection is rather symbolic now, as the decryption key can be recovered relatively easily.

Other formats like Vorbis, Opus or WMA are also possible for audiobooks, but are much rarer.

Why Opus?

I tried opus for audiobooks several years ago. My experiences are summarized in this article and so far have been quite positive. I can see more and more support for opus around, and with the advancement of the AV1 video codec, where opus is supposed to be the primary audio companion, opus will become one of the main audio codecs of the future, I believe.

Opus provides very good compression for speech while retaining good quality. From my experience I can use 32kbps or 48kbps bitrates for encoding while maintaining very good audio quality and assuring comfortable listening (I'm not such a zealous audiophile – I've seen a guy claiming he cannot listen to an audiobook encoded below 192 kbps in MP3, which I consider rather excessive; if you look into the details of Audible audiobooks, they are encoded as AAC LC 64kbps with a 22050 Hz sample rate, which is fairly comparable to opus 32kbps with a 24kHz sample rate in terms of audio quality).

So the main opus advantage for me is the lower bitrate, which is especially appreciated when streaming an audiobook to a mobile over the Internet – it assures continuous playback even in areas with lower data speed and of course can have a notable impact on mobile bills. And as I'm no media company, it's enough for me to store audio in quality suitable for my own listening, and thus I can also save space on my home server.

Another advantage is that opus is open source, royalty and patent free, so it can be easily used in any project – and we all like open source, right?

The Script

I've created a bash script to split big audiobooks into smaller files encoded with opus audio (the script uses ffmpeg and ffprobe).

Usage is pretty straightforward (run with -h to see help). It can split large m4b/m4a files into smaller files by chapters (if they are defined in the metadata) or into files of fixed duration (half an hour by default). Split files are stored in a subdirectory with the same name as the original file. The most time consuming part is transcoding of the audio – so it is done in parallel (the number of processes equals the number of cores). The cover image is also extracted into that directory (if possible). It also works for mp3 and aax files (if you provide the activation bytes).
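
To illustrate the core idea, here is a minimal Python sketch (not the actual script, which also handles fixed-duration splitting, parallelism and covers) that reads chapter metadata with ffprobe and cuts each chapter into an opus file; file names and encoding parameters are just illustrative:

import json
import subprocess
import sys
from pathlib import Path

book = Path(sys.argv[1])            # e.g. mybook.m4b
out_dir = book.with_suffix("")      # subdirectory named after the file
out_dir.mkdir(exist_ok=True)

# ffprobe can dump chapter metadata (start/end times, titles) as JSON
probe = subprocess.run(
    ["ffprobe", "-v", "quiet", "-print_format", "json", "-show_chapters", str(book)],
    capture_output=True, check=True)
chapters = json.loads(probe.stdout)["chapters"]

for i, ch in enumerate(chapters, 1):
    title = ch.get("tags", {}).get("title", "chapter")
    out = out_dir / ("%02d - %s.opus" % (i, title.replace("/", "-")))
    # cut one chapter and transcode it to opus in a single ffmpeg run
    subprocess.run(
        ["ffmpeg", "-nostdin", "-v", "error", "-i", str(book),
         "-ss", ch["start_time"], "-to", ch["end_time"],
         "-vn", "-c:a", "libopus", "-b:a", "48k", str(out)],
        check=True)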

MyBookshelf2 Beta2


MyBookshelf2 – an ebook management and sharing solution – has moved to the next version, Beta 2. Apart from a few small fixes, the main change is an internal revamp to make it compatible with the latest Asexor, thus removing the dependency on Crossbar.io and the Autobahn library. This makes deployment of MyBookshelf2 easier and removes components that did not add much value to the solution and just made it a bit overcomplicated. Some effort went into providing an easy Docker implementation – there is now a script that will guide you through two pre-configured scenarios: development (local code, monitoring of code changes) and stage (code in a volume, JS client built and packed, app server running behind an nginx proxy with SSL termination, nginx serving static content).

Check the latest code on github. To give it a try, just clone the repo and run the init.sh script in the deploy directory (assuming you have Docker installed).

Asynchronous Again – Rewriting ptunnel in Rust


The asynchronous programming model is quite popular for I/O intensive tasks – it enables effective use of resources, while maintaining agility and assuring scalability of the application. I myself have used asynchronous programming many times – in JavaScript (where it is omnipresent), in Python (mainly with asyncio recently, but also a bit with twisted, which was one of the first asynchronous network libraries I met) and also in OCAML with lwt or Core Async. The concept is similar across implementations – I/O operations return handles to future results – called Futures, Promises or Deferreds – and they return immediately. These futures can have functions attached to them, which are executed later, when the I/O result becomes available. Asynchronous programming is very much about functions: it requires first class functions, and anonymous functions are very useful here – that's why the asynchronous model flourishes in functional languages. Apart from deferred I/O processing there are usually other utilities for later execution – timeouts, pausing execution for some time (sleep), task synchronization (events, locks). Futures are executed in an "event loop" – a loop that monitors various events (availability of data from I/O, timers, etc.) and executes futures (meaning the functions attached to them) when appropriate. It is also very common to chain futures: the second one executes with the result of the first, once the first is resolved and its result is available, the third one with the result of the second, and so on. On top of this basic scheme languages may provide syntactic sugar like the async and await keywords in Python or C#, which make the code easier to write.
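
Since Python asyncio is mentioned above, here is a tiny illustrative sketch of both styles – a future with an attached callback, and the same call chained with async/await sugar (the fetch coroutine is just a stand-in for real I/O):

import asyncio

async def fetch(url):
    # stands in for a real I/O operation
    await asyncio.sleep(0.1)
    return "data from " + url

async def main():
    # raw style: a future with a callback attached, as described above
    task = asyncio.ensure_future(fetch("http://example.com"))
    task.add_done_callback(lambda fut: print("callback got:", fut.result()))

    # the same chaining expressed with async/await syntactic sugar
    data = await fetch("http://example.com")
    print("await got:", data)

    await task  # make sure the first future resolves before we exit

asyncio.run(main())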

Recently, as I'm progressing in learning Rust, I wondered how asynchronous programming is done in Rust. I decided to remake my old project ptunnel (written in Python) in Rust – ptunnel is a program that tunnels arbitrary connections/protocols through an HTTPS proxy, so it can be used to connect IMAP, SMTP or SSH through the proxy. In the rest of this article I'll share my experiences from this project.

Rust is a relatively young language and asynchronous programming in Rust is even younger – the most popular library now is Tokio, which is still at version 0.1, so basically in its infancy. But the site contains good documentation and guidance, so it was relatively easy to get started.

The Long And Winding Road

I have to admit I'm still struggling with some Rust concepts (mainly lifetimes), and Tokio was quite demanding in terms of sticking to core concepts. Tokio uses the futures module intensively, and all programming and data processing is done through chaining of futures – this can create very complex types (especially when anonymous closures are used) and thus produce very cryptic errors (with type descriptions taking over half of the screen – reminding me of the monstrous types in the OCAML Ocsigen libraries). In order to make various complex Future types compatible with the rest of the program it is quite often necessary to create trait objects – e.g. to box values. There is also one hideous feature of rustc: errors concerning lifetimes of variables are reported last – when all more 'serious' errors are resolved. So it often happened to me that I removed one final error in the code only to see others pop up, and then found that I had screwed up the lifetime of some variables or parameters, that it couldn't be easily fixed, and that I had to redesign the code.

Generally I noticed that while in other languages I was previously able to start and follow a single, relatively straight path from idea to final code and progress steadily, here in async Rust I had to backtrack several times, because the way led nowhere. It looks like there are only a few ways to make it right, and if you do not catch them initially you are lost. Hopefully those ways lead to better code.

Tokio provides nice abstractions for higher level protocols (transports + protocols + services), which I believe can help in other cases, but here I needed something rather low level – so I finally ended up using custom stream and future types, after learning a bit about Tokio internals (stressing again the good documentation of Tokio), and chaining futures with map (modifying the future's value), and_then (creating a new future from the result of the previous one) and or_else (resolving error cases).

One of the most attractive features of Rust is 'zero-cost' abstraction. Rust provides higher level language abstractions like algebraic types, iterators etc., which are implemented in a very efficient way – basically with the same efficiency as if written in a lower abstraction level language like C. Tokio abstractions (at the lower level where I was dwelling) are mainly the iterator-like traits of Streams and Sinks, plus future combinators. So after finally being able to cope with these, I was wondering how the final code would perform and whether it would match the 'zero-cost' promise.

Comparison to Python Version

I have to say that the old Python program is a poor contender. It is written in a very basic manner, not using any features that could improve performance (at least a thread pool, or asyncio), and it creates (and destroys) two threads for each connection.

So rather than a fair benchmark it is just a toy comparison, merely to confirm that the Rust program does better. And it does better indeed – much better, as we see below.

For a quick test I used the squid proxy, an nginx server and ab (the apache benchmark tool), all installed locally on my notebook – sending 1000 HTTP GET requests in 5 concurrent threads. So the setup looks like this:

ab <-> ptunnel <-> local squid <-> local nginx (default page)

And result is ( average time per request in ms):

                     Keep-Alive    No Keep-Alive
Old Python tool      10.51 ms      1003.75 ms
New Rust tool         0.31 ms         3.28 ms
Without ptunnel       1.83 ms         2.55 ms

As you can see, the old python tool is no match – especially when no keep-alive is used and the overhead of thread creation is thus preposterous (in the keep-alive case Python was fine on the majority of requests, but for the few where a connection was created, the time was again around 1 second due to the thread creation overhead).

For comparison there is also a scenario where ab connects directly to squid (ptunnel is left out) – the keep-alive case is interesting, as the setup with ptunnel is actually faster there.

Even if this benchmark is simplistic, it still shows that the Rust program keeps its promise and seems to perform well (a direct benchmark between ab and nginx gives 0.22 ms per request).

Looking at the code base, the Rust program is about 3 times bigger than the old Python program (~600 lines vs ~200 lines).

The Code

The code is available on github, where the kind reader can review it; the README.md file contains installation instructions. The new Rust program's functionality and CLI basically copy the old python program; the only new feature is basic authentication with the proxy.

 

CI/CD Environment for A Smaller Project


The advantages of Continuous Integration (CI) and Continuous Delivery (CD) are obvious even for small projects with a few contributors, and are easily achievable with the help of free cloud tools – for instance the mighty combo of Github plus Travis. But what if we want to achieve a similarly convenient environment inside our private network, available only to our internal teams? Luckily, open source is here again to help us, with another great tool – GitLab. GitLab is a platform similar to GitHub, but the code is open source and we can easily install it in our own environment. In this article I'll summarize my experiences and guidelines on how to build a convenient environment for a small project, with automatic testing and deployment.

What we want to achieve

Our goals are:

  • A central code repository for our projects with a friendly web interface – similar to GitHub, but hosted by ourselves
  • Developers can use their existing credentials to log into this central repository
  • For projects we can set up CI, where for each push to the central repository the code is built and tested automatically
  • If the build and tests are successful, the web application (we are talking about web applications) is deployed automatically for the current branch of the project's repository, so we can have numerous "feature" versions, where developers, QA and operations can test new features
  • All this has to be rather simple and able to run in a couple of VMs – our project is also rather small and we do not need complex architectures like Kubernetes for it

Setup

1st VM (Debian, Ubuntu, CentOS …)

  • Size it appropriately – 2 cores, 4GB of memory and 16GB of free disk space should be enough for smaller projects
  • Install Gitlab here – the Omnibus package
  • For authentication Gitlab can integrate with your corporate LDAP, or through the OmniAuth service it can enable any popular SSO method, including SAMLv2 (which I'm using)
  • Create project(s) and push code to them

2nd VM (Debian, Ubuntu, CentOS …)

  • Size it appropriately – as you'll run multiple containers there, you'll need more resources – say 4 cores, 16GB of memory and 80 GB of free disk space (but it depends very much on your projects)
  • Install Gitlab Runner, Docker CE and optionally Docker Compose here
  • Register two Gitlab executors (yes, you can register several executors with one runner):
    • Docker executor – will be used for running builds and tests – be sure to give it a tag – docker
    • Shell executor – will run instances of our application in a Docker container – again give it a distinguishing tag – shell
  • If you want to share links to running application instances easily, install nginx, uwsgi and this small application of mine
  • If you want to use local docker images for test tasks, set the docker runner pull policy to pull_policy = "if-not-present" in the file /etc/gitlab-runner/config.toml and restart the runner (see the snippet below)
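
For reference, the pull policy belongs in the docker section of the runner entry – a sketch of the relevant fragment of /etc/gitlab-runner/config.toml (the other values here are illustrative; the real entry is generated when the runner is registered):

[[runners]]
  name = "docker-runner"
  url = "https://gitlab.example.com/"
  executor = "docker"
  [runners.docker]
    image = "my-base-image-for-testing"
    pull_policy = "if-not-present"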

How to use in a project

Here is an example of how to use the previous setup in a project (my project is Python based, so no special build tasks were needed – just test and deploy). In order to use CI/CD in Gitlab, two preconditions are required:

  1. Gitlab must have available runners that can be used for your project (in our case we registered the runners for all projects)
  2. A .gitlab-ci.yml script in the root of your project directory

Here is a sample script for our python project:

image: my-base-image-for-testing

variables:
  MY_PREFIX: my_app

test:
  tags:
  - docker
  script:
  - pip install -r requirements.txt
  - pip install nose
  - nosetests

deploy_in_docker:
  stage: "deploy"
  tags:
    - shell
  script:
    - docker build -t ${MY_PREFIX}-${CI_BUILD_REF_NAME} --build-arg BRANCH_NAME=${CI_BUILD_REF_NAME} .
    - docker rm -f ${MY_PREFIX}-${CI_BUILD_REF_NAME} || true
    - docker run -d -P --name ${MY_PREFIX}-${CI_BUILD_REF_NAME} ${MY_PREFIX}-${CI_BUILD_REF_NAME}

 

Now with this setup, each push to the central repository will run the test and deploy tasks, and if they are successful we will have an instance of our application running in a container, and our team can quickly check the new features in that branch.

The Different Approach to A Parser – Parser Combinators


Some time ago I looked into building a parser for the simple logical language used by the imap_detach tool. I used the parsimonious module there, which uses PEG grammars. Recently I learned about another parsing technique – parser combinators. The idea comes from functional programming and is about combining (with the help of higher order functions, so called combinators) simple parsing functions, which parse primitive tokens, into more complex parsing functions, which parse parts of the text, and combining those further into more complex ones, finally ending with one function which parses all the given data. I first met parser combinators in the Rust library nom. There, parsers and combinators are expressed as macros, which is convenient on one hand (concise syntax), but can lead to pretty cryptic error messages, and one cannot rely much on the editor's help with auto-completion, context help etc. I used nom to build a very simple URI parser. I was wondering whether parser combinators are also available for Python and how they would compare with other approaches – like the above mentioned PEG grammars.

Looking around, I found parsec – a Python parser combinator library inspired by the well known Haskell library. Parsec is a small library with as much functional style as one can get out of Python. You can check their example of a JSON parser.

In my test I wanted to create a parser for the same logical expressions as in the previous article – so we can parse and evaluate the following expressions into either a True or False value (assuming names are replaced with actual values):

name ~=".pdf" & from ~= "jack" | ! attached
name ~=".pdf" & ( from ~= "jack" | from ~= "jim")

 

After playing a bit with the parsec library I started to construct parsers and hit an obvious problem (similar to my initial trials with parsimonious) – left recursion. Though there are some advanced parser combinators that can handle left recursion with the help of memoization, Python parsec is a simple library, so combinators with left recursion have to be avoided. Below is the final code that works similarly to the parsimonious grammar:

from parsec import *

parsing_ctx = {}
strict = False


class ContextError(Exception):
    pass


def ctx_lookup(name):
    try:
        return parsing_ctx[name]
    except KeyError:
        if strict:
            raise ContextError("Name %s is not in context" % name)
        return ''


def skip(p): return p << spaces()


left_bracket = skip(string('('))
right_bracket = skip(string(')'))
quote = string('"')
or_op = skip(string('|'))
and_op = skip(string('&'))
not_op = skip(string('!'))
contains_cmp = skip(string('~='))
eq_cmp = skip(string('='))
name = skip(regex('[a-z]+')).parsecmap(ctx_lookup)


@skip
@generate
def literal():
    yield quote
    text = yield many(none_of('"'))
    yield quote
    return ''.join(text)


@generate
def bracketed():
    yield left_bracket
    exp = yield expression
    yield right_bracket
    return exp


@generate
def equals():
    var = yield name
    yield eq_cmp
    val = yield literal
    return var == val


@generate
def contains():
    var = yield name
    yield contains_cmp
    val = yield literal
    return var.find(val) > -1


@generate
def not_exp():
    yield not_op
    val = yield simple ^ bracketed
    return not val


ops = or_op | and_op
simple = equals ^ contains ^ name
composed = simple ^ not_exp ^ bracketed


def reduce_ops(val1):
    @Parser
    def rp(text, index):
        curr_values = [val1]
        curr_index = index
        while True:
            res = ops(text, curr_index)
            if not res.status:
                # there is no further operation - stop parsing
                return Value.success(curr_index, any(curr_values))
            else:
                curr_index = res.index
                op = res.value
                res = composed(text, curr_index)
                if res.status:
                    if op =='&':
                        curr_values[-1] = curr_values[-1] and res.value
                    else:
                        curr_values.append(res.value)
                    curr_index = res.index
                else:
                    return res
    return rp


expression = composed.bind(reduce_ops)

Parsers in parsec are functions which take two arguments – the text to parse and the index where to start – and return a Value type, which contains either the parsed value and the index where the next parser can continue, or, in case of error, a description of the expected input. A combinator then takes parser functions as arguments and returns a new parser function, which applies the parsers in some useful combination – look for instance at the skip function above: it uses a combinator implemented as the overloaded operator <<, which returns the value only from the first parser, while the second parser just eats characters from the input – in our case non-significant spaces.

Parsec has one very cool feature – it can turn a Python generator into a parser. The generator must yield parsers, which are subsequently applied, and their values are sent back to the generator – thus the final value can be calculated very easily – look for instance at the equals function.

The final piece is the function reduce_ops, which creates a parser that parses zero or more following binary operations (ands/ors) and reduces their operands into one single value – first the ands, as they have higher precedence than the ors.

A big advantage of parser combinators is that they are built bottom-up and each parser can be tested independently, so you can write unit tests for intermediate parsers to make sure they work correctly.
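
For illustration, a quick check of one intermediate parser and of the final expression might look like this (the context values are made up for the example; parsec's Parser.parse should raise ParseError on failure):

parsing_ctx.update({'name': 'report.pdf', 'from': 'jack@example.com', 'attached': ''})

# intermediate parser alone - parses a quoted string into its text
print(literal.parse('".pdf"'))   # -> .pdf

# the whole expression evaluates directly to a boolean during parsing
print(expression.parse('name ~=".pdf" & from ~= "jack" | ! attached'))   # -> True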

I made a quick comparison of the performance of both parsers – the old one with the parsimonious grammar and the new one with parsec combinators (1000 iterations over the two simple expressions shown above):

Old, grammar compiled in each iteration       4.830 secs
Old, grammar compiled once before iterating   0.563 secs
New (parsec)                                  0.536 secs

As you can see, in parsimonious the grammar compilation takes significant time, but once compiled it is almost as fast as parsec. Concerning code size, both examples were similar – about 100 lines of code.

Conclusions? Nothing in particular – parser combinators are fun; you can try them yourself if you like functional programming. If the code is reasonably organized, they can still provide a comprehensible overview of the underlying "grammar".


Audioserve Audiobooks Server – Stupidly Simple or Simply Stupid?


If you have read some of my previous articles, you know that I'm quite fond of audiobooks. In the past I was looking for a good media server which supports audiobooks and ended up with airsonic (a subsonic clone). The problem with the majority of media servers is that they rely totally on audio tags, which are often messed up in audiobooks. Also, many of the "advanced" functionalities are not applicable to audiobooks (random play, shuffle tracks, moods, etc.). I'm a bit old school, so I rely more on a reasonable directory structure and do not want to mess with tags for every audiobook I download. I also think that for a personal collection I do not need likes, favorites, sharing and similar stuff, as I can easily remember which books are good and which I have listened to or want to listen to; but I do need a few things which are usually missing – like bookmarks. An interesting function is to continue listening on one device where I left off on another, but since I basically listen only on my mobile phone, it does not seem to be critical. So an ideal audiobook server actually requires much less functionality than today's media servers provide. As I'm progressing with the Rust language, I decided two weeks ago to create a simple audio streaming server adhering to the KISS principle – Keep It Simple, Stupid. The result of this exercise is an application that provides the minimum viable functionality for streaming and listening to audiobooks – it's called Audioserve. In this article I'll show its basic design and demo the current application in a video.

Required functionality

  • Support for common audio formats, especially mp3, opus (I think opus is a great format for audiobooks), m4b, m4a …
  • HTTP audio streaming
  • SSL/TLS support
  • Some simple authentication mechanism – no need for users etc., as this is a personal server, but something to prevent unauthorized access
  • Transcoding to opus, to save bandwidth and to support more formats
  • Browse audiobooks as they are stored – e.g. in their directory structure, ordered alphabetically according to file/directory names
  • Support for some additional metadata at the folder level – a cover image and an audiobook/author description
  • Some simple search
  • Prefer audiobooks split into chapters, but also be able to play long files (several hours)
  • Play the whole book – automatically start the next chapter after the current chapter finishes
  • Quick and convenient seek – quickly find the required part in an audio file, even for large files
  • Remember the last position, to continue listening later; optionally bookmarks for different positions in different audiobooks
  • Simple web based client, possibly a native mobile (Android) client for better caching and off-line playback
  • Possibility to cache chapters ahead, or even whole books – so one can survive connectivity loss or even listen off-line
  • Small, lean and fast

Implementation

Audioserve is implemented in the Rust language; it uses the hyper library for HTTP support, but no complex framework, to keep it lean. Audioserve provides a very simple JSON API to browse the collection, plus functions to serve audio files directly or transcoded (the external ffmpeg utility is required). Authentication is achieved through a shared secret phrase. Audioserve also contains a bundled web client for modern browsers (latest Firefox or Chrome).

From the above requirements, we do not yet have good support for caching (only the currently played audio file is cached to some extent, and it's browser dependent – Firefox seems to provide better caching than Chrome). Reliable caching will require either a native mobile client, which is not available, or an in-depth dive into browser APIs, for which I have not had time yet. Also, generic bookmarks are not implemented (though it should not be difficult); only the one last played position is remembered.

Demo

Installation

The easiest way to try Audioserve is to use the provided Dockerfile and run a Docker container. See the README in the Audioserve repository.

Currently Audioserve is tested only on Linux, but it may work on other platforms where Rust code can be compiled.


From Russia with Love – Kotlin Language


One may think that we already have more than enough programming languages. But the community thinks differently, and new languages keep popping up – and thanks for that, because some bring cool innovative features, others focus on more streamlined and effective development, but all of them contribute to the evolution of the IT industry. Because, as in nature, progress comes through the variety of species and the competition between them. As I wrote in a past article, I recently looked into the Rust language – a very interesting language which comes with a novel approach to memory management without garbage collection, while still assuring high security of the program. Now I have met another member of the ever-growing happy family of programming languages – Kotlin – and I'd like to share my first experiences in this article.

The Kotlin language was engineered at JetBrains – a global leader in IDEs and similar developer tools. Though the company is incorporated here in the Czech Republic, most of its brainpower is from Russia (hence the title). The name of the language is taken from Kotlin (Ко́тлин) island near the beautiful city of St. Petersburg. In some presentation the JetBrains guys said they were inspired by Java taking its name from an island in Indonesia (which I think is a misconception – the Java language name is taken from the other meaning of the word – coffee – so a more proper name in this sense would be Espresso or Cappuccino).

Actually Kotlin is not such a recent language – development started in 2010, about the same time as Swift – however it experienced a major boom in popularity just about a year ago, when Google announced official support for it on Android. The recent 2018 Stackoverflow survey shows that Kotlin made it to number 2 among the 'most loved' languages (just after Rust).

The authors present their language as "pragmatic" – not focusing very much on innovation, but more on developer productivity and convenience. Kotlin achieves this by building on Java and extending it with many popular language features found in other languages like Python, Scala, Haskell, Rust, JavaScript etc. In this approach it is similar to the above mentioned Swift, to which it is often compared. Popular features in Kotlin are centered around functional programming – lambda functions, higher order functions (which feel more natural here than in Java 8; functions are true first class citizens) and an improved type system with null safety and type inference. These features alone can help a lot, but Kotlin adds many other goodies – more concise syntax, delegation support, and the possibility to extend existing classes with extension functions, to name a few notable ones.

Just to give you a taste of Kotlin, let's use one of my favorite Kotlin features – extension functions:

// Extension function: adds capitalizeWords() to the built-in String class
fun String.capitalizeWords(): String {
    val words = split(' ')
    return words.map {
        it[0].toUpperCase() + it.substring(1)
    }.joinToString(" ")
}

fun main(args: Array<String>) {
    println("sedm lumpu slohlo pumpu".capitalizeWords())
}

Above we define the function capitalizeWords, which turns the first letter of each word to upper case. This function can then be used on every string.

Kotlin's key strength is 100% compatibility with the JVM and Java in both directions: not only can you easily use all existing Java libraries, you can also use Kotlin code from Java with almost the same ease. And this is a strong selling point – a language is not only about its syntax, but also very much about its core and contributed libraries (as can be seen in other great languages like OCAML, where this is a bit of a problem and limits its acceptance). Here Kotlin can profit from the extensive and mature Java ecosystem.

Another big advantage of Kotlin is excellent IDE support – being developed by JetBrains, it has first class support in IDEA and Android Studio, with all the goodies of an advanced IDE – completion, code search, refactoring etc.

I'm currently using Kotlin for Android development (which is a kind of gloomy endeavour, but that is for another article), where I started from scratch with Kotlin. It's fairly easy to learn Kotlin if you have some Java basics (or a similar language like C#). The basics of Kotlin can be learned in a few days, and more advanced usage comes painlessly later. The language feels quite friendly and one can easily write code in a very natural way. Especially for me, having worked mostly with Python and recently with Rust, Kotlin syntax feels very convenient and much easier to follow than Java. When learning Kotlin I found this book very useful – Kotlin in Action by Dmitry Jemerov and Svetlana Isakova (Manning Publications) – it's well written, quickly guides you through the key language features, can be read in a few days, and after that you are ready to start with Kotlin in a real project.

Concerning performance: compilation performance looks good, similar to Java (see a detailed comparison here). Runtime performance is advertised to be the same as Java; however, Kotlin requires a small runtime library to support all of its features. I guess there might be some small runtime overhead in some cases (see for instance this benchmark), but it does not seem to be significant – actually in some cases Kotlin can improve performance, because its library contains more standard functions, which can replace custom suboptimal code.

So generally my first impressions of Kotlin are quite positive – the language is nice and easy to start with (also thanks to its IDE), many features I have used in other languages are available here, and it definitely simplifies Android development.

Stupidly Complex or Completely Stupid – Android Development


As part of the audioserve project I decided to create an Android client. To make the exercise more interesting, I decided to write it in the Kotlin language (see my previous article about Kotlin). The client should provide roughly the same interface as the web client (see this article for a demo of the web client interface), but I hoped to achieve much better caching, which would enable smooth playback even when connectivity is temporarily lost (for half an hour or an hour), and the possibility to download and play audiobooks completely offline. Further in this article I'd like to share my experiences with Android development as a beginner in this area. And the spoiler is: I do not like it.

I do not enjoy working on user interfaces generally (although I'm no novice in this area – I have worked with many UI frameworks, both native like GTK and Qt, Java based like Swing, and web based like Aurelia), but being the single resource on my fun projects there is no escape – I have to do it. So I finally jumped into Android development (in my previous projects I tried to avoid the mobile platform and provide responsive web interfaces – for Mybookshelf2 it worked fine, the web client is reasonably usable in a mobile browser), as a reasonable playback experience on mobile does require a native app.

Concerning learning, there are tons of resources, as Android is an extremely popular platform. I chose lectures from Udacity (this and this course, provided by Google engineers/instructors). The courses combine videos and exercises in Java, which were not difficult to follow, and the acquired skills were easy to apply in another language – Kotlin. The courses were generally fine, but gave me the impression that they were created in a hurry – the sample exercises have issues in the code and some concepts are explained hastily. But fine – I went through them and started to work on my own application.

The first thing I found was that I had forgotten the majority of what I had learned – and that was my first issue with Android development: it's a huge topic – there are many concepts to grasp, many APIs, many details that can go wrong. I had to refresh many things along the way. The crucial question that haunted me during the project is – does an Android application really have to be so complex? Looking around, it seems that Android development takes at least 30% more time (and money) than iOS – is this only due to the diversity of the Android platform? I do not believe so; I think it is also due to the monstrosity of the platform.

One example of something I consider quite complex is the lifecycle of the application – especially when you have fragments in your application. It took me a while to get everything in order – so that list fragments appear in the correct position after phone rotation, after hiding the activity, or after going back in the navigation (using the back stack of fragments).

Another thing, which is complete madness, is support libraries (libraries to support newer features on previous versions of Android). There are many of them, classes often have the same names as in the core platform, and if you are an unlucky fool like me you can easily mix them up and get very strange errors.

Another thing which unpleasantly surprised me is how much incorrect, outdated or incomplete information about Android is around. The official developer's guide is fine, but cannot contain every detail (and it also contains some errors – like an incorrect implementation of a singleton in Kotlin – if it has not been corrected yet). For particular issues I searched the web and was usually directed to the famous (or should I say infamous, in this context?) StackOverflow. A lot of answers there did not work for me; often answers were totally off-topic (obviously put there by desperate reputation-seeking individuals). As one example I can mention this one about back navigation from the activity bar. None of the answers worked, so I wasted quite some time implementing this rather simple navigation feature. An even worse experience was making a hyperlink work from a TextView in the navigation drawer.

The third annoying aspect of development was the tools. I used Android Studio on a Linux desktop (Ubuntu) and it does not work well. When I learned that Android Studio is based on IntelliJ IDEA, "the most advanced Java IDE", I was looking forward to enjoying the care of this popular IDE. However, I experienced quite a few issues (the structure view often gets out of sync, the text editor freezes for a while, autocomplete stops working, the project gets out of sync and requires manual resync). The issues were not critical, but they made working in Android Studio less pleasant (I heard that IDEA works better on Windows, but I do not want to change my desktop just for this project).

But to be fair, I'm just a beginner in the Android world, and surely many things have very good reasoning behind them; the Android platform is definitely not limited. You can customize each and every aspect of your application, and many things are just necessities of the platform (like restarts due to screen rotation, the complex lifecycle that enables the system to manage applications efficiently and save battery, support for different versions of the system and different hardware, etc.). Maybe it's just me who is completely stupid and struggling with this platform.

Anyhow, I crawled through (with many desperate moments when I was crying aloud "It's so bad, bad, bad") and the application (the audioserve Android client) is generally working – you can check the code here.

 

audioserve Android Client Early Beta Is Available


So finally there is something. I'm using it myself now to listen to audiobooks, and it has almost all the functionality I wanted it to have. It might still be a bit unstable and a few things do not behave well (keeping up for a long time in the paused state, navigation between notifications and activities is still a bit messy, and a few more issues), but generally it works. It was a tedious endeavor, as I had to learn the Android platform (it was my first real Android app), and it basically took significantly more time than the whole server and web client – see the previous article for some general comments about Android development.

It should support Android from API version 21 (Lollipop) up to the latest Oreo (API version 27). Tested on Nougat and Oreo.

Like the whole of audioserve, it tries to stick to KISS principles – so it's up to you to decide whether it is Stupidly Simple or Simply Stupid.

Here is video demo:

To check the code and download the .apk file, go to its github repo (apk files are in the releases). To test it you will need the audioserve server – which you can download from the other github repo (the easiest way to try the server is in docker, but compilation is also pretty straightforward on Linux).

How much better is the thread pool?


Is a thread pool worth considering for my project? I was looking for opinions around the net and, as usual, they differ; the most common wisdom is "it depends". Generally it is "known" that creating and tearing down a thread is "significant" overhead, so if you have a lot of small tasks a thread pool is a much better solution than spawning a new thread for each task. But what is significant overhead? According to what I read, the time to create a thread on Linux should be about 10μs (which does not look like too much to me), plus approximately 2MB of memory allocated for the stack (configurable). I was considering a thread pool in the context of the audioserve project, where I started with the simplest possible solution (spawning individual threads) and was wondering how much I was losing by not using a thread pool. So I implemented a simple thread pool (as a learning exercise – the long term audioserve solution should use tokio-threadpool) and added it to audioserve. In the remainder of this short article I'd like to share my findings and roughly quantify the benefits of a thread pool for such a small project.
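
If you want a rough feeling for the per-task dispatch overhead in your own environment before committing to an implementation, a micro-benchmark along these lines (a Python sketch – not the Rust pool used in audioserve) compares thread-per-task with a pool:

import time
from concurrent.futures import ThreadPoolExecutor
from threading import Thread

TASKS = 10_000

def task():
    pass  # minimal work - we only measure dispatch overhead

def spawn_per_task():
    for _ in range(TASKS):
        t = Thread(target=task)
        t.start()
        t.join()

def pooled():
    with ThreadPoolExecutor(max_workers=8) as pool:
        for f in [pool.submit(task) for _ in range(TASKS)]:
            f.result()

for name, fn in [("spawn per task", spawn_per_task), ("thread pool", pooled)]:
    start = time.monotonic()
    fn()
    print("%s: %.1f us/task" % (name, (time.monotonic() - start) / TASKS * 1e6))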

audioserve basically works with the local file system – tasks run in separate threads so as not to block the main loop. I used the Apache ab and Jmeter tools to test 3 scenarios:

  1. Quick IO task – listing of a small directory (2 items – subdirectories) – 1000 concurrent requests with the ab tool.
    Here I measured response time – the improvement with the thread pool was notable – an approx. 21% decrease (75ms vs 95ms). A notable difference was also in program size – the version with the thread pool was about half the size in both virtual and resident memory.
  2. Longer IO task – listing of a bigger directory (> 100 files; audioserve must also read audio metadata from each file's header) – 1000 concurrent requests with the ab tool.
    As expected, for longer tasks there was no significant difference in response time (actually the version without the pool was about 3% faster), but again the program with the pool took about half the memory.
  3. Complex browsing scenario with Jmeter.
    Here Jmeter provides a compound index – APDEX – reflecting 'user satisfaction'. Again there was no real difference between the versions (the version with the pool was only marginally better – 0.946 vs 0.935), but again there was a difference in the memory consumed by the program, similar to the previous cases.

In summary: although the user will not notice a difference in audioserve, the version with the pool provides some benefits, mainly a reduced memory footprint. I also observed improved stability (the version without the pool dropped connections occasionally). As a thread pool is relatively easy to implement, it is worth considering even for a smaller solution like audioserve.


From Ignorance to Enlightenment – Playing with Tokio


I have already played with tokio in a couple of small projects (ptunnel-rust, and indirectly (via hyper) in audioserve), but I cannot say that I'm proficient. Also, tokio is very much a moving target – what I used a couple of months ago is already a bit outdated now (the old version is the tokio_core crate, where the default executor was on the current thread; now it's a work stealing thread pool). So I decided to refresh and deepen my knowledge and created a toy project – a stupid jokes server. It's a TCP server which sends a random joke to a client after it connects and then closes the connection. Jokes are stored in a text file, separated by dashed lines. My main interest was to test how to use local file system I/O, which is blocking by nature, with tokio's asynchronous approach (so I initially skipped the easiest and probably most efficient implementation, where all jokes would be cached in memory). Usually in a real project you will have some blocking code, so I need to know how to handle it. This article is the history of my attempts (and failures), recorded in the hope that it might help others learning tokio (writing it down also helped me absorb the gained knowledge).

Prelude

In order to send random jokes from a file, we first need to index the file into an Index type

type Index = Vec<(usize, usize)>;

which is just a vector of line number pairs – start of the joke and end of the joke. We fill it at server startup in a pretty convenient synchronous way:

fn create_index<P: AsRef<Path>>(f: P) -> Result<Index, std::io::Error> {
    let reader = BufReader::new(File::open(f)?);
    let mut start: Option<usize> = None;
    let mut idx = vec![];
    for (no, line) in reader.lines().enumerate() {
        match line {
            Ok(l) => {
                if l.starts_with("---") {
                    if let Some(s) = start {
                        //println!("joke from {} to {}", s, no);
                        idx.push((s, no));
                    }
                    start = Some(no + 1)
                }
            }

            Err(e) => eprintln!("Error reading line {}: {}", no, e),
        }
    }
    Ok(idx)
}
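
For illustration, given a hypothetical jokes file like this:

---
Why did the chicken cross the road?
To get to the other side.
---

create_index returns [(1, 3)] – each pair is a zero-based, end-exclusive range of lines belonging to one joke.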


We should also have some code to start the tokio runtime – as the new runtime is configurable, we use it to start the runtime with a custom number of threads and thread keep-alive time:

fn create_runtime() -> Result<tokio::runtime::Runtime, io::Error> {
    let mut tp_builder = tokio_threadpool::Builder::new();
    tp_builder
        .name_prefix("ttest-worker-")
        .pool_size(8)
        .keep_alive(Some(Duration::from_secs(60)));

    tokio::runtime::Builder::new()
        .threadpool_builder(tp_builder)
        .build()
}

So the main function then looks like this:

fn main() {
    let jokes_file = match env::args().nth(1) {
        Some(s) => s,
        None => {
            eprintln!("text file is required as first argument");
            return;
        }
    };
    let idx = create_index(&jokes_file).unwrap();
    let idx = Arc::new(idx);

    let addr = "127.0.0.1:12345".parse().unwrap();
    let server = prepare_server(addr, jokes_file.into(), idx);

    let mut rt = create_runtime().unwrap();

    rt.spawn(server.map_err(|e| eprintln!("Server error {}", e)));
    rt.shutdown_on_idle().wait().unwrap()
}

So the only thing left is the function prepare_server, which will contain the server logic – generate a random number smaller than the number of jokes, then get the starting and ending line numbers from the index and send the lines within this range to the client – this should be easy peasy, right?

ACT I. – Head First

OK, first try – read the joke, send it:

fn prepare_server(
    addr: std::net::SocketAddr,
    file_name: PathBuf,
    idx: Index,
) -> Box<Future<Item = (), Error = io::Error> + Send> {
    println!("Starting at {}", &addr);
    let tcp = TcpListener::bind(&addr).unwrap();

    let server = tcp.incoming().for_each(move |socket| {
        println!("Received connection from {}", socket.peer_addr().unwrap());
        
        let i = rand::thread_rng().gen_range(0, idx.len());
        let (from, to) = idx[i];
        println!("Sending joke from lines: {} - {}", from, to);
        let reader = BufReader::new(File::open(&file_name).unwrap());
        let joke: Vec<_> = reader
            .lines()
            .skip(from)
            .take(to - from)
            .filter(|r| r.is_ok())
            .map(|s| s.unwrap())
            .filter_map(|l| {
                let s = l.trim_left();
                if s.len() > 1 {
                    Some(s.to_owned())
                } else {
                    None
                }
            })
            .collect();

        let mut text = joke.join("\n");
        text.push_str("\n");
        let write_future = io::write_all(socket, text)
            .then(|res| {
                println!("Written joke - result is Ok {:?}", res.is_ok());
                Ok(())
            });

        tokio::spawn(write_future);

        Ok(())
    });

    Box::new(server)
}

Is it working? Yes – at least it appears so – we can try nc localhost 12345 and we will get a joke.

Is it nice? Somewhat – notice the iterator chain – I'm getting really used to them.

Some gotchas:

– When returning a boxed Future we also need to mark it with the Send trait, otherwise we get an error when using it in the runtime. The runtime takes only futures that are Send, as it'll execute them on different threads.

– Initially I thought that I'd need to use an Arc wrapper for the index, but it turns out that it's not needed, as it can be moved into the closure.

– tokio spawn requires a future with both Item and Error of unit type. However write_all returns a Future with different types, so we need to chain a new future with the correct types using the then function.

Is it good asynchronous code? Not at all. We are using blocking operations like File::open and reading the file line by line via .lines() inside an asynchronous task. That's bad. It works because I was lucky – the operations do not block for long (especially when the file is cached by the OS after the first reads) and the runtime thread pool already provides some level of concurrency, so it's not fatal if some threads block. But we can do better, right?

ACT II. Tokio Threadpool Helps

The previous code uses blocking calls – is there anything in tokio that can handle blocking calls? Actually there are a couple of possibilities:

– Spawn blocking code in a separate thread; if there are many such cases, use a thread pool.
– Or use an asynchronous library, if one is available – and there is the tokio-fs library, which enables asynchronous work with files.

I will try both approaches and see where I'll get. As jokes are supposed to be distributed in large quantities (as everybody likes stupid jokes), we will probably need a thread pool. There are a couple of thread pool implementations around (I have also written a simple one to test with audioserve). But wait – tokio also provides a thread pool, which is already used to run tokio tasks. Couldn't we use it to run our blocking code? And it looks like we can – tokio-threadpool provides the function blocking, which can surround blocking code and handle it. So let's try it:

fn prepare_server(
    addr: std::net::SocketAddr,
    file_name: PathBuf,
    idx: Arc<Index>,
) -> Box<Future<Item = (), Error = io::Error> + Send> {
    println!("Starting at {}", &addr);
    let tcp = TcpListener::bind(&addr).unwrap();

    let server = tcp.incoming().for_each(move |socket| {
        println!("Received connection from {}", socket.peer_addr().unwrap());
        let file_name = file_name.clone();
        let idx = idx.clone();
        let work_future = poll_fn(move || {
            blocking(|| {
                let i = rand::thread_rng().gen_range(0, idx.len());
                let (from, to) = idx[i];
                println!("Sending joke from lines: {} - {}", from, to);
                let reader = BufReader::new(File::open(&file_name).unwrap());
                let joke: Vec<_> = reader
                    .lines()
                    .skip(from)
                    .take(to - from)
                    .filter(|r| r.is_ok())
                    .map(|s| s.unwrap())
                    .filter_map(|l| {
                        let s = l.trim_left();
                        if s.len() > 1 {
                            Some(s.to_owned())
                        } else {
                            None
                        }
                    })
                    .collect();

                let mut text = joke.join("\n");
                text.push_str("\n");
                text
            })
        });
        let write_future = work_future
            .map_err(|_| std::io::Error::new(std::io::ErrorKind::Other, "Blocking Error"))
            .and_then(|text| {
                //println!("Joke is {}", text);
                io::write_all(socket, text)
            })
            .then(|res| {
                println!("Written joke -result is Ok {:?}", res.is_ok());
                Ok(())
            });

        tokio::spawn(write_future);

        Ok(())
    });

    Box::new(server)
}

Does it work – yes.

Is it nice code – yes – it's almost the same as the previous one, with a couple more lines.

Gotchas:

– Error mapping – we still need to be aware of error types, as chained futures require the same Error type. Here it was easiest to map the BlockingError from the blocking function to io::Error (it required just one mapping).

– The blocking function does not actually return a Future, but a poll result (more about polling later) – to turn it into a Future we need another function, poll_fn.

– As we now have two closures nested, we cannot move some values directly, but have to clone them (otherwise we get the error "cannot move out of captured outer variable in an FnMut closure").

Is it good asynchronous code? Not yet. The recommended approach for asynchronous code is to split work into many small tasks. So in our case we would like to send text lines as soon as we get them. And we can do that, right?

Act III. Borrow Checker Fights Back

Tokio also provides Streams – they are similar to Iterators, but asynchronous. We already use an iterator, so we can try to turn it into a stream. Actually there is a function for it, iter_ok, so it should be very easy to implement:

fn prepare_server(
    addr: std::net::SocketAddr,
    file_name: PathBuf,
    idx: Arc<Index>,
) -> Box<Future<Item = (), Error = io::Error> + Send> {
    println!("Starting at {}", &addr);
    let tcp = match TcpListener::bind(&addr) {
        Ok(t) => t,
        Err(e) => return Box::new(err(e)),
    };

    let server = tcp.incoming().for_each(move |socket| {
        println!("Received connection from {}", socket.peer_addr().unwrap());
        let file_name = file_name.clone();
        let idx = idx.clone();
        let lines_future = poll_fn(move || {
            blocking(|| {
                let i = rand::thread_rng().gen_range(0, idx.len());
                let (from, to) = idx[i];
                println!("Sending joke from lines: {} - {}", from, to);
                let reader = BufReader::new(File::open(&file_name).unwrap());
                let joke_iter = reader
                    .lines()
                    .skip(from)
                    .take(to - from)
                    .filter(|r| r.is_ok())
                    .map(|s| s.unwrap())
                    .filter_map(|l| {
                        let s = l.trim_left();
                        if s.len() > 0 {
                            let mut l = s.to_owned();
                            l.push_str("\n");
                            Some(l)
                        } else {
                            None
                        }
                    });

                iter_ok::<_, ()>(joke_iter)
            })
        });

        let write_future = lines_future
            .map_err(|_| ())
            .and_then(move |lines_stream| {
                lines_stream.for_each(move |line|{
                     io::write_all(socket, line)
                     .then(|_| Ok(()))
                })
            })
            .then(|res| {
                println!("Written joke -result is Ok {:?}", res.is_ok());
                Ok(())
            });

        tokio::spawn(write_future);

        Ok(())
    });

    Box::new(server)
}

Ouch – this code does not compile, thanks to the error on line 47 (the io::write_all(socket, line) call) – write_all requires ownership of socket, but it cannot be moved out of the captured context. So now what? We can find some solution, right?

Interlude – Rusty Futures

Tokio uses the Rust futures crate heavily, so it's necessary to have a good understanding of how futures work, especially their combinators, so let's start with them.

A Future is a representation of a value that will be available later, while we need some reference to it now. The most commonly known "future" is probably the Javascript Promise – it has two basic combinators – then (containing a function which is executed when the value becomes available) and catch (which contains an error handling function). Rust contains similar combinators and a few more (check the futures combinators cheatsheet). Combinators are the true essence of asynchronous programming: you create small pieces of code and then glue them together into the required logic with appropriate combinators.

Combinators can be used on existing futures, but sometimes a new future needs to be created from scratch. In Javascript a Promise is created by passing a function which calls either resolve or reject. Rust futures are a bit different – as explained in the linked article, Rust futures are "readiness based" – meaning the runtime wakes/polls a future when relevant events occur (bytes arrive on a socket, a timer fires…) and the future decides whether it has enough data to resolve itself, or still needs more. This decision is made in one function, poll, so to create a future it's enough to implement this one function – the rest is provided by the library. The key factor driving this design is the "zero-cost abstraction" requirement of the Rust language. It wants to provide nice higher level features, but they must be implemented very efficiently, basically with the same efficiency (speed, memory) as if you had written optimized code in C. With this polling approach futures are just state machines, and combined, more complex futures are just slightly more complex state machines.
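
For reference, this is (slightly simplified) the contract from the futures 0.1 crate that everything below builds on:

pub trait Future {
    type Item;
    type Error;
    fn poll(&mut self) -> Poll<Self::Item, Self::Error>;
}

// where
type Poll<T, E> = Result<Async<T>, E>;
enum Async<T> { Ready(T), NotReady }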

Act IV – Hard Way – I Can Do The Future

So in the previous Act we had a problem with one particular future – WriteAll (returned from the write_all function), which consumes the socket, so we cannot use it repeatedly. As explained above, creating a new future is just about implementing one method – poll – so why not create a new future which will solve our problem?

type MyStream = Box<Stream<Item = String, Error = ()> + Send>;
struct Sender {
    stream: MyStream,
    socket: TcpStream,
    buf: Vec<u8>,
    pos: usize,
}

impl Sender {
    fn new(stream: MyStream, socket: TcpStream) -> Self {
        Sender {
            stream: stream,
            socket: socket,
            buf: vec![],
            pos: 0,
        }
    }

    fn write(&mut self) -> futures::Poll<usize, ()> {
        while self.pos < self.buf.len() {
            match self.socket.poll_write(&self.buf[self.pos..]) {
                Err(e) => {
                    eprintln!("Error writing to socket: {}", e);
                    return Err(());
                }
                Ok(Async::NotReady) => return Ok(Async::NotReady),
                Ok(Async::Ready(0)) => {
                    eprintln!("Error write 0 bytes");
                    return Err(());
                }
                Ok(Async::Ready(n)) => self.pos += n,
            };
        }
        Ok(Async::Ready(self.pos))
    }
}

impl Future for Sender {
    type Item = ();
    type Error = ();

    fn poll(&mut self) -> futures::Poll<Self::Item, Self::Error> {
        // write remainder of previous line
        try_ready!(self.write());
        while let Async::Ready(x) = self.stream.poll()? {
            match x {
                Some(l) => {
                    self.buf = l.into_bytes();
                    self.pos = 0;
                    // write what we can
                    try_ready!(self.write());
                }
                None => return Ok(Async::Ready(())),
            }
        }
        Ok(Async::NotReady)
    }
}

The poll function returns type Poll<T, E> = Result<Async<T>, E>. Async is an enum with two possible values – Ready<T>, which is returned when the future is resolved, and NotReady, returned when the future is still pending and needs to wait for further events. The tokio runtime then takes care that the poll function is called only when events relevant to the pending future occur (data arrived on the socket, a chained future resolved, a timer triggered etc.). When writing a poll function it is crucial to propagate all NotReady states from the incorporated futures' polling methods. Here we have two – the socket's asynchronous write (poll_write) and the stream's poll (which behaves very similarly to a future's poll, but returns Ready<Option<T>> – with Some as long as there are items in the stream, finally returning None to indicate the end of the stream). I separated the logic of writing to the socket into the helper function write, as it's used in two places – first, when we receive data from the stream we try to send as much as we can before the socket signals NotReady; then, in the next call of the poll function, we first need to send any remaining data. The write function uses a match statement to handle all possible return values of the called poll – this can be simplified with the try_ready! macro, which provides the value if it is available, and otherwise makes the function return the error or NotReady.
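
For clarity, try_ready!(self.write()) expands to roughly this:

match self.write() {
    Ok(Async::Ready(v)) => v,
    Ok(Async::NotReady) => return Ok(Async::NotReady),
    Err(e) => return Err(From::from(e)),
}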

Ok, so now with this future we can rewrite our server as:

fn prepare_server(
    addr: std::net::SocketAddr,
    file_name: PathBuf,
    idx: Arc<Index>,
) -> Box<Future<Item = (), Error = io::Error> + Send> {
    println!("Starting at {}", &addr);
    let tcp = match TcpListener::bind(&addr) {
        Ok(t) => t,
        Err(e) => return Box::new(err(e)),
    };

    let server = tcp.incoming().for_each(move |socket| {
        println!("Received connection from {}", socket.peer_addr().unwrap());
        let file_name = file_name.clone();
        let idx = idx.clone();
        let lines_future = poll_fn(move || {
            blocking(|| {
                let i = rand::thread_rng().gen_range(0, idx.len());
                let (from, to) = idx[i];
                println!("Sending joke from lines: {} - {}", from, to);
                let reader = BufReader::new(File::open(&file_name).unwrap());
                let joke_iter = reader
                    .lines()
                    .skip(from)
                    .take(to - from)
                    .filter(|r| r.is_ok())
                    .map(|s| s.unwrap())
                    .filter_map(|l| {
                        let s = l.trim_left();
                        if s.len() > 0 {
                            let mut l = s.to_owned();
                            l.push_str("\n");
                            Some(l)
                        } else {
                            None
                        }
                    });

                iter_ok::<_, ()>(joke_iter)
            })
        });

        let write_future = lines_future
            .map_err(|_| eprintln!("Blocking error"))
            .and_then(|lines_stream| Sender::new(Box::new(lines_stream), socket));

        tokio::spawn(write_future);
        Ok(())
    });

    Box::new(server)
}

Is it working? Yes it is.

Is it nice code? Hmm, actually the server function is OK-ish, similar to the previous one, but to solve our problem we had to write custom code of almost the same size as the function itself. There has to be some better way – it looks like such a common task, to send a stream to an outgoing socket. I'll try to ask the Rust community, I've heard they are really helpful.

Is it good asynchronous code? I think there is still a small problem – the iter_ok function turns an iterator into a stream, but this stream is always resolved, it never gets interrupted, so it'll be processed in one go. I'd rather send lines as they become available, in a truly asynchronous manner – a lot of small tasks that are executed as soon as possible.

Futures can help here with futures::sync::mpsc::channel – it's very similar to the channel available in the standard library, but modified to work with futures. It sounds promising, so let's try it, right?

Act V – Channels Are Good

So here is our server using channel:

fn prepare_server(
    addr: std::net::SocketAddr,
    file_name: PathBuf,
    idx: Arc<Index>,
) -> Box<Future<Item = (), Error = io::Error> + Send> {
    println!("Starting at {}", &addr);
    let tcp = match TcpListener::bind(&addr) {
        Ok(t) => t,
        Err(e) => return Box::new(err(e)),
    };

    let server = tcp.incoming().for_each(move |socket| {
        println!("Received connection from {}", socket.peer_addr().unwrap());
        let file_name = file_name.clone();
        let idx = idx.clone();
        let (tx,rx) = unbounded::<String>();
        let joker = poll_fn(move || {
            blocking(|| {
                let i = rand::thread_rng().gen_range(0, idx.len());
                let (from, to) = idx[i];
                println!("Sending joke from lines: {} - {}", from, to);
                let reader = BufReader::new(File::open(&file_name).unwrap());
                reader
                    .lines()
                    .skip(from)
                    .take(to - from)
                    .filter(|r| r.is_ok())
                    .map(|s| s.unwrap())
                    .filter_map(|l| {
                        let s = l.trim_left();
                        if s.len() > 0 {
                            let mut l = s.to_owned();
                            l.push_str("\n");
                            Some(l)
                        } else {
                            None
                        }
                    })
                    .for_each(|l| tx.unbounded_send(l).unwrap())
            })
        })
        .map_err(|e| eprintln!("Blocking error: {}", e));

        tokio::spawn(joker);

        let stream = rx
            .map_err(|_| eprintln!("Blocking error"));
        let write_future = Sender::new(Box::new(stream), socket);
        tokio::spawn(write_future);

        Ok(())
    });

    Box::new(server)
}

The code is very similar to the one in the previous Act. But here we created an unbounded channel and send lines into it (unbounded to make our life easier, so we can send without blocking; because jokes have at most tens of lines, that is OK here – if there were more data, we would have to use a bounded channel to prevent excessive memory use when the other side of the channel is not consuming data quickly enough). Reading lines from the file is now a separate task and runs in parallel with sending lines to the socket, so each line can be sent as soon as it gets read from the file.
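
For illustration, here is a sketch of the bounded variant – in futures 0.1 a bounded channel applies backpressure by making send itself a future (the buffer size of 16 is arbitrary):

extern crate futures;

use futures::sync::mpsc::channel;
use futures::{Future, Sink, Stream};

fn main() {
    // buffer of 16 items - with a full buffer, send resolves only
    // when the consumer makes room, slowing the producer down
    let (tx, rx) = channel::<String>(16);

    // send consumes the sender and yields it back once the item is accepted
    let producer = tx.send("some line".to_owned()).map(|_tx| ()).map_err(|_| ());
    let consumer = rx.for_each(|line| {
        println!("{}", line);
        Ok(())
    });

    producer.join(consumer).wait().ok();
}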

Does this approach work? Yes, it works like a charm.

Is it nice code? Same as in the previous Act – there is still big overhead with that custom future. But I might get an answer to my question. Yes, there is an answer – use lines_stream.forward(socket) – cool. But wait, it does not work – forward requires the Sink trait, but socket does not implement it. I will need to read a bit more about Streams and Sinks. I should have read tokio's guide to the end before messing with code.

Is it good asynchronous code? I think it's better than the previous one – we can achieve better concurrency, as lines are sent immediately after they are read.

Interlude – Stream and Sink

The futures crate also provides two other important traits – Stream and Sink. We have seen Stream before; it's an asynchronous equivalent of Iterator and it provides similar combinators. But a stream has to be consumed in an asynchronous way – we have tried the for_each method before and even a custom future. But there is a much better way – Sink. Just as there is the AsyncRead and AsyncWrite trait pair on the lower level, which works with bytes, there is the Stream and Sink trait pair, where Stream produces items of an arbitrary type and Sink then consumes them – all happening asynchronously.

So what I need is a Sink for lines (a line represented as the String type) – my channel is already a Stream of lines, since the channel receiver implements Stream. To help implement a Stream/Sink pair, tokio provides useful support – a codec, an object that serializes a given type from/to bytes. This article explains how to create a line codec – and it's also already available in tokio_io::codec::LinesCodec. Once we have this codec our task is very easy – as shown in the next Act.
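
For a flavor of what such a codec looks like, here is a sketch of the Encoder half of a lines codec (the real LinesCodec also implements Decoder and handles read-side buffering):

extern crate bytes;
extern crate tokio_io;

use bytes::BytesMut;
use std::io;
use tokio_io::codec::Encoder;

struct SketchLinesCodec;

impl Encoder for SketchLinesCodec {
    type Item = String;
    type Error = io::Error;

    // the Sink side calls encode for every item, then flushes buf to the socket
    fn encode(&mut self, line: String, buf: &mut BytesMut) -> Result<(), io::Error> {
        buf.extend_from_slice(line.as_bytes());
        buf.extend_from_slice(b"\n");
        Ok(())
    }
}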

Act VI – Easy Way – I Framed Sink

With LinesCodec we can write our server as:

fn prepare_server(
    addr: std::net::SocketAddr,
    file_name: PathBuf,
    idx: Arc<Index>,
) -> Box<Future<Item = (), Error = io::Error> + Send> {
    println!("Starting at {}", &addr);
    let tcp = match TcpListener::bind(&addr) {
        Ok(t) => t,
        Err(e) => return Box::new(err(e)),
    };

    let server = tcp.incoming().for_each(move |socket| {
        println!("Received connection from {}", socket.peer_addr().unwrap());
        let file_name = file_name.clone();
        let idx = idx.clone();
        let (tx,rx) = unbounded::<String>();
        let joker = poll_fn(move || {
            blocking(|| {
                let i = rand::thread_rng().gen_range(0, idx.len());
                let (from, to) = idx[i];
                println!("Sending joke from lines: {} - {}", from, to);
                let reader = BufReader::new(File::open(&file_name).unwrap());
                reader
                    .lines()
                    .skip(from)
                    .take(to - from)
                    .filter(|r| r.is_ok())
                    .map(|s| s.unwrap())
                    .filter_map(|l| {
                        let s = l.trim_left();
                        if s.len() > 0 {
                            Some(s.to_owned())
                        } else {
                            None
                        }
                    })
                    .for_each(|l| tx.unbounded_send(l).unwrap())
            })
        })
        .map_err(|e| eprintln!("Blocking error: {}", e));
        tokio::spawn(joker);

        let framed_socket = socket
            .framed(LinesCodec::new())
            .sink_map_err(|e| eprintln!("Write error {}", e));
        let write_future = rx.forward(framed_socket).map(|_| ());
        tokio::spawn(write_future);

        Ok(())
    });

    Box::new(server)
}

The key trick is to turn the socket into a Stream/Sink with socket.framed(LinesCodec::new()) and then use it with the forward method.

Is it working? Indeed.

Is it nice code? Oh yes, the Stream/Sink symmetry feels very pleasing.

Is it good asynchronous code? I think yes.

Gotchas:

– Here again I struggled a bit with error mapping. The problem was that the framed socket implements both the Stream and Sink traits – I used the map_err function as usual and was surprised that I could not map the error type from the sink. For the sink there is the sink_map_err function instead.

So are we done, or not? I promised to also show the other approach, with the existing asynchronous fs module, right?

Act VII – All Async Now

Recent versions of tokio have the fs module (crate tokio-fs), which does exactly what I have tried to do in this article – read asynchronously from files. As I understood from the announcement blog post, internally it also uses the thread pool's blocking function. So to complete our journey, let's use the fs module (tokio::fs::File).

fn prepare_server(
    addr: std::net::SocketAddr,
    file_name: PathBuf,
    idx: Arc<Index>,
) -> Box<Future<Item = (), Error = io::Error> + Send> {
    println!("Starting at {}", &addr);
    let tcp = match TcpListener::bind(&addr) {
        Ok(t) => t,
        Err(e) => return Box::new(err(e)),
    };

    let server = tcp.incoming().for_each(move |socket| {
        println!("Received connection from {}", socket.peer_addr().unwrap());
        let file_name = file_name.clone();
        let i = rand::thread_rng().gen_range(0, idx.len());
        let (from, to) = idx[i];
        println!("Sending joke from lines: {} - {}", from, to);

        let reader_future = tokio::fs::File::open(file_name)
            .map(move |f| f.framed(LinesCodec::new())
                .skip(from as u64)
                .take((to - from) as u64)
                .filter_map(|l| {
                    let s = l.trim_left();
                    if s.len() > 0 {
                        Some(s.to_owned())
                    } else {
                        None
                    }
                })
                .map_err(|e| eprintln!("Error reading from file: {}",e))
            )
            .map_err(|e| eprintln!("Error opening file: {}",e));

        let framed_socket = socket
            .framed(LinesCodec::new())
            .sink_map_err(|e| eprintln!("Write error {}", e));
        let write_future = reader_future.and_then(|lines_stream| {
            lines_stream.forward(framed_socket).map(|_| ())
        });
        
        tokio::spawn(write_future);

        Ok(())
    });

    Box::new(server)
}

It's working, it's nice, it's asynchronous (just notice how the stream uses the same combinators as our previous examples – I could just copy this logic here – how nice).

Postlude

So we are at the end; we learned a lot (at least I did) and it's time for some conclusions.

Let's start with a general one. There were times when I struggled a bit with rustc, but it always turned out that it was for a good reason – I was trying to do something not so smart and the compiler prevented me from making a fool of myself. This is my general feeling about Rust so far. While other languages let you approach a problem from many possible angles, and you can almost always make progress and end up with some code that works (though the code can be a bit freakish), Rust often stops you on the way and makes you rethink the whole problem and start from scratch.

The next comment is about tokio. I had already used the previous version (tokio_core) and the recent version is definitely progress – a nicer, richer API. It takes some time to get used to working with futures; it's a bit demanding on discipline in ownership and borrowing and requires a good understanding of the Rust type system, but once I got used to it, working with it felt good.

And lastly you might ask, which solution is best? My answer would be none of those already presented. If we really want an efficient server with the functionality presented here, it is best to cache the jokes in memory. Memory is cheap nowadays and it'll make the solution notably faster. So for completeness here is the final solution – we just need to change the Index type to this:

struct Index {
    index: Vec<(usize, usize)>,
    lines: Vec<String>
}

and fill lines (only the useful ones) at the same time as we build the index. A sketch of the modified create_index might look like this:
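
This sketch is hypothetical – it assumes the same parsing rules as before, but stores only the useful, already trimmed lines and indexes into that vector:

fn create_index<P: AsRef<Path>>(f: P) -> Result<Index, std::io::Error> {
    let reader = BufReader::new(File::open(f)?);
    let mut start: Option<usize> = None;
    let mut index = vec![];
    let mut lines = vec![];
    for line in reader.lines() {
        let line = line?;
        if line.starts_with("---") {
            // close the previous joke - it spans lines start..lines.len()
            if let Some(s) = start {
                index.push((s, lines.len()));
            }
            start = Some(lines.len());
        } else {
            // keep only useful lines, already trimmed
            let s = line.trim_left();
            if !s.is_empty() {
                lines.push(s.to_owned());
            }
        }
    }
    Ok(Index { index, lines })
}

The server now looks like this: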

fn prepare_server(
    addr: std::net::SocketAddr,
    idx: Index,
) -> Box<Future<Item = (), Error = io::Error> + Send> {
    println!("Starting at {}", &addr);
    let tcp = TcpListener::bind(&addr).unwrap();

    let server = tcp.incoming().for_each(move |socket| {
        println!("Received connection from {}", socket.peer_addr().unwrap());
        let i = rand::thread_rng().gen_range(0, idx.index.len());
        let (from, to) = idx.index[i];
        println!("Sending joke from lines: {} - {}", from, to);
        
        let joke: Vec<_> = idx
            .lines.iter()
            .skip(from)
            .take(to - from)
            .map(|s| s.to_owned())
            .collect();

        let mut text = joke.join("\n");
        text.push_str("\n");
        let write_future= io::write_all(socket, text)
        .map_err(|e| eprintln!("Write error: {}", e))
        .map(|_| ());

        tokio::spawn(write_future);

        Ok(())
    });

    Box::new(server)
}

I have done quick benchmarking of the presented solutions. For this I have written a minimal client in Rust+tokio, which spawns 1000 concurrent connections (for this I had to modify the SOMAXCONN kernel parameter, otherwise strange things happen, as explained here), reads lines from the socket and closes the connection. And I ran it against the different versions of the server.

The slowest version is the one from Act I – with 660ms to process all 1000 requests. It's no surprise, as we did something very bad there – used blocking code directly in an asynchronous task. All other solutions, which used the blocking function to wrap blocking code, perform better – around 300ms – 330ms, with no significant difference between them. The version with tokio-fs was very slightly slower – 380ms. Variance in the measurements was notable, so I'd not put too much significance into differences between the solutions from Act II – Act VII in terms of performance. The cached solution was clearly the fastest, as expected, with 100ms for all 1000 requests.

The code is also on github (look at the commit history for the different versions).


Fearless Upgrades with Typed Languages


One of the many advantages of statically typed programming languages like Rust or Java is their resilience to changes in dependent libraries, usually caused by new library versions with a modified interface – e.g. major version changes. In a statically typed language we usually see the problems at compile time, and they are relatively easy to fix; but in dynamic languages such as Python or Javascript an upgrade is more challenging, and problems manifest themselves during tests (in the better case, when we have good test coverage) or in production in the worst case. I recently went through this process for a couple of relatively small projects in Rust. A couple of dependencies (hyper and tokio) had been upgraded to new versions with significant changes to their interfaces. With the update, compilation broke with tens of errors, but gradually I was able to fix all of them (in one case it required improving error handling with a custom error type, plus using an additional new library, as typed headers were removed from hyper), and after the code compiled and ran through basic tests I was pretty sure that I'm fine with the new libraries. In a similar situation in Python I would need much more work to be sure that the code works after such a major upgrade of dependencies. In practice this enables easier maintenance of code and less effort to keep it up to date with the latest libraries. For library authors it provides more freedom and they can introduce breaking changes more often (and with the great cargo build tool, Rust library users can pin themselves to an older version if they do not want to upgrade).
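
For completeness, staying on an older interface is a one-liner in Cargo.toml (the crate versions below are illustrative):

[dependencies]
# stays on the 0.11.x line (a semver-compatible range)
hyper = "0.11"
# or pin one exact version
tokio-core = "=0.1.17"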


Tiny Etude in Generics


Although Rust is mostly noted for its memory safety, and thus its most prominent feature is the borrow checker, it also has a very decent type system, inspired by modern functional languages like OCaml or Haskell. In this article I'll look into a very simple example, which will nevertheless show some nice features of the Rust type system – especially generics.

The problem is very simple – compute the product of a vector of values (in Rust it would be the slice type, which is more general) in the most generic way, so it can be used on the broadest possible range of value types (the problem is of course solved in the standard library, but this is meant as a small exercise).

So let’s start with first attempt – we know for sure that our generic type needs to implement Mul trait (to multiply values) :

fn product<T>(vals: &[T])-> T
 where T: std::ops::Mul
    
 {
     assert!(vals.len()>0);
     let mut res = vals[0];
     for x in &vals[1..] {
         res = res * (*x);
     }
     
     res
 }

As is often the case with first attempts in Rust, this one does not even compile. There are two issues here:

  • The result of multiplication is an associated type of the Mul trait, but we are returning type T – we have to be explicit here and link T to Mul's Output
  • We cannot just move or dereference values out of the slice – T has to be Copy for this code to work.

So the fixed version looks like this:

fn product<T>(vals: &[T])-> T
 where T: std::ops::Mul<Output=T>+Copy
    
 {
     assert!(vals.len()>0);
     let mut res = vals[0];
     for x in &vals[1..] {
         res = res * (*x);
     }
     
     res
 }

So now it works for all Copy types, but it's still restrictive – we can easily make it work with all Clone types (which is a superset of Copy types):

fn product<'a, T>(vals: &'a[T])-> T
 where T: std::ops::Mul<Output=T> + Clone,
    
 {
     assert!(vals.len()>0);
     let mut res = vals[0].clone();
     for x in &vals[1..] {
         res = res * x.clone();
     }
     
     res
 }

However this approach is a bit inefficient, because we need to clone every value. If we look at how Mul is implemented for standard numeric types, we see that the right hand side can also be a reference to a value (&T). This looks promising, as we will not need to clone in the loop. The only other quirk is that the lifetime for &T has to be explicit here – &'a T:

fn product<'a, T>(vals: &'a[T])-> T
 where T: std::ops::Mul<&'a T,Output=T> + Clone,
    
 {
     assert!(vals.len()>0);
     let mut res = vals[0].clone();
     for x in &vals[1..] {
         res = res * x;
     }
     
     res
 }

Now we can replace the loop with a more elegant iterator fold:

fn product<'a, T>(vals: &'a[T])-> T
 where T: std::ops::Mul<&'a T,Output=T> + Clone,
    
 {
     assert!(vals.len()>0);
     vals[1..].iter().fold(vals[0].clone(), |acc,x| acc*x) 
 }

The question now is whether we can get rid of the Clone dependency. Actually we do not need the first value of the slice – we can start with the multiplicative identity, i.e. 1. The num crate provides this concept:

extern crate num;
use num::One;
  
fn product<'a, T>(vals: &'a[T])-> T
 where T: std::ops::Mul<&'a T,Output=T> + One,
    
 {
     assert!(vals.len()>0);
     vals.iter().fold(T::one(), |x,y| x*y)
     
 }

It looks nice and clean, but there is one practical problem – One is not implemented for floating point types (because of the quirks of binary FP arithmetic on real computers).

So finally we can take a look at how things are properly implemented in the standard library. The Iterator trait provides this method:

fn product<P>(self) -> P
        where Self: Sized,
              P: Product<Self::Item>,
    {
        Product::product(self)
    }

So it’s works for all types that implements trait Product:

pub trait Product<A = Self> {
    fn product<I>(iter: I) -> Self
    where
        I: Iterator<Item = A>;
}

And the Product trait is implemented for all numerical types (including floating point values), for both value and reference iterators.

So our function could use it as:

use std::iter::Product;
fn product<'a, T>(vals: &'a[T])-> T
 where T: Product<&'a T>,
    
 {
     assert!(vals.len()>0);
     vals.iter().product()
 }
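
A quick sanity check that this final version works for both integer and floating point slices:

fn main() {
    assert_eq!(product(&[2u32, 3, 7]), 42);
    assert_eq!(product(&[1.5f64, 2.0, 4.0]), 12.0);
}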


Conclusions

The Rust type system is quite powerful, which can be seen even in such a simple example as the product of a vector of values.

Future Never Sleeps


Recently I’ve been reading this book:  “Network Programming with Rust” by Abhishek Chanda. I found this book bit problematic. It’s just collection of many unrelated examples (often taken from crates documentation), with just little of background and concepts explanation and in some parts this book is just wrong, in other parts it’s using too much simplifications, so the result does not make much sense or worst it introduces some dangerous ideas. One of these  places is part about futures and streams – let’s look at one example:

// ch7/streams/src/main.rs

extern crate futures;
extern crate rand;

use std::{io, thread};
use std::time::Duration;
use futures::stream::Stream;
use futures::{Poll, Async};
use rand::{thread_rng, Rng};
use futures::Future;

// This struct holds the current state and the end condition
// for the stream
#[derive(Debug)]
struct CollatzStream {
    current: u64,
    end: u64,
}

// A constructor to initialize the struct with defaults
impl CollatzStream {
    fn new(start: u64) -> CollatzStream {
        CollatzStream {
            current: start,
            end: 1
        }
    }
}

// Implementation of the Stream trait for our struct
impl Stream for CollatzStream {
    type Item = u64;
    type Error = io::Error;
    fn poll(&mut self) -> Poll<Option<Self::Item>, io::Error> {
        let d = thread_rng().gen_range::<u64>(1, 5);
        thread::sleep(Duration::from_secs(d));
        if self.current % 2 == 0 {
            self.current = self.current / 2;
        } else {
            self.current = 3 * self.current + 1;
        }
        if self.current == self.end {
            Ok(Async::Ready(None))
        } else {
            Ok(Async::Ready(Some(self.current)))
        }
    }
}

fn main() {
    let stream = CollatzStream::new(10);
    let f = stream.for_each(|num| {
        println!("{}", num);
        Ok(())
    });
    f.wait().ok();
}

As you can see on line 37 above (the thread::sleep call), the poll method is blocked by thread::sleep – the author's explanation is:

“We simulate a delay in returning the result by sleeping for a random amount of time between 1 and 5 seconds.”

But this is wrong, very wrong – it goes against the very basic principle of asynchronous programming and Futures in Rust. What is happening here is that the whole event loop is blocked and no other Future can progress. A Future's poll method should never block; rather it should return Async::NotReady to indicate that it cannot proceed yet, and also schedule its next poll for a later time, when it expects it could proceed.

So as an exercise I tried to create a proper implementation, probably a bit simplistic and suboptimal (for proper delays in futures look for instance at the tokio-timer crate).

So here is the code:

extern crate futures;
extern crate rand;
#[macro_use]
extern crate lazy_static;

use futures::stream::Stream;
use futures::Future;
use futures::{task, Async, Poll};
use rand::{thread_rng, Rng};
use std::cmp::Ordering;
use std::collections::BinaryHeap;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

enum Never {}

struct TaskEntry {
    task: futures::task::Task,
    wake_at: Instant,
}

impl TaskEntry {
    fn new(instant: Instant) -> Self {
        TaskEntry {
            task: task::current(),
            wake_at: instant,
        }
    }
}

impl PartialEq for TaskEntry {
    fn eq(&self, other: &Self) -> bool {
        self.wake_at == other.wake_at
    }
}

impl Eq for TaskEntry {}

impl PartialOrd for TaskEntry {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        other.wake_at.partial_cmp(&self.wake_at)
    }
}

impl Ord for TaskEntry {
    fn cmp(&self, other: &Self) -> Ordering {
        other.wake_at.cmp(&self.wake_at)
    }
}

enum WaitTime {
    Forever,
    None,
    Some(Duration),
}

struct Waker {
    thread: thread::JoinHandle<()>,
    tasks: Arc<Mutex<BinaryHeap<TaskEntry>>>,
}

lazy_static! {
    static ref GLOBAL_WAKER: Waker = Waker::new();
}

impl Waker {
    fn new() -> Self {
        let tasks = Arc::new(Mutex::new(BinaryHeap::new()));
        let tasks2 = tasks.clone();
        let thread = thread::spawn(move || loop {
            let sleep_time = {
                let mut tasks = tasks2.lock().unwrap();
                let time = match tasks.peek() {
                    None => WaitTime::Forever,
                    Some(TaskEntry { wake_at, .. }) => {
                        let now = Instant::now();
                        if *wake_at > now {
                            WaitTime::Some(*wake_at - now)
                        } else {
                            WaitTime::None
                        }
                    }
                };

                if let WaitTime::None = time {
                    let t = tasks.pop();
                    if let Some(task_entry) = t {
                        task_entry.task.notify()
                    }
                };
                time
            };

            match sleep_time {
                WaitTime::None => (),
                WaitTime::Forever => thread::park(),
                WaitTime::Some(d) => thread::park_timeout(d),
            }
        });

        Waker { tasks, thread }
    }

    fn wake_me_at(&self, time: Instant) {
        let mut tasks = self.tasks.lock().unwrap();
        tasks.push(TaskEntry::new(time));
        self.thread.thread().unpark()
    }
}

#[derive(Debug)]
enum Delay {
    Fixed(u64),
    Random(u64, u64),
}

#[derive(Debug)]
struct Sleeper {
    next_instant: Option<Instant>,
    delay: Delay,
}

impl Sleeper {
    fn new(delay: Delay) -> Self {
        Sleeper {
            next_instant: None,
            delay,
        }
    }

    fn poll_sleep(&mut self) -> Poll<(), Never> {
        let should_wait = match self.next_instant {
            None => {
                let now = Instant::now();
                use Delay::*;
                let delay = match self.delay {
                    Fixed(fixed) => fixed,
                    Random(low, high) => thread_rng().gen_range::<u64>(low, high),
                };
                self.next_instant = Some(now + Duration::from_millis(delay));
                true
            }
            Some(i) => {
                if i <= Instant::now() {
                    self.next_instant.take();
                    false
                } else {
                    true
                }
            }
        };

        if should_wait {
            GLOBAL_WAKER.wake_me_at(self.next_instant.unwrap());
            return Ok(Async::NotReady);
        }

        Ok(Async::Ready(()))
    }
}

macro_rules! poll_sleeper {
    ($sleeper:expr) => {
        match $sleeper.poll_sleep() {
            Ok(Async::NotReady) => return Ok(Async::NotReady),
            Ok(Async::Ready(_)) => (),
            Err(_) => unreachable!(),
        }
    };
}

#[derive(Debug)]
struct CollatzStream {
    current: u64,
    end: u64,
    sleeper: Sleeper,
}

impl CollatzStream {
    fn new(start: u64) -> CollatzStream {
        CollatzStream {
            current: start,
            end: 1,
            sleeper: Sleeper::new(Delay::Random(1000, 5000)),
        }
    }
}

impl Stream for CollatzStream {
    type Item = u64;
    type Error = Never;
    fn poll(&mut self) -> Poll<Option<Self::Item>, Never> {
        poll_sleeper!(self.sleeper);

        if self.current % 2 == 0 {
            self.current = self.current / 2;
        } else {
            self.current = 3 * self.current + 1;
        }
        if self.current == self.end {
            Ok(Async::Ready(None))
        } else {
            Ok(Async::Ready(Some(self.current)))
        }
    }
}

struct Ticker {
    ticks: u64,
    sleeper: Sleeper,
}

impl Ticker {
    fn new(ticks: u64) -> Self {
        Ticker {
            ticks,
            sleeper: Sleeper::new(Delay::Fixed(100)),
        }
    }
}

impl Stream for Ticker {
    type Item = ();
    type Error = Never;

    fn poll(&mut self) -> Poll<Option<Self::Item>, Self::Error> {
        poll_sleeper!(self.sleeper);
        if self.ticks > 0 {
            self.ticks -= 1;
            Ok(Async::Ready(Some(())))
        } else {
            Ok(Async::Ready(None))
        }
    }
}

fn main() {
    let stream = CollatzStream::new(10);
    let f = stream.for_each(|num| {
        println!("{}", num);
        Ok(())
    });
    let t = Ticker::new(100).for_each(|_| {
        println!("Tick");
        Ok(())
    });
    let r = f.join(t);
    r.wait().ok();
}


It's indeed much longer than the previous example, and the majority of the code is about the asynchronous delay. We have two streams there – CollatzStream, the same as in the previous example (with random delays before producing a value), and another stream, Ticker, which produces a unit value every 100 ms – just to demonstrate that both streams are running asynchronously.

Then we have two supporting structures, Sleeper and Waker, to enable proper asynchronous sleeps in our streams. Sleeper sleeps for a given delay, returning Async::NotReady while sleeping, then Async::Ready(()) after the sleep period, and then goes to sleep again on the next poll. Waker registers sleeping futures and wakes them at the appropriate time. It uses a binary heap to keep references to sleeping futures ordered by their planned wake-up time (futures are represented here by their tasks) and runs a background thread to always wake the nearest task – scheduling it for its next poll (with the task.notify method).

So as can be seen by comparing the original example with the fixed one, too much simplification is sometimes dangerous – it can introduce false ideas and confuse the underlying principles rather than clarify them.


Sqlite3 – How Slow Is Write?


Sqlite3 is a lightweight relational database, mainly focused on smaller local systems. Being used in Android, it's now probably the most widespread relational database in the world, with billions of instances running. Lite in the name means that it is not a client-server architecture and it's intended for lower data volumes – the ideal usage profile is read-mostly, with occasional writes. Sqlite3 is often used as an embedded data store in various applications (Firefox and Chrome being the most prominent ones). Recently I've been playing a bit with the sqlite3 interface in Rust and ran a couple of simple tests especially focused on writes. So how does sqlite3 perform, and how does it compare with a more typical client-server RDBMS like PostgreSQL? It's not any serious benchmark, just a couple of toy tests to highlight a few things.

I started with testing a highly concurrent load of independent writes into one table – 10 thousand inserts from independent tasks (running in a thread pool and using a connection pool) – see the code below:

extern crate chrono;
extern crate r2d2;
extern crate r2d2_sqlite;
extern crate threadpool;

use chrono::prelude::*;
use threadpool::ThreadPool;
use std::time::Duration;

#[derive(Debug)]
struct Person {
    id: i32,
    name: String,
    time_created: DateTime<Local>,
    data: Option<Vec<u8>>,
}

fn main() {
    let _ = std::fs::remove_file("my_test_db");
    let manager = r2d2_sqlite::SqliteConnectionManager::file("my_test_db");
    let pool = r2d2::Pool::builder().max_size(5).build(manager).unwrap();

    {
        let conn = pool.get().unwrap();
        assert!(conn.is_autocommit());
        let mut check_stm = conn
            .prepare("SELECT name FROM sqlite_master WHERE type='table' AND name=?1;")
            .unwrap();

        if !check_stm.exists(&[&"person"]).unwrap() {
            //conn.execute_batch("pragma journal_mode=WAL").unwrap();
            conn.execute(
                "CREATE TABLE person (
                  id              INTEGER PRIMARY KEY,
                  name            TEXT NOT NULL,
                  time_created    TEXT NOT NULL,
                  data            BLOB
                  )",
                &[],
            )
            .unwrap();
        }
    }

    let thread_pool = ThreadPool::new(4);

    for i in 0..10_000 {
        let pool = pool.clone();

        thread_pool.execute(move || {
            let me = Person {
                id: 0,
                name: format!("Usak{}", i),
                time_created: Local::now(),
                data: None,
            };
            let conn = pool.get().unwrap();
            conn.busy_timeout(Duration::from_secs(600)).unwrap();
            conn.execute(
                "INSERT INTO person (name, time_created, data)
                  VALUES (?1, ?2, ?3)",
                &[&me.name, &me.time_created, &me.data],
            )
            .unwrap();
        });
    }

    thread_pool.join();
    let conn = pool.get().unwrap();

    let mut stmt = conn
        .prepare("SELECT count(*) FROM person")
        .unwrap();
    let mut query = stmt.query(&[]).unwrap();
    let count: i32 = query.next().unwrap().unwrap().get(0);
    assert_eq!(count, 10_000);
}

As expected, performance was very bad. Actually, initially many tasks/threads panicked with a "busy/database locked" error. As all writes to an sqlite3 database have to be serialized (using a lock on the db), the concurrent threads competed for the lock and some were less lucky and timed out before acquiring it. Extending conn.busy_timeout helped, but performance was still very poor. There was one possibility to improve performance though – to use a different journal mechanism for the sqlite3 database. It's not so well known that sqlite3 has two mechanisms for the database journal (journals are used for data consistency and atomic commits) – the default is a rollback journal, but it also supports a write-ahead log (WAL) journal. The WAL journal can be enabled explicitly by the pragma "pragma journal_mode=WAL" (commented out in the previous code – see line 31). If we change the journal mode to WAL (uncomment line 31), we see a significant improvement in performance, but it's still pretty slow.

The problem here is that sqlite3 is not designed for intensive concurrent write operations. A more real-life scenario would be that just one thread fills in the data and inserts/updates many records in a single transaction:

extern crate chrono;
extern crate rusqlite;
extern crate threadpool;

use chrono::prelude::*;
use std::time::Duration;
use rusqlite::Connection;
use std::thread;

#[derive(Debug)]
struct Person {
    id: i32,
    name: String,
    time_created: DateTime<Local>,
    data: Option<Vec<u8>>,
}

fn main() {
    let _ = std::fs::remove_file("my_test_db");
    let conn = Connection::open("my_test_db").unwrap();

    {
        assert!(conn.is_autocommit());
        let mut check_stm = conn
            .prepare("SELECT name FROM sqlite_master WHERE type='table' AND name=?1;")
            .unwrap();

        if !check_stm.exists(&[&"person"]).unwrap() {
            conn.execute_batch("pragma journal_mode=WAL").unwrap();
            conn.execute(
                "CREATE TABLE person (
                  id              INTEGER PRIMARY KEY,
                  name            TEXT NOT NULL,
                  time_created    TEXT NOT NULL,
                  data            BLOB
                  )",
                &[],
            )
            .unwrap();
        }
    }

    let thread = thread::spawn(move || {
        let mut conn = Connection::open("my_test_db").unwrap();
        conn.busy_timeout(Duration::from_secs(60)).unwrap();
        let t = conn.transaction().unwrap();

        for i in 0..10_000 {
            let me = Person {
                id: 0,
                name: format!("Usak{}", i),
                time_created: Local::now(),
                data: None,
            };
            
            t.execute(
                "INSERT INTO person (name, time_created, data)
                  VALUES (?1, ?2, ?3)",
                &[&me.name, &me.time_created, &me.data],
            )
            .unwrap();
        }
        t.commit().unwrap();
    });

    thread.join().unwrap();
    let mut stmt = conn
        .prepare("SELECT count(*) FROM person")
        .unwrap();
    let mut query = stmt.query(&[]).unwrap();
    let count: i32 = query.next().unwrap().unwrap().get(0);
    assert_eq!(count, 10_000);

}

As you can verify, the above code is at least an order of magnitude faster than the previous attempt (see also the table below – first vs last column). But what if we forgot to use the transaction in this case? By default the connection is in auto-commit mode, so each insert is committed individually. A quick check shows that we are then back at lousy performance (see the table below – middle column) – basically comparable with writing in concurrent tasks. So it is not so much about concurrency – commits in sqlite3 are expensive.

And how does sqlite3 compare with a full-bodied database like PostgreSQL? (I used basically the same code as above, just adapted for the rust-postgres crate):

/--------------------------------------------------------------\
|            | 10k concurrent | 10k autocommit | 10k in one txn |
|============|================|================|================|
| pg         | ~ 5.6 secs     | ~ 13 secs      | ~ 1.7 secs     |
| sqlite wal | ~ 1.5 mins     | ~ 1.5 mins     | ~ 0.3 secs     |
| sqlite rb  | ~ 14 mins      | ~ 13.5 mins    | ~ 0.3 secs     |
\______________________________________________________________/

(Only bear in mind that the measurements are approximate and informative only, done with the time command)

As you can see, postgres (first line – pg) is significantly faster for many individual inserts. It can even leverage concurrency (more than two times faster when the inserts run in concurrent threads). But for one transaction (10k inserts in a single transaction) sqlite3 is faster, probably benefiting from its simple architecture (no client-server overhead).

Some conclusions:

  • If you are writing a lot of records into sqlite3, try to write them in one transaction – it really helps.
  • Many independent writes perform badly in sqlite3. For a write-intensive database consider something else.
  • But for many types of applications sqlite3 is a very good solution – it's proven, robust and performs well when used appropriately.

Audioserve Update


Some updates have been done on both the audioserve server (now in version 0.6.0) and the Android client (now in version 0.7.0):

  • on the server, the collection directory structure can be cached in memory for fast search – use the option --search-cache
  • search results also contain the folder path (both Android and Web client)
  • Web client – cached parts of audio are displayed in the player as a green bar
  • the Android client now has bookmarks (in addition to the automatically maintained recently-listened list) – the bookmark button is revealed by a swipe-left gesture on the item
  • advanced playback options – speed and skip silence – pull up the playback controls to see them

Just for comparison – see the memory requirements of audioserve vs airsonic:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
ivan      2352  0.1 22.2 3115212 845704 ?      Sl   Oct23  65:42 java ...
ivan     13277  0.0  0.5 124108 22136 ?        Sl   Nov04   0:18 /opt/audioserve/audioserve ...

25x less virtual memory, 38x less resident memory. audioserve is really lightweight!

Next Audioserve Version


As I’m using audioserve for almost a year, I’m becoming quite keen about it – it’s exactly what I wanted – simple, lightweight and works as I needed. With it listening to audiobooks is just a simple pleasure. Recently I updated audioserve server with couple more features, which might not be essential, but can be useful: multiple transcoding formats  ( meaning target formats) and transcoding cache.

Originally audioserve supported only the opus codec in an ogg container (which I think is still the best choice), so I decided to refactor the code to make it simple to use any other appropriate audio format and container (the codec and container must be supported by ffmpeg, of course). With this refactoring I also added support for ogg in a webm container, mp3, and aac in an ADTS container (as they are supported by Media Source Extensions (MSE) in modern browsers, while the ogg container is not). MSE can be used to create sophisticated web clients. There is one interesting gotcha in our transcoding – the result of transcoding is sent directly through a pipe to the HTTP response; the stream cannot then be sought, so no further updates are done to the container header, which is a problem for many containers – for instance the MP4 container cannot be used at all, and WebM/Matroska is missing the audio duration, which causes problems when seeking during playback.

The transcoding cache saves successfully transcoded files in an LRU cache on disk. You can limit the cache in both overall size and number of files. The cache can be beneficial if you often jump between files or if several users are listening to the same audiobooks. But if just one user is listening to audiobooks in a rather linear fashion (chapter by chapter, book by book), there is no benefit in using the cache. That's why the transcoding cache is an optional feature in audioserve and you need to compile it with the transcoding-cache feature explicitly enabled.
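
That is, assuming a standard cargo setup, building with something like:

cargo build --release --features transcoding-cache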

The above mentioned new features are available in version 0.7.1, and the LRU file cache is also available as a standalone library, simple-file-cache.
