
Juniper screenOS authentication backdoor - master ssh password posted


On December 18th, 2015 Juniper issued an advisory indicating that they had discovered unauthorized code in the ScreenOS software that powers their Netscreen firewalls. This advisory covered two distinct issues: a backdoor in the VPN implementation that allows a passive eavesdropper to decrypt traffic and a second backdoor that allows an attacker to bypass authentication in the SSH and Telnet daemons. Shortly after Juniper posted the advisory, an employee of Fox-IT stated that they were able to identify the backdoor password in six hours. A quick Shodan search identified approximately 26,000 internet-facing Netscreen devices with SSH open. Given the severity of this issue, we decided to investigate.

 

Juniper's advisory mentioned that versions 6.2.0r15 to 6.2.0r18 and 6.3.0r12 to 6.3.0r20 were affected. Juniper provided a new 6.2.0 and 6.3.0 build, but also rebuilt older packages that omit the backdoor code. The rebuilt older packages have a "b" suffix on the version and a minimal set of changes, making them the best candidate for analysis. In order to analyze the firmware, it must be unpacked and then decompressed. The firmware is distributed as a ZIP file that contains a single binary. This binary is a decompression stub followed by a gzip-compressed kernel. ScreenOS is not based on Linux or BSD, but runs as a single monolithic kernel. The SSG500 firmware uses the x86 architecture, while the SSG5 and SSG20 firmware uses the XScale (ARMB) architecture. The x86 images can be extracted easily with binwalk, but the XScale images require a bit more work. The decompressed kernel can be loaded into IDA Pro for analysis. As part of the analysis effort, we have made decompressed binaries available in a GitHub repository.

 

Although most folks are more familiar with x86 than ARM, the ARM binaries are significantly easier to compare due to minimal changes in the compiler output. In order to load the SSG5 (ssg5ssg20.6.3.0r19.0.bin) firmware into IDA, the ARMB CPU should be selected, with a load address of 0x80000 and a file offset of 0x20. Once the binary is loaded, it helps to identify and tag common functions. Searching for the text "strcmp" finds a static string that is referenced in the sub_ED7D94 function. Looking at the strings output, we can see some interesting string references, including auth_admin_ssh_special and auth_admin_internal. Searching for "auth_admin_internal" finds the sub_13DBEC function. This function has a "strcmp" call that is not present in the 6.3.0r19b firmware:

 

Screenshot (ssh.png): the strcmp call against the backdoor password in sub_13DBEC.

 

The argument to the strcmp call is <<< %s(un='%s') = %u, which is the backdoor password, and was presumably chosen so that it would be mistaken for one of the many other debug format strings in the code. This password allows an attacker to bypass authentication through SSH and Telnet, as long as they know a valid username. If you want to test this issue by hand, telnet or ssh to a Netscreen device, specify a valid username, and the backdoor password. If the device is vulnerable, you should receive an interactive shell with the highest privileges.

 

The interesting thing about this backdoor is not the simplicity, but the timing. Juniper's advisory claimed that versions 6.2.0r15 to 6.2.0r18 and 6.3.0r12 to 6.3.0r20 were affected, but the authentication backdoor is not actually present in older versions of ScreenOS. We were unable to identify this backdoor in versions 6.2.0r15, 6.2.0r16, or 6.2.0r18, and it is probably safe to say that the entire 6.2.0 series was not affected by this issue (although the VPN issue was present). We were also unable to identify the authentication backdoor in versions 6.3.0r12 or 6.3.0r14. We could confirm that versions 6.3.0r17 and 6.3.0r19 were affected, but were not able to track down 6.3.0r15 or 6.3.0r16. This is interesting because although the first affected version was released in 2012, the authentication backdoor did not seem to get added until a release in late 2013 (either 6.3.0r15, 6.3.0r16, or 6.3.0r17).

 

Detecting the exploitation of this issue is non-trivial, but there are a couple things you can do. Juniper provided guidance on what the logs from a successful intrusion would look like:

 

2015-12-17 09:00:00 system warn 00515 Admin user system has logged on via SSH from …..

2015-12-17 09:00:00 system warn 00528 SSH: Password authentication successful for admin user ‘username2’ at host …

 

Although an attacker could delete the logs once they gain access, any logs sent to a centralized logging server (or SIEM) would be captured, and could be used to trigger an alert.

 

Fox-IT has created a set of Snort rules that detect access with the backdoor password over Telnet and fire on any connection to a ScreenOS Telnet or SSH service:

 

# Signatures to detect successful abuse of the Juniper backdoor password over telnet.
# Additionally a signature for detecting world reachable ScreenOS devices over SSH.

alert tcp $HOME_NET 23 -> any any (msg:"FOX-SRT - Flowbit - Juniper ScreenOS telnet (noalert)"; flow:established,to_client; content:"Remote Management Console|0d0a|"; offset:0; depth:27; flowbits:set,fox.juniper.screenos; flowbits:noalert; reference:cve,2015-7755; reference:url,http://kb.juniper.net/JSA10713; classtype:policy-violation; sid:21001729; rev:2;)

alert tcp any any -> $HOME_NET 23 (msg:"FOX-SRT - Backdoor - Juniper ScreenOS telnet backdoor password attempt"; flow:established,to_server; flowbits:isset,fox.juniper.screenos; flowbits:set,fox.juniper.screenos.password; content:"|3c3c3c20257328756e3d2725732729203d202575|"; offset:0; fast_pattern; classtype:attempted-admin; reference:cve,2015-7755; reference:url,http://kb.juniper.net/JSA10713; sid:21001730; rev:2;)

alert tcp $HOME_NET 23 -> any any (msg:"FOX-SRT - Backdoor - Juniper ScreenOS successful logon"; flow:established,to_client; flowbits:isset,fox.juniper.screenos.password; content:"-> "; isdataat:!1,relative; reference:cve,2015-7755; reference:url,http://kb.juniper.net/JSA10713; classtype:successful-admin; sid:21001731; rev:1;)

alert tcp $HOME_NET 22 -> $EXTERNAL_NET any (msg:"FOX-SRT - Policy - Juniper ScreenOS SSH world reachable"; flow:to_client,established; content:"SSH-2.0-NetScreen"; offset:0; depth:17; reference:cve,2015-7755; reference:url,http://kb.juniper.net/JSA10713; classtype:policy-violation; priority:1; sid:21001728; rev:1;)

 

If you are trying to update a ScreenOS system and are running into issues with the signing key, take a look at Steve Puluka's blog post.

 

We would like to thank Ralf-Philipp Weinmann of Comsecuris for his help with unpacking and analyzing the firmware and Maarten Boone of Fox-IT for confirming our findings and providing the Snort rules above.

 

Still to come: Metasploit updates!

 

-HD


Show HN: CodeHalf – Write Code Everyday, new design and new features


What is CodeHalf?

Currently CodeHalf is a simple tracker to help you establish and keep up a habit of writing code every day.

Why every day? Doing something every day establishes a strong habit; think of brushing your teeth, for instance. Also, by doing a little bit every day, you won't spend forever getting going, because your last session wasn't that long ago.

So why half an hour? Everyone should be able to squeeze in half an hour a day, whether that's first thing in the morning, at lunch time, after work, or just before bed. 30 minutes is quite a long time if you don't procrastinate and just get on with things, and it's short enough that even if you're just not in the mood, you'll get through it no problem ;)

So how does CodeHalf help me?

I've built this site as a tool to help myself, and hopefully others, with this habit. Some of the things it helps with at the moment:

  • Visually tracks completed and missed days - This is like the Seinfeld productivity/habit forming method
  • Leave notes of what you've done that day - These can help with reflection at a later date
  • Make notes for tomorrow - These help you get going straight away the next day
  • Activity Feed - Chronological history of what you've done and links to what you've done
  • Topics - Listing things you'd like to get around to doing, record your entries against a topic and discover popular topics from other users

What's next?

You tell me! I've got some ideas, and as I'm using this to track my own habit I'll come up with more that can help. Some things I've got planned at the moment:

  • Statistics - Who doesn't love stats?! Longest streak, % of days completed etc
  • Public profiles - Public view of your profile where you can share what you've done
  • Themes - Preset periods of time to spend on something particular
  • Theme retrospective - Summary of how your themed week/month went
  • Theme resources - Collection of resources useful for a given theme
  • Collaborative Themes or Topics - Working with others to learn/work together

Some great, some terrible. Let me know below what you'd find useful next.

On the Surfaces of Things: Mathematics and the Realm of Possibility


What follows is an essay adapted from a talk, delivered in 2010 to teenagers and parents in my hometown of Cupertino, California. The talk concerned the surfaces of things, like bodies and planets, and abstract surfaces, like the Möbius strip and the torus. The goal was to learn about the world by studying surfaces mathematically, to learn about mathematics by studying the way we study surfaces, and, ideally, to learn about ourselves by studying how we do mathematics.

Imagine that you are an emperor regarding a great expanse from a tall peak. Though your territory extends beyond your sight, a map of the entire empire hangs on the wall in your study, seventy interlocking cantons. On your desk lies a thick, bound atlas. Each page is a detailed map of a county whose image on the great map is smaller than a coin.

You dream of an atlas of the world in which the map of your empire would fill but a single page. You send forth a fleet of cartographers. They scatter from the capital and each, after traveling her assigned distance, will make a detailed map of her surroundings, note the names of her colleagues in each adjacent plot, and return.

If the world is infinite, some surveyors will return with bad news: they were the members of their band who traveled the farthest, and uncharted territory lies on one side of their map. But if the world is finite, there will be many happy surprises: a friend last seen in the capital is rediscovered thousands of miles from home, having taken another path to the same place. Their maps form an atlas of the world.

You have another dream, in which the whole world is contained in your capable hands, miniature and alive. Upon waking, you begin to tear the pages out of the atlas and piece them together, using the labels on the edge of each page to stitch it to its neighbors, constructing ever larger swaths of territory.

Soon you are left with six grand charts. You fit them together and they lift off the table to form a cube.


An atlas of squares assembles into a cube.

You are perplexed. Can the world have creases and corners? If smooth skin is cut and then sutured, a ridge of stitches shows. Perhaps the sharp edges come from the mapping not the land. You suddenly recall the map of your own canton, centered on the mountain on which you live. If a mountain can be flat on a map, perhaps a peak on a map can represent flat land. Some distortion must be inevitable in this process of turning land into paper. The world might be smoothly curved, the folded map having flattened some parts and extended others.

Perhaps instead you are left with four grand charts. You fit them into a larger square, then the square into a cylinder and stretch its ends to meet. The world is shaped like a tremendous torus, the surface of some heavenly wheel.

Another atlas of squares assembles into a torus.

You, dear reader, are not an emperor (yet). If you want to recover the shape of the world, you cannot send forth thousands of surveyors. But you can put yourself in his place, at the moment when the completed atlas of the world lies on his desk. You can imagine piecing it together, stitching pages along their indicated boundaries until they close up into a complete model of the world, its shape sitting before you.

If we, without sending out an expedition, can somehow make a classification of surfaces, a complete list of what an atlas might conceivably be stitched into, then we will have listed the possible shapes of the world. Exploration is of course necessary to figure out which surface from the list our actual world happens to be, but an abstract, mathematical classification will constrain the shape of any surface past, present, future, or fictional.

Sameness

Before embarking on the quest to classify surfaces, we should say what it would mean for a surface to be on our list. We must decide when we will say that two surfaces have the same shape.

The cartographical situation suggests some requirements for our notion of “same shape”.

  1. If you build reasonably accurate local maps of a surface and then assemble them faithfully, the resulting paper model has the same shape as the original surface.
  2. If you deform a surface slightly, say through the erosion of a mountaintop, then its shape does not change.

While these conditions sound innocuous, they have some strange ramifications.

Say you begin with a ball of clay. Its surface is a sphere. You toss it on a potter’s wheel, flattening the base a bit. As the wheel spins, you press on the sides, then depress the center, then squeeze the walls until you have formed a bowl. Since each change was so mild, you must never have changed the fundamental shape of the object.1 We must say that the surface of a ball is the same shape as the surface of a bowl.

The branch of mathematics concerned with the study of shape and space is called topology (from ancient Greek topos, meaning place). An oft-repeated and never funny joke is that a topologist cannot tell a coffee mug from a donut. This is not true. But it is true that mathematicians abstract the sensory qualities from a thing (the heft of the mug, the sweetness of the donut). And if both mug and donut were made of a perfectly malleable substance, you might expand the base of the mug to fill up its volume, then shrink the resulting thick cylinder down until it's the same width as the handle, leaving a torus. It’s not that topologists think it desirable that a donut be a coffee mug. It’s that, having promised to allow small deformations and being honorable folk, they are obligated to admit that the mug and donut have the same shape. 2

The topological notion of “same shape” can actually be defined in terms of maps. Two surfaces A and B have the same shape if you can draw a faithful map of A on B. In other words, there is a continuous correspondence between the points of A and the points of B--an assignment, to every point of A, of exactly one point of B, in such a way that any smooth path you draw between two points on A maps to a smooth path on B, and vice versa.

(In this sense, the Mercator projection is not a map of the earth. To begin with, two points, the North and South poles, are missing entirely. Moreover, on the real earth one can sail from San Francisco to Japan through the Pacific. On the Mercator projection, the ship hits the edge of the map.)

If you start with an object and deform it, squeezing, pinching, pulling, etc., then whatever you end up with comes with a map already made: the surface of the original lies distorted on the final product. While the earth (unbeknownst to our emperor) is not a perfect ball, its surface not a perfect sphere, they are, topologically, the same shape: push in the mountains, pull out the valleys, smooth it out.

We can now compare physical surfaces to infinitely thin abstract ones. We can jump scale from maps and atlases to planets—just write down a correspondence of points. Consider the emperor's cube stitched together out of the atlas's pages. Place it at the center of a Japanese lantern, a thin white paper sphere, with a very bright light inside the cube. The projection is our one-to-one correspondence: the image of the cube exactly covers the sphere; the cube is the same shape as the sphere.

Difference

Under the topological definition of sameness, it is easy to verify that two surfaces are the same but hard to demonstrate that they’re different. How could we prove that no possible correspondence can exist between two surfaces? If one surface fits in the palm and we live inside the other, if one is presented as pages in an atlas and the other as a habitat, we might have to be rather clever to find the right correspondence. We need an invariant, something which can be computed about a surface that doesn't depend on how it's presented to us, something that can prohibit the existence of any continuous correspondence and tell us: these really are different.

A first difference: the disk is not the sphere. Why? The disk has a boundary, an edge, and the points on the edge are special. To see this, imagine yourself living in the surface. Imagine that the thin surface is sandwiched between two sheets of glass, and you are a paramecium, a single-celled organism swimming sideways. When you come to an edge, you have to stop--points on the boundary of the disk have the special property that some directions of travel are forbidden. At every point of the sphere, however, you have a full range of motion. There can be no continuous correspondence between the disk and the sphere because the point on the sphere ending up on the boundary of the disk would have to be ripped away from its neighbor. Tearing is very discontinuous.

This gives us our first invariant: the number of components in the boundary of a surface, or the number of holes cut into it. The location of the holes doesn’t affect the shape, since we could shrink them to pinpricks and then slide them to any other location on the surface. A sphere with two holes at opposite poles is the same shape as a sphere with two holes next to each other, which is the same shape as a hollow tube.

We can also distinguish finite surfaces from infinite surfaces like the vast plane that Euclidean geometry takes place on. We're assuming that all the surveyors returned with good news, so the theoretical earth we're trying to map is finite.

In addition to asking about a surface, “does it have boundary?” or “is it finite?” we can ask, “is it orientable?” The extrinsic definition of orientable applies to a surface that you're handed, sitting in 3 dimensions (as opposed to an atlas with identifications), the kind of surface you can jump off of. It's orientable if it has two sides (like the sphere), which we usually call the in-side and the out-side. It's nonorientable if it has only one, like the Möbius strip.

We can also define orientability intrinsically, for people who live in the surface, not on it. People like our paramecium. Let’s call him TK. Imagine that TK’s little cilia spin clockwise, giving his body a preferred direction of rotation, and that he has an identical twin JW. Now TK can wander around through the surface, taking some convoluted path, then return to where he started. For many paths, he still looks like JW upon his return, cilia twisting clockwise. But maybe he comes back reversed, cilia twisting counterclockwise--this is exactly what happens in the Möbius strip. If there is a path TK can take and come back reversed from his twin, then the surface is nonorientable. Otherwise, it's orientable. This implies the extrinsic definition by the right-hand rule.

The surfaces of things, of thick 3-dimensional bodies, are necessarily orientable. The body itself defines the inside, which cannot be reached traveling on the outside. Surfaces of things are also finite, and have no boundary--if you try to cut a hole in the surface, you just make a depression extending the surface inwards. So to classify surfaces of things, like possible shapes for the earth, we need only consider closed, finite, orientable shapes, like the sphere and the torus.

To distinguish the sphere and the torus, we need a more sophisticated invariant called the Euler characteristic. To define it, we'll take a brief digression into graph theory.

A graph is a bunch of vertices connected by a bunch of edges.

We will ask that our graph be connected, so it can't be split into two disconnected pieces. We will also ask that the graph be planar: edges don't cross. They only meet at vertices.

Disconnected graphs are forbidden. Edges should not cross.

On a planar graph, we can also count faces. Those are the regions of the plane bounded by edges, the areas that would get colored in MS paint if you put the little paint bucket over them and clicked. The outside counts as one big face, since when you click, it's colored.


This graph has 6 vertices, 8 edges, and 4 faces. Its Euler characteristic is 2.

The Euler characteristic of the graph, which I'll denote χ, is the number of vertices minus the number of edges plus the number of faces, which is here 6 - 8 + 4 = 2. The Euler characteristic of a triangle is easy: 3 - 3 + 2 = 2.

Theorem: The Euler characteristic of a connected planar graph is always 2.

Proof: Start drawing the graph. We begin with a single vertex and the single big face, χ = 1 - 0 + 1 = 2. We add a vertex, and to keep the graph connected, we add an edge. Plus 1, minus 1, χ is still 2. Add a few more vertices, each with its edge, and χ stays the same. What else can happen? We can add an edge connecting two vertices already drawn. That edge will split an existing face into two, creating another face, and preserving χ. Another edge, another face, same χ. We build up the graph from a single vertex. χ starts at 2, and every additional vertex comes with an edge, fixing χ, and every solo edge comes with a face, fixing χ. So when we've drawn the whole graph, χ is still 2. Every connected graph in the plane has Euler characteristic 2. This was a proof by induction.


As a graph grows, its Euler characteristic remains constant.

The same equality holds on the sphere, and we have actually proved it already. Draw the graph near the top of the sphere and the outer face from the plane will become the bottom face on the sphere. We also have a bunch of graphs on the sphere readymade for us: the Platonic solids. On the emperor's paper cube with the bright light inside, the dark edges of the cube project a graph on the sphere, a graph with eight vertices, twelve edges, and six faces: χ = 8 - 12 + 6 = 2. All the other Platonic solids, when projected onto the sphere in the same way, give graphs. An exercise for the reader: check that the Euler characteristic of each of those graphs is 2.

What about graphs on other surfaces, like the torus? If we start with a single vertex, χ is 2: one vertex, one funny-shaped face. But if we add a self-edge wrapping around, then χ is 1, and we still have one face, which is now shaped like a cylinder. We can add one more edge without splitting the face, lowering χ to 1 - 2 + 1 = 0. From now on, though, the same induction works. Every face is a polygon, and every additional edge splits a face, and χ is always 0.


A graph on the torus has Euler characteristic 0.

We can now prove that the sphere is topologically different from the torus! For suppose we had a continuous correspondence between the two. It would take this connected graph on the torus, with χ = 0, to some graph on the sphere. Since the map sends single points to single points, edges which don't cross on the torus will not cross on the sphere, and the graph will still be planar. Same number of vertices, same number of edges, same number of faces--same graph! So same χ. But χ of any graph on the sphere is 2. So there can be no continuous correspondence between the sphere and the torus--they are topologically distinct surfaces, with different shapes.

It turns out that when you start drawing any graph G on the torus, once every face is a polygon, χ stays the same. This condition came for free on the sphere, where the big face starts out as a disk. In fact, χ always stabilizes at 0. To see this, notice that we could add extra edges and vertices until our graph G’ contains a copy of the simple graph S with one vertex and two self-loops. This graph G’ could also be built by starting with S and adding edges; as a descendant of G and of S, it has the same Euler characteristic as both of them. Since χ(S) = 0, χ(G’) and χ(G) are 0 too. Any graph with polygonal faces on the torus has Euler characteristic 0.

Classification

The time has come to classify surfaces. First we make a list of boundary-free orientable surfaces: start with a sphere and add handles. Ball, kettlebell, etc. Soon we will show that this is the list.

Of course the torus is on the list. It has the same shape as a sphere with a handle attached--shrink the sphere until it's the same size as the handle. In fact, all the closed surfaces we have seen have the shape of a sphere with handles. The number of handles is called the genus of the surface.

It is possible to define the Euler characteristic for graphs on any surface. As long as all the faces of a graph are polygons, the count χ = V - E + F depends only on the shape of the surface. A sphere has χ = 2, a sphere with one handle has χ = 0, a sphere with two handles has χ = -2, and a sphere with g handles has χ = 2 - 2g. This proves that all of the closed, oriented surfaces on our list are distinct; we're not double-counting.

We want to prove that our list of possible shapes for the surface of the earth is complete. As is often the case in mathematics, it’s actually easier to prove something more general. We will classify all finite orientable surfaces, including those with boundary.

Theorem: Every finite, orientable surface has the shape of a sphere with handles, possibly with holes.

Proof: We can assume that the atlas gathered by our surveyors has triangular pages--any other shape (e.g. a square) can be further cut up into triangles. We will build our world by stitching together the pages of the atlas along the identifications marked by the surveyors. The plan is to show that at every stage of this construction, our world-in-progress is a surface on this list. This is a proof by induction, like our proof of the invariance of the Euler characteristic. This will be a sketch, so don't worry too much if a step seems a bit sketchy--the geometric imagination required to follow proofs like this is an acquired skill.

The base case is easy: we start with a single triangle, which is topologically a disk, or a sphere with one hole in it. When we add a second triangle, if we just stitch one side of the new triangle on, the topology doesn't change. Stitching together the sides of two adjacent triangles also keeps the shape the same.

If we attach all three sides, then we're filling in a hole: sphere with one hole closes off to form a sphere with no holes.

We're always attaching the edges of the new triangle along the boundary of our current surface. If we attach two edges of the triangle along adjacent edges of the boundary, the topology doesn't change. But if we attach two edges along separated parts of the same boundary component, the same boundary circle, then we've changed the topology. If we fill in all the holes in the surface before the addition, we must have gotten some genus g surface, some sphere with handles (by induction). If we fill in the holes on the new surface, after the addition, we get the same thing. But our new surface has an additional boundary component, a new hole--one circle was replaced by two. So we have moved to the right in our list.

The only other case we need to worry about is when we attach one edge of the triangle to one boundary circle and another edge to another. Then two boundary components turn into one long one. So, theoretically, we could stitch on a disk to close off the surface, and it should be on our list. This is harder to see, but the genus goes up by one, holes down one.

In summary, here is our proof that every compact, orientable surface (like the surface of the earth) is on our list. It can be built out of the pages of a triangular atlas. Adding a triangle either changes nothing, adds a hole, closes a hole, or adds a handle and subtracts a hole. You don't need to follow all the details, but the main idea is a good one to keep in mind: to classify something, make the list first, then try to prove that the list is closed under basic operations (here, adding a triangle).

Before closing the book on this classification, I would like to recall another one, very different in spirit:

These ambiguities, redundancies, and deficiencies recall those attributed by Dr. Franz Kuhn to a certain Chinese encyclopedia called the Heavenly Emporium of Benevolent Knowledge. In its distant pages it is written that animals are divided into (a) those that belong to the emperor; (b) embalmed ones; (c) those that are trained; (d) suckling pigs; (e) mermaids; (f) fabulous ones; (g) stray dogs; (h) those that are included in this classification; (i) those that tremble as if they were mad; (j) innumerable ones; (k) those drawn with a very fine camel's-hair brush; (l) etcetera; (m) those that have just broken the flower vase; (n) those that at a distance resemble flies. (Jorge Luis Borges, The Analytical Language of John Wilkins)

The (somewhat humorous) specificity of the items in the list of animals is striking--that color and texture, that idiosyncrasy, is precisely what we lose in the process of mathematicization. But in exchange for character and characteristic, we obtain generality and decidability. (A surface with Euler characteristic 0 clearly does not have Euler characteristic 2, but it’s less clear that a mermaid is not fabulous). Let me suggest an entertaining activity for moments you are bored or your phone is out of batteries. Look at the surface of a thing around you and morph it in your mind until you see it as a sphere with handles, until you recognize its shape and its place on the list. (A version for parties: what is the genus of the surface of the human body? Does it depend on gender?)

Our emperor, sitting at his desk awaiting the return of his surveyors, knows that the shape of the world cannot but be a sphere, or a torus, etc.--the imagined and the actual world are constrained to lie in this sequence.


The surface of the Earth.

That emperor's thought experiment has been physically answered: with the advent of satellites, we can now apprehend the two-dimensional world we live on as the surface of a thing. But what about the three-dimensional universe we live in? If we send out surveyors from the earth in all directions, will they reencounter one another far from home?

We cannot make that voyage, though there are attempts underway to scan the radiation aftermath of the big bang for identical patches, for two directions in which we see the same thing. It may also be that the universe curves back in on itself beyond the cosmological horizon, but that its farthest reaches are traveling fast enough that no astronomical survey could possibly detect it. Here mathematics can speak where physical knowledge is actually impossible--we can attempt to classify three-dimensional spaces, and our physical universe is and must be on the list. Mathematics can reach beyond the curtain of receding stars to compass the realm of universal possibilities.


1 This is known in philosophy as the Sorites paradox, or the paradox of the heap. If you begin with a heap of sand and remove grains one by one, at what point does it cease to be a heap? The mathematician’s resolution: even the empty heap, the heap with no sand in it, is a heap too.

2 There is a different mathematical notion of “same shape”, familiar from Euclidean geometry, called “congruence” or “isometry.” There all lengths and distances must match exactly, and every slight chip or bend changes one shape into another.


Hashcat and oclHashcat have gone open source

For a long time I've been thinking about taking an important step -- a very important step for this project, I think. What I am talking about is making Hashcat and oclHashcat open source. 

There have been so many discussions in the past about why Hashcat isn't open source, and I bet the same people will now ask the opposite: "Why are you going open source now?" I will explain below, but for now, just take a minute to simply be happy about the fact (at least I hope you are!).

So, why did I decide to go open source with the Hashcat project?

Actually, I am a big fan of open source software, and I've always held the idea of eventually going open source at some point in the future. The difficult questions were when would we be ready to do so, and when would be the best time to do it.

There are of course several additional reasons as well:

- A huge number of hashcat/oclHashcat users are penetration testers or forensic scientists. They often have the special need of implementing their own GPU kernels. Not surprisingly, they frequently can't leak/include details about the algorithm, example hashes, or other crucial details about what should be implemented into a kernel due to restrictions placed upon them by their contract/NDA. Creating just an open interface to allow the user to easily add/modify algorithms would not be a very clever solution in this particular case, because performance is of course the thing hashcat/oclHashcat is most known for. I've already implemented most of the widely-used generic hashing (and even some encryption) algorithms with GPU acceleration. Now they only need to be combined with each other to implement a new algorithm specific to the scheme used. If we used an interface instead, these generic algorithms would have to be reimplemented for each new scheme.

- There is a very important consideration that arises when you want to go open source: the license. My decision is to use the MIT license. This particular license allows easy integration or packaging for the most common Linux distributions, for instance Ubuntu, but I've also planned to generate packages for Kali Linux, which is very popular among penetration testers. The end goal is to make the installation and distribution of the hashcat project as easy as possible, most importantly for oclHashcat.

- After the switch to open source it will be much easier to integrate external libraries. Indeed, it was barely possible before due to license problems. A few crypto libraries have very restrictive licenses, and some of them don't allow the integration of their code into binaries, or allow it only under very special prerequisites. At this point, hashcat/oclHashcat do not need any external libraries, but sometimes even just the parsing of the hash itself is very complicated and often even more challenging than the GPU kernel itself. GPG is a good example of this; it could probably be added easily if hashcat/oclHashcat were open source.

- Currently there is no native support for OSX. The main reason for this is that Apple does not support "offline" compiling of the kernel code. Technically, the missing piece is what AMD allows through CL_CONTEXT_OFFLINE_DEVICES_AMD in its OpenCL runtime. This would allow the compilation of GPU kernels for devices which are not currently attached to the development system. With an  open source project, you can easily compile the kernels using the Apple OpenCL Runtime "just in time", also known as JIT, and hence lift that restriction. This means that support for oclHashcat on OSX would be possible for the first time.

... and why now especially?

The ultimate reason for deciding to go open source was the implementation of the bitsliced DES GPU kernels. To reach maximal efficiency and performance, the salt has to be embedded within the kernel at compile time. The salt itself, however, depends on the given hash input. This hash of course is only known at run time, not at compile time. This implies that the kernel needs to be compiled at run time on the user's system. This type of compilation, with the kernel adapting according to the salt/hash, is only possible if the source code is available. Bit slicing allows a much higher cracking rate for DES-based algorithms (LM, Oracle, DEScrypt, RACF). DEScrypt, for instance, which is well known on Unix-like systems, can reach a performance gain of 300-400% with the bit slice technique. These huge optimizations will be shipped with the release of oclHashcat v2.00, which will be available right after the open source announcement.

... and for those who may think I'm going to leave the project:

No way I'd do that! I'll stay here, providing the same effort as before.

Enough of me now, let the sourcecodes talk: https://github.com/hashcat/

Or simply download the new hashcat v2.00 or oclHashcat v2.00 binaries as you know them from previous versions.

TLDR pages

TLDR pages

Simplified and community-driven man pages

Usage

Run it in the live demo or in your terminal.

tldr tar

tldr tar command output

Installation

Install the NodeJS client

npm install -g tldr

You can also try other TLDR clients

Client            Installation instruction
Web client        try tldr on your browser here!
Node.js client    npm install -g tldr
Ruby client       gem install tldrb
Python client     pip install tldr.py
C++ client        brew tap tldr-pages/tldr && brew install tldr
Android client    available on Google Play

There are more clients listed in README.md.

Contribute

Fork the project’s GitHub repo.

This repository is just that: an ever-growing collection of examples for the most common UNIX / Linux / OSX / SunOS commands.

Just edit a page in the pages/ folder and submit a pull request.

Best practices:

  • Focus on the 5-6 most common usages.
  • When in doubt, keep new command-line users in mind.
  • Introduce examples gradually, from the simplest to more complex.
  • Don’t explain general UNIX concepts.
  • Have a look at a few existing pages.

Check the more detailed Contributing Guidelines.

License

MIT License

DLL Hijacking Just Won’t Die


The folks that build the NSIS Installer have released updates to mitigate a serious security bug related to DLL loading. (v2.5 and v3.0b3 include the fixes).

To make a long and complicated story short, a bad guy who exploits this vulnerability places a malicious DLL into your browser’s Downloads folder, then waits. When you run an installer built by an earlier version of NSIS from that folder, the elevation prompt (assuming it runs as admin) shows the legitimate installer’s signature as it asks you for permission to run the installer. After you grant permission, the victim installer loads the malicious DLL, which runs its malicious code with the installer’s permissions. And then it’s not your computer anymore.

So, how does the attacker get the malicious DLL into the browser’s Downloads folder? Surely that must involve some crazy hacking or social engineering, right?

Nah. The bad guy just navigates a frame of your browser to the DLL of his choice and, if you’re on Chrome or Microsoft Edge, the DLL is dropped in the Downloads folder without even asking. Oops.

It’s trivial to simulate this attack to find vulnerable installers, even as a total noob.

1. Simply click this link which downloads a fake DLL (a simple text file containing the string ThisIsNotADLL).

2. Then download and run an installer built by an old version of NSIS:


3. Accept the elevation prompt:


4. Boom… watch Windows try to load your fake DLL as code:


Of course, a real bad guy isn’t going to be nice enough to use a fake DLL, he’ll instead use a valid but malicious one containing code that runs during DLL load, silently rooting your computer or whatever other nefarious thing he’d like.

Around 80% of the installers in my \Downloads\ folder are vulnerable to this attack; half of those are built on NSIS and half are built using other technologies.

You can learn more about DLL Hijacking attacks here.

Here are my suggestions:

  1. If you build installers with NSIS, please upgrade immediately.
  2. If you build installers with other technology, test for this attack (version.dll, msi.dll, shfolder.dll, etc). This post suggests a more complete way of finding target filenames. This post notes that InnoSetup is vulnerable.
  3. Read Microsoft’s guidance for mitigating this vulnerability.
  4. Ask your browser vendor what they’re doing to prevent attacks of this nature.

-Eric Lawrence

The Kolmogorov-Smirnov Test


A visit to a data analysis and statistical technique useful to software engineers. We learn about some Rust along the way too.

The code and examples here are available on Github. The Rust library is on crates.io.

Kolmogorov-Smirnov Hypothesis Testing

The Kolmogorov-Smirnov test is a hypothesis test procedure for determining if two samples of data are from the same distribution. The test is non-parametric and entirely agnostic to what this distribution actually is. The fact that we never have to know the distribution the samples come from is incredibly useful, especially in software and operations where the distributions are hard to express and difficult to calculate with.

It is really surprising that such a useful test exists. This is an unkind Universe; we should be completely on our own.

The test description may look a bit hard in the outline below but skip ahead to the implementation because the Kolmogorov-Smirnov test is incredibly easy in practice.

The Kolmogorov-Smirnov test is covered in Numerical Recipes. There is a pdf available from the third edition of Numerical Recipes in C.

The Wikipedia article is a useful overview but light on proof details. If you are interested in why the test statistic has a distribution that is independent of the underlying distribution and useful for constructing the test, then these MIT lecture notes give a sketch overview.

See this introductory talk by Toufic Boubez at Monitorama for an application of the Kolmogorov-Smirnov test to metrics and monitoring in software operations. The slides are available on slideshare.

The Test Statistic

The Kolmogorov-Smirnov test is constructed as a statistical hypothesis test. We determine a null hypothesis, H_0, that the two samples we are testing come from the same distribution. Then we search for evidence that this hypothesis should be rejected and express this in terms of a probability. If the likelihood of the samples being from different distributions exceeds a confidence level we demand, the original hypothesis is rejected in favour of the alternative hypothesis, H_1, that the two samples are from different distributions.

To do this we devise a single number calculated from the samples, i.e. a statistic. The trick is to find a statistic which has a range of values that do not depend on things we do not know. Like the actual underlying distributions in this case.

The test statistic in the Kolmogorov-Smirnov test is very easy: it is just the maximum vertical distance between the empirical cumulative distribution functions of the two samples. The empirical cumulative distribution of a sample is the proportion of the sample values that are less than or equal to a given value.

For instance, in this plot of the empirical cumulative distribution functions of normally distributed data, N(0, 1) and N(0, 2) samples, the maximum vertical distance between the lines is at about -1.5 and 1.5.

_images/n01n02ecdf-1.png

The vertical distance is a lot clearer for an N(0, 1) sample against N(1, 1). The maximum vertical distance occurs somewhere around zero and is quite large, maybe about 0.35 in size. This is significant evidence that the two samples are from different distributions.

_images/n01n11ecdf-1.png

As an aside, these examples demonstrate an important note about the application of the Kolmogorov-Smirnov test. It is much better at detecting distributional differences when the sample medians are far apart than it is at detecting when the tails are different but the main mass of the distributions is around the same values.

So, more formally, suppose X_i are n independent and identically distributed observations of a continuous value. The empirical cumulative distribution function, F_n, is:

F_n(x) = \frac{1}{n}\sum_{i=1}^n I_{(-\infty,x]}(X_i)

Where I is the indicator function which is 1 if X_i is less than or equal to x and 0 otherwise.

This just says that F_n(x) is the number of samples observed that are less than or equal to x divided by the total number of samples. But it says it in a complicated way so we can feel clever about ourselves.

The empirical cumulative distribution function is an unbiased estimator for the underlying cumulative distribution function, incidentally.
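To make the definition concrete, here is a minimal Rust sketch of the empirical cumulative distribution function. It is illustration only, not the API of the library discussed later, and the function name ecdf is invented for this example.

// Minimal sketch: F_n(x) = (number of sample values <= x) / n.
fn ecdf(sample: &[f64], x: f64) -> f64 {
    // Count the observations less than or equal to x and divide by the sample size.
    let count = sample.iter().filter(|&&v| v <= x).count();
    count as f64 / sample.len() as f64
}

fn main() {
    let sample = vec![0.3, -1.2, 0.8, 1.5, -0.4];
    // Two of the five values are <= 0.0, so this prints 0.4.
    println!("F_n(0.0) = {}", ecdf(&sample, 0.0));
}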

For two samples having empirical cumulative distribution functions F_n(x) and G_m(x), the Kolmogorov-Smirnov test statistic, D, is the maximum absolute difference between F_n(x) and G_m(x) for the same x, i.e. the largest vertical distance between the plots in the graph.

D = \sup_{-\infty < x < \infty} |F_n(x) - G_m(x)|

The Glivenko–Cantelli theorem says that if F_n(x) is made from samples from the same distribution as G_m(x), then this statistic “almost surely converges to zero in the limit when n goes to infinity.” This is an extremely technical statement that we are simply going to ignore.
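As an illustration of the definition, here is a hedged sketch of the two-sample statistic, not the library's implementation; the name ks_statistic is invented for this example. It sorts both samples and walks through them together, tracking how far each empirical CDF has climbed. Since both ECDFs only jump at sample points, the largest gap seen this way is the supremum (heavily tied data is not handled exactly, but the test assumes continuous data anyway).

// Sketch: two-sample Kolmogorov-Smirnov statistic D for f64 samples.
fn ks_statistic(xs: &[f64], ys: &[f64]) -> f64 {
    let mut xs = xs.to_vec();
    let mut ys = ys.to_vec();
    xs.sort_by(|a, b| a.partial_cmp(b).unwrap());
    ys.sort_by(|a, b| a.partial_cmp(b).unwrap());

    let (n, m) = (xs.len() as f64, ys.len() as f64);
    let (mut i, mut j) = (0, 0);
    let mut d: f64 = 0.0;

    while i < xs.len() && j < ys.len() {
        // Advance whichever sample has the smaller next value; on a tie advance both.
        let (x, y) = (xs[i], ys[j]);
        if x <= y {
            i += 1;
        }
        if y <= x {
            j += 1;
        }
        // i / n and j / m are the two ECDF values just after this point.
        d = d.max((i as f64 / n - j as f64 / m).abs());
    }
    d
}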

Two Sample Test

Surprisingly, the distribution of D can be approximated well in the case that the samples are drawn from the same distribution. This means we can build a statistical test that rejects this null hypothesis for a given confidence level if D exceeds an easily calculable value.

Tables of critical values are available, for instance the SOEST tables describe a test implementation for samples of more than twelve where we reject the null hypothesis, i.e. decide that the samples are from different distributions, if:

D > c(\alpha)\sqrt{\frac{n + m}{n m}}

Where n and m are the sample sizes. A 95% confidence level corresponds to \alpha = 0.05 for which c(\alpha) = 1.36.
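As a sketch of how that rule translates into code, assuming a two-sided test at 95% confidence with the tabulated coefficient c(0.05) = 1.36 (the function names here are made up for the example, not taken from the library):

// Sketch: SOEST-style critical value for the two-sample test at 95% confidence.
// The null hypothesis is rejected when the observed D exceeds this value.
fn critical_value_95(n: usize, m: usize) -> f64 {
    let c_alpha = 1.36; // c(alpha) for alpha = 0.05
    let (n, m) = (n as f64, m as f64);
    c_alpha * ((n + m) / (n * m)).sqrt()
}

fn is_rejected_95(d: f64, n: usize, m: usize) -> bool {
    d > critical_value_95(n, m)
}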

Alternatively, Numerical Recipes describes a direct calculation that works well for:

N_{n, m} = \frac{n m}{n + m} \geq 4

i.e. for samples of more than seven since N_{8, 8} = 4.

Numerical Recipes continues by claiming the probability that the test statistic is greater than the value observed is approximately:

P(D > \text{observed}) = Q_{KS}\Big(\Big[\sqrt{N_{n, m}} + 0.12 + 0.11/\sqrt{N_{n, m}}\Big] D\Big)

With Q_{KS} defined as:

Q_{KS}(x) = 2 \sum_{j=1}^{\infty} (-1)^{j-1} e^{-2j^2x^2}

This can be computed by summing terms until a convergence criterion is achieved. The implementation in Numerical Recipes gives this a hundred terms to converge before failing.
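A sketch of that calculation follows, mirroring the formulas above. It is not the crate's exact code, and the names q_ks and p_value are invented for the example. The alternating series is summed until its terms become negligible; if it has not converged after a hundred terms, the conservative value 1.0 is returned.

// Sketch: Q_KS(x) = 2 * sum over j >= 1 of (-1)^(j-1) * exp(-2 * j^2 * x^2).
fn q_ks(x: f64) -> f64 {
    let mut sum = 0.0;
    let mut sign = 1.0;
    let mut previous_term = 0.0_f64;
    for j in 1..=100 {
        let j = j as f64;
        let term = sign * (-2.0 * j * j * x * x).exp();
        sum += term;
        // Stop once terms are negligible relative to what came before.
        if term.abs() <= 0.001 * previous_term.abs() || term.abs() <= 1e-8 * sum.abs() {
            return 2.0 * sum;
        }
        previous_term = term;
        sign = -sign;
    }
    1.0 // the series failed to converge; report no evidence against H_0
}

// Approximate P(D > observed) for samples of sizes n and m.
fn p_value(d: f64, n: usize, m: usize) -> f64 {
    let n_eff = (n as f64 * m as f64) / ((n + m) as f64);
    let root = n_eff.sqrt();
    q_ks((root + 0.12 + 0.11 / root) * d)
}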

The difference between the two approximations is marginal. The Numerical Recipes approach produces slightly smaller critical values for rejecting the null hypothesis as can be seen in the following plot of critical values for the 95% confidence level where one of the samples has size 256. The x axis varies over the other sample size, the y axis being the critical value.

_images/critical-values-1.png

The SOEST tables are an excellent simplifying approximation.

Discussion

A straightforward implementation of this test can be found in the Github repository. Calculating the test statistic using the empirical cumulative distribution functions is probably as complicated as it gets for this. There are two versions of the test statistic calculation in the code, the simpler version being used to probabilistically verify the more efficient implementation.

Non-parametricity and generality are the great advantages of the Kolmogorov-Smirnov test, but these are balanced by drawbacks in its ability to establish sufficient evidence to reject the null hypothesis.

In particular, the Kolmogorov-Smirnov test is weak in cases when the sample empirical cumulative distribution functions do not deviate strongly even though the samples are from different distributions. For instance, the Kolmogorov-Smirnov test is most sensitive to discrepancy near the median of the samples because this is where differences in the graph are most likely to be large. It is less strong near the tails because the cumulative distribution functions will both be near 0 or 1 and the difference between them less pronounced. Location and shape related scenarios that constrain the D test statistic reduce the ability of the Kolmogorov-Smirnov test to correctly reject the null hypothesis.

The Chi-squared test is also used for testing whether samples are from the same distribution, but it requires a binning discretization of the data. The Kolmogorov-Smirnov test does not.

A Field Manual for Rust

Rust is a Mozilla sponsored project to create a safe, fast systems language. There is an entire free O’Reilly book on why create this new language but the reasons include:

  • Robust memory management. It is impossible to dereference null or dangling pointers in Rust.
  • Improved security, reducing the incidence of flaws like buffer overflow exploits.
  • A light runtime with no garbage collection and little overhead means Rust is ideal to embed in other languages and platforms like Ruby, Python, and Node.
  • Rust has many modern language features unavailable in other systems languages.

Rust is a serious language, capable of very serious projects. The current flagship Rust project, for instance, is Servo, a browser engine under open source development with contributions from Mozilla and Samsung.

The best introduction to Rust is the Rust Book. Newcomers should also read Steve Klabnik’s alternative introduction to Rust for an upfront, no-nonsense dive into memory ownership, the crux concept for Rust beginners.

Those in a hurry can quickstart with these slide decks by:

Two must-read learning resources are 24 Days of Rust, a charming tour around the libraries and world of Rust, and ArcadeRS, a tutorial in Rust about writing video games.

And finally, if Servo has you interested in writing a browser engine in Rust, then Let’s build a browser engine! is the series for you. It walks through creating a simple HTML rendering engine in Rust.

Moral Support for Learning the Memory Rules

The Road to Rust is not royal, there is no pretending otherwise. The Rust memory rules about lifetime, ownership, and borrowing are especially hard to learn.

It probably doesn’t much feel like it but Rust is really trying to help us with these rules. And to be fair to Rust, it hasn’t segfaulted me so far.

But that is no comfort when the compiler won’t build your code and you can’t figure out why. The best advice is probably to read as much about the Rust memory rules as you can and to keep reading about them over and over until they start to make some sense. Don’t worry, everybody finds it difficult at first.

Although adherence to the rules provides the compiler with invariant guarantees that can be used to construct proofs of memory safety, the rationale for these rules is largely unimportant. What is necessary is to find a way to work with them so your programs compile.

Remember too that learning to manage memory safely in C/C++ is much harder than learning Rust and there is no compiler checking up on you in C/C++ to make sure your memory management is correct.

Keep at it. It takes a long time but it does become clearer!

Niche Observations

This section is a scattering of Rust arcana that caught my attention. Nothing here that doesn’t interest you is worth troubling too much with and you should skip on past.

Travis CI has excellent support for building Rust projects, including with the beta and nightly versions. It is simple to set up by configuring a .travis.yml according to the Travis Rust documentation. See the Travis CI build for this project for an example.

Rust has a formatter in rustfmt and a lint in rust-clippy. The formatter is a simple install using cargo install and provides a binary command. The lint requires more integration into your project, and currently also needs the nightly version of Rust for plugin support. Both projects are great for helping Rust newcomers.

Foreign Function Interface is an area where Rust excels. The absence of a large runtime means Rust is great for embedding in other languages, and it has a wide range of uses as a C replacement in writing modules for Python, Ruby, Node, etc. The Rust Book introduction demonstrates how easy it is to call Rust from other languages. Day 23 of Rust and the Rust FFI Omnibus are additional resources for Rust FFI.

Rust is being used experimentally for embedded development. Zinc is an effort to build a realtime ARM operating system primarily in Rust, and there are also posts about building software for embedded devices directly in Rust.

Relatedly, Rust on Raspberry Pi is a guide to cross-compiling Rust code for the Raspberry Pi.

Rust treats the code snippets in your project documentation as tests and makes a point of compiling them. This helps keep documentation in sync with code but it is a shock the first time you get a compiler error for a documentation code snippet and it takes you ages to realise what is happening.

Kolmogorov-Smirnov Library

The Kolmogorov-Smirnov test implementation is available as a Cargo crate, so it is simple to incorporate into your programs. Add the dependency to your Cargo.toml file.

[dependencies]
kolmogorov_smirnov = "1.0.1"

Then to use the test, call the kolmogorov_smirnov::test function with the two samples to compare and the desired confidence level.

extern crate kolmogorov_smirnov as ks;

let xs = vec!(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12);
let ys = vec!(12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
let confidence = 0.95;

let result = ks::test(&xs, &ys, confidence);

if !result.is_rejected {
    // Woot! Samples are from the same distribution with 0.95 confidence.
}

The Kolmogorov-Smirnov test as implemented works for any data with a Clone and an Ord trait implementation in Rust. So it is possible, but pretty useless, to test samples of characters, strings and lists. In truth, the Kolmogorov-Smirnov test requires the samples to be taken from a continuous distribution, so discrete data like characters and strings are cute to consider but invalid test data.

Still being strict, this test condition also does not hold for integer data unless some hands are waved about the integer data being embedded into real numbers and a distribution cooked up from the probability weights. We make some compromises and allowances.

If you have floating point or integer data to test, you can use the included test runner binaries, ks_f64 and ks_i32. These operate on single-column headerless data files and test two commandline argument filenames against each other at 95% confidence.

$ cargo run -q --bin ks_f64 dat/normal_0_1.tsv dat/normal_0_1.1.tsv
Samples are from the same distribution.
test statistic = 0.0399169921875
critical value = 0.08550809323787689
reject probability = 0.18365715210599798

$ cargo run -q --bin ks_f64 dat/normal_0_1.tsv dat/normal_1_1.1.tsv
Samples are from different distribution.
test statistic = 0.361572265625
critical value = 0.08550809323787689
reject probability = 1

Testing floating point numbers is a headache because Rust floating point types (correctly) do not implement the Ord trait, only the PartialOrd trait. This is because things like NaN are not comparable and the order cannot be total over all values in the datatype.

The test runner for floating point types is implemented using a wrapper type that implements a total order, crashing on unorderable elements. This suffices in practice since the unorderable elements will break the test anyway.
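A minimal sketch of that wrapper idea, assuming nothing about the crate's actual type (the name OrderableF64 is invented here): wrap f64, derive the partial traits, and implement Ord by delegating to partial_cmp and panicking on NaN.

use std::cmp::Ordering;

// Sketch: give f64 a total order by refusing to order NaN.
// Clone and Ord are the two traits the test requires of its input type.
#[derive(PartialEq, PartialOrd, Clone, Debug)]
struct OrderableF64(f64);

impl Eq for OrderableF64 {}

impl Ord for OrderableF64 {
    fn cmp(&self, other: &Self) -> Ordering {
        // Panic on NaN: an unorderable value would invalidate the test anyway.
        self.partial_cmp(other).expect("NaN is not orderable")
    }
}

fn main() {
    let mut xs: Vec<OrderableF64> = vec![0.3, -1.2, 0.8].into_iter().map(OrderableF64).collect();
    xs.sort(); // possible because OrderableF64 implements Ord
    println!("{:?}", xs);
}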

The implementation uses the Numerical Recipes approximation for rejection probabilities rather than the almost as accurate SOEST table approximation for critical values. This allows the additional reporting of the reject probability which isn’t available using the SOEST approach.

Datasets

Statistical tests are more fun if you have datasets to run them over.

N(0,1)

Because it is traditional and because it is easy and flexible, start with some normally distributed data.

Rust can generate normal data using the rand::distributions module. If mean and variance are f64 values representing the mean and variance of the desired normal deviate, then the following code generates the deviate. Note that the Normal::new call requires the mean and standard deviation as parameters, so it is necessary to take the square root of the variance to provide the standard deviation value.

extern crate rand;

use rand::distributions::{Normal, IndependentSample};

let mean: f64 = ...
let variance: f64 = ...

let mut rng = rand::thread_rng();
let normal = Normal::new(mean, variance.sqrt());

let x = normal.ind_sample(&mut rng);

The kolmogorov_smirnov library includes a binary for generating sequences of independently distributed Normal deviates. It has the following usage.

cargo run --bin normal <num_deviates> <mean> <variance>

The -q option is useful too for suppressing cargo build messages in the output.

Sequences from N(0, 1), N(0, 2), and N(1, 1) are included in the Github repository. N(0, 2) is included mainly just to troll, calculating \sqrt{2} and drawing attention to the limitations of the floating point representation of irrational numbers.

cargo run -q --bin normal 8192 0 1 > normal_0_1.tsv
cargo run -q --bin normal 8192 0 2 > normal_0_2.tsv
cargo run -q --bin normal 8192 1 1 > normal_1_1.tsv

These are not the most beautiful of Normal curves, but you must take what you get. The N(0, 1) data is lumpy and not single peaked.

[Figure: density of the N(0, 1) dataset]

N(0, 2) is similar, though less of a disaster near the mean.

[Figure: density of the N(0, 2) dataset]

N(1, 1) by contrast looks surprisingly like the normal data diagrams in textbooks.

[Figure: density of the N(1, 1) dataset]

The following is a plot of all three datasets to illustrate the relative widths, heights and supports.

[Figure: densities of the N(0, 1), N(0, 2) and N(1, 1) datasets overlaid]

Results

The Kolmogorov-Smirnov test is successful at establishing the N(0, 1) datasets are all from the same distribution in all combinations of the test.

$ cargo run -q --bin ks_f64 dat/normal_0_1.tsv dat/normal_0_1.1.tsv
Samples are from the same distribution.
test statistic = 0.0399169921875
critical value = 0.08550809323787689
reject probability = 0.18365715210599798

$ cargo run -q --bin ks_f64 dat/normal_0_1.tsv dat/normal_0_1.2.tsv
Samples are from the same distribution.
test statistic = 0.0595703125
critical value = 0.08550809323787689
reject probability = 0.6677483327196572

...

Save yourself the trouble in reproduction by running this instead:

for I in dat/normal_0_1.*
do
    for J in dat/normal_0_1.*
    do
        if [[ "$I" < "$J" ]]
        then
            echo $I $J
            cargo run -q --bin ks_f64 $I $J
            echo
            echo
        fi
    done
done

The N(1, 1) datasets also correctly accept the null hypothesis in all combinations of dataset inputs when tested against each other.

However, N(0, 2) successfully passes for all combinations but that between dat/normal_0_2.tsv and dat/normal_0_2.1.tsv where it fails as a false negative.

$ cargo run -q --bin ks_f64 dat/normal_0_2.1.tsv dat/normal_0_2.tsv
Samples are from different distributions.
test statistic = 0.102783203125
critical value = 0.08550809323787689
reject probability = 0.9903113063475989

This failure is a demonstration of how the Kolmogorov-Smirnov test is sensitive to location, because here the mean of dat/normal_0_2.1.tsv is shifted quite far from the origin.

This is the density.

[Figure: density of dat/normal_0_2.1.tsv]

And superimposed with the density from dat/normal_0_2.tsv.

[Figure: densities of dat/normal_0_2.1.tsv and dat/normal_0_2.tsv overlaid]

The data for dat/normal_0_2.1.tsv is the taller density in this graph. Notice, in particular, that the mean is shifted left a lot in comparison with dat/normal_0_2.tsv. See also the chunks of non-overlapping weight on the left and right hand slopes.

The difference in means is confirmed by calculation. The dataset for dat/normal_0_2.tsv has mean 0.001973723, whereas the dataset for dat/normal_0_2.1.tsv has mean -0.2145779. By comparison, the other N(0, 2) datasets have means -0.1308625, -0.08537648, and -0.01374325.
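If you want to reproduce those numbers, the following sketch (not part of the library; the path is just an example) computes the sample mean of one of the single-column data files.

use std::fs::File;
use std::io::{BufRead, BufReader};

// Compute the sample mean of a single-column, headerless data file.
fn sample_mean(path: &str) -> f64 {
    let reader = BufReader::new(File::open(path).expect("cannot open data file"));
    let values: Vec<f64> = reader
        .lines()
        .map(|line| line.expect("read error").trim().parse().expect("not a number"))
        .collect();

    let total = values.iter().fold(0.0, |acc, &v| acc + v);
    total / values.len() as f64
}

fn main() {
    println!("{}", sample_mean("dat/normal_0_2.1.tsv"));
}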

Looking at the empirical cumulative density functions of the false negative comparison, we see a significant gap between the curves starting near 0.

[Figure: empirical cumulative density functions of dat/normal_0_2.tsv and dat/normal_0_2.1.tsv]

For comparison, here are the overlaid empirical cumulative density functions for the other N(0, 2) tests.

[Figures: overlaid empirical cumulative density functions for the other N(0, 2) test pairs]

One false negative in thirty unique test pairs at 95% confidence is on the successful side of expectations.

Turning instead to tests that should be expected to fail, the following block runs comparisons between datasets from different distributions.

for I in dat/normal_0_1.*
do
    for J in dat/normal_0_2.*
    do
        echo $I $J
        cargo run -q --bin ks_f64 $I $J
        echo
        echo
    done
done

The N(0, 1) against N(1, 1) and N(0, 2) against N(1, 1) tests correctly reject the null hypothesis in every variation. These tests are easy failures because they are large location changes, illustrating again how the Kolmogorov-Smirnov test is sensitive to changes in centrally located weight.

However, there are ten false positives in the comparisons between datasets from N(0, 1) and N(0, 2).

dat/normal_0_1.2.tsv is reported incorrectly as being from the same distribution as the following datasets.

  • dat/normal_0_2.tsv at P(\text{reject } H_0) = 0.9375,
  • dat/normal_0_2.2.tsv at P(\text{reject } H_0) = 0.6584,
  • dat/normal_0_2.3.tsv at P(\text{reject } H_0) = 0.9128,
  • dat/normal_0_2.4.tsv at P(\text{reject } H_0) = 0.8658.

Similarly, dat/normal_0_1.3.tsv is a false positive against:

  • dat/normal_0_2.3.tsv at P(\text{reject } H_0) = 0.9128,
  • dat/normal_0_2.4.tsv at P(\text{reject } H_0) = 0.8658.

And dat/normal_0_1.4.tsv is a false positive against:

  • dat/normal_0_2.1.tsv at P(\text{reject } H_0) = 0.9451,
  • dat/normal_0_2.2.tsv at P(\text{reject } H_0) = 0.9451,
  • dat/normal_0_2.3.tsv at P(\text{reject } H_0) = 0.9128,
  • dat/normal_0_2.4.tsv at P(\text{reject } H_0) = 0.9128.

Note that many of these false positives have rejection probabilities that are high but fall short of the 95% confidence level required. The null hypothesis is that the distributions are the same and it is this that must be challenged at the 95% level.

Let’s examine the test where the rejection probability is lowest, that between dat/normal_0_1.2.tsv and dat/normal_0_2.2.tsv.

$ cargo run -q --bin ks_f64 dat/normal_0_1.2.tsv dat/normal_0_2.2.tsv
Samples are from the same distribution.
test statistic = 0.08203125
critical value = 0.11867932230234146
reject probability = 0.6584658436106378

The overlaid density and empirical cumulative density functions show a clear difference.

[Figures: overlaid densities and empirical cumulative density functions for dat/normal_0_1.2.tsv and dat/normal_0_2.2.tsv]

The problem, however, is a lack of samples combined with the weakness of the Kolmogorov-Smirnov test in detecting differences in spread at the tails. Both of these datasets have 256 samples and the critical value for 95% confidence is 0.1186. This is a large difference to demonstrate at the edges of the empirical cumulative distribution functions and in the case of this test the D test statistic is a comfortable 0.082.

There is insufficient evidence to reject the null hypothesis.

Let’s also examine the false positive test where the rejection probability is tied highest, between dat/normal_0_1.4.tsv and dat/normal_0_2.1.tsv.

$ cargo run -q --bin ks_f64 dat/normal_0_1.4.tsv dat/normal_0_2.1.tsv
Samples are from the same distribution.
test statistic = 0.1171875
critical value = 0.11867932230234146
reject probability = 0.9451734528250557
[Figures: overlaid densities and empirical cumulative density functions for dat/normal_0_1.4.tsv and dat/normal_0_2.1.tsv]

This one is incredibly borderline. There is a very strong difference on the left side, but it falls fractionally short of the required confidence level. Note how this also illustrates the bias in favour of the null hypothesis that the two samples are from the same distribution.

Notice that of the false positives, only the one between dat/normal_0_1.2.tsv and dat/normal_0_2.tsv happens with a dataset containing more than 256 samples. In this test with 8192 samples against 256, the critical value is 0.0855 and the test statistic scrapes by underneath at 0.08288.

In the case for two samples of size 8192, the critical value is a very discriminating 0.02118.
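As a sanity check (my own aside, not from the library documentation), these critical values are close to the familiar large-sample approximation for the two-sample test at 95% confidence:

\( D_{crit} \approx 1.358 \sqrt{\frac{n_1 + n_2}{n_1 n_2}} \)

which gives roughly 0.120 for two samples of 256, 0.086 for 8192 against 256, and 0.021 for two samples of 8192, in line with the exact values reported above.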

In total there are ten false positives in 75 tests, a poor showing.

The lesson is that false positives are more common, especially with small datasets. When using the Kolmogorov-Smirnov test in production systems, prefer higher confidence levels when larger datasets are not available.

A Diversion In QuickCheck

QuickCheck makes writing tests crazy amounts of fun and is a great way to become comfortable in a new language.

The idea in QuickCheck is to write tests as properties of inputs rather than specific test cases. So, for instance, rather than checking whether a given pair of samples has a particular maximum empirical cumulative distribution function distance, a generic property is verified instead. This property can be as simple as "the distance is between zero and one for any pair of input samples" or as restrictive as the programmer is able to make it.

This form of test construction means QuickCheck can probabilistically check the property over a huge number of test case instances and establish a much greater confidence of correctness than a single individual test instance could.

It can be harder, too. Writing properties that tightly specify the desired behaviour is difficult, but starting with properties that only loosely constrain the software behaviour is often helpful, facilitating an evolution towards more sharply binding criteria.

For a tutorial introduction to QuickCheck, John Hughes has a great introduction talk.

There is an implementation of QuickCheck for Rust and the tests for the Kolmogorov-Smirnov Rust library have been implemented using it. See the Github repository for examples of how to QuickCheck in Rust.

Here is a toy example of a QuickCheck property to test an integer doubling function.

extern crate quickcheck;

use self::quickcheck::quickcheck;

fn double(n: u32) -> u32 {
    2 * n
}

#[test]
fn test_double_n_is_greater_than_n() {
    fn prop(n: u32) -> bool {
        double(n) > n
    }
    quickcheck(prop as fn(u32) -> bool);
}

This test is broken and QuickCheck makes short (I almost wrote ‘quick’!) work of letting us know that we have been silly.

test tests::test_double_n_is_greater_than_n ... FAILED

failures:

---- tests::test_double_n_is_greater_than_n stdout ----
    thread 'tests::test_double_n_is_greater_than_n' panicked at '[quickcheck] TEST FAILED. Arguments: (0)', /root/.cargo/registry/src/github.com-0a35038f75765ae4/quickcheck-0.2.24/src/tester.rs:113

The last log line includes the u32 value that failed the test, i.e. zero. Correct practice is to now create a non-probabilistic test case that tests this specific value. This protects the codebase from regressions in the future.
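For instance, a plain regression test for the discovered value might look like the following sketch (the author suggests a more convenient inline variant further below):

#[test]
fn test_double_zero_regression() {
    // Pin down the specific input QuickCheck discovered.
    assert_eq!(double(0), 0);
}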

The problem in the example is that the property is not actually valid for the double function because double zero is not actually greater than zero. So let’s fix the test.

#[test]
fn test_double_n_is_geq_n() {
    fn prop(n: u32) -> bool {
        double(n) >= n
    }
    quickcheck(prop as fn(u32) -> bool);
}

Note also how QuickCheck produced a minimal test violation: there are no smaller values of u32 that violate the test. This is not an accident; QuickCheck libraries often include features for shrinking test failures to minimal examples. When a test fails, QuickCheck will often rerun it, searching successively smaller instances of the test arguments to determine the smallest violating test case.

The function is still broken, by the way, because it overflows for large input values. The Rust QuickCheck doesn’t catch this problem because the QuickCheck::quickcheck convenience runner configures the tester to produce random data between zero and one hundred, not in the range where the overflow becomes evident. For this reason, you should not use the convenience runner in testing. Instead, configure QuickCheck manually with as large a random range as you can.

extern crate rand;

use self::quickcheck::{QuickCheck, StdGen, Testable};
use std::usize;

fn check<A: Testable>(f: A) {
    let gen = StdGen::new(rand::thread_rng(), usize::MAX);
    QuickCheck::new().gen(gen).quickcheck(f);
}

#[test]
fn test_double_n_is_geq_n() {
    fn prop(n: u32) -> bool {
        double(n) >= n
    }
    check(prop as fn(u32) -> bool);
}

This will break the test with an overflow panic. This is correct and the double function should be reimplemented to do something about handling overflow properly.

A warning, though, if you are testing vec or string types. The number of elements in the randomly generated vec or equivalently, the length of the generated string will be between zero and the size in the StdGen configured. There is the potential in this to create unnecessarily huge vec and string values. See the example of NonEmptyVec below for a technique to limit the size of a randomly generated vec or string while still using StdGen with a large range.

Unfortunately, you are out of luck on a 32-bit machine where the usize::MAX will only get you to sampling correctly in u32. You will need to upgrade to a new machine before you can test u64, sorry.

By way of example, it is actually more convenient to include known failure cases like u32::max_value() in the QuickCheck test function rather than in a separate traditional test case function. So, when the QuickCheck fails for the overflow bug, add the test case as follows instead of as a new function:

#[test]
fn test_double_n_is_geq_n() {
    fn prop(n: u32) -> bool {
        double(n) >= n
    }
    assert!(prop(u32::max_value()));
    quickcheck(prop as fn(u32) -> bool);
}

Sometimes the property to test is not valid on some test arguments, i.e. the property is useful to verify but there are certain combinations of probabilistically generated inputs that should be excluded.

The Rust QuickCheck library supports this with TestResult. Suppose that instead of writing the double test property correctly, we wanted to just exclude the failing cases instead. This might be a practical thing to do in a real scenario and we can rewrite the test as follows:

use self::quickcheck::TestResult;

#[test]
fn test_double_n_is_greater_than_n_if_n_is_greater_than_1() {
    fn prop(n: u32) -> TestResult {
        if n <= 1 {
            return TestResult::discard();
        }
        let actual = double(n);
        TestResult::from_bool(actual > n)
    }
    quickcheck(prop as fn(u32) -> TestResult);
}

Here, the cases where the property legitimately doesn’t hold are excluded by returning TestResult::discard(). This causes QuickCheck to retry the test with the next randomly generated value instead.

Note also that the function return type is now TestResult and that TestResult::from_bool is needed for the test condition.

An alternative approach is to create a wrapper type in the test code which only permits valid input and to rewrite the tests to take this type as the probabilistically generated input instead.

For example, suppose you want to ensure that QuickCheck only generates positive integers for use in your property verification. You add a wrapper type PositiveInteger and now in order for QuickCheck to work, you have to implement the Arbitrary trait for this new type.

The minimum requirement for an Arbitrary implementation is a function called arbitrary taking a Gen random generator and producing a random PositiveInteger. New implementations should always leverage existing Arbitrary implementations, and so PositiveInteger generates a random u64 using u64::arbitrary() and constrains it to be greater than zero.

extern crate quickcheck;

use self::quickcheck::{Arbitrary, Gen};
use std::cmp;

#[derive(Clone, Debug)]
struct PositiveInteger {
    value: u64,
}

impl Arbitrary for PositiveInteger {
    fn arbitrary<G: Gen>(g: &mut G) -> PositiveInteger {
        // Generate a random u64 and constrain it to be at least one.
        let val = cmp::max(u64::arbitrary(g), 1);
        PositiveInteger { value: val }
    }

    fn shrink(&self) -> Box<Iterator<Item = PositiveInteger>> {
        let shrunk: Box<Iterator<Item = u64>> = self.value.shrink();
        Box::new(shrunk.filter(|&v| v > 0).map(|v| PositiveInteger { value: v }))
    }
}

Note also the implementation of shrink() here, again in terms of an existing u64::shrink(). This method is optional and unless implemented QuickCheck will not minimise property violations for the new wrapper type.

Use PositiveInteger as follows:

use self::quickcheck::quickcheck;

fn square(n: u64) -> u64 {
    n * n
}

#[test]
fn test_square_n_for_positive_n_is_geq_1() {
    fn prop(n: PositiveInteger) -> bool {
        square(n.value) >= 1
    }
    quickcheck(prop as fn(PositiveInteger) -> bool);
}

There is no need now for TestResult::discard() to ignore the failure case for zero.

Finally, wrappers can be added for more complicated types too. A commonly useful container type generator is NonEmptyVec which produces a random vec of the parameterised type but excludes the empty vec case. The generic type must itself implement Arbitrary for this to work.

extern crate quickcheck;
extern crate rand;

use self::quickcheck::{quickcheck, Arbitrary, Gen};
use self::rand::Rng;
use std::cmp;

#[derive(Debug, Clone)]
struct NonEmptyVec<A> {
    value: Vec<A>,
}

impl<A: Arbitrary> Arbitrary for NonEmptyVec<A> {
    fn arbitrary<G: Gen>(g: &mut G) -> NonEmptyVec<A> {
        // Limit size of generated vec to 1024
        let max = cmp::min(g.size(), 1024);
        let size = g.gen_range(1, max);
        let vec = (0..size).map(|_| A::arbitrary(g)).collect();

        NonEmptyVec { value: vec }
    }

    fn shrink(&self) -> Box<Iterator<Item = NonEmptyVec<A>>> {
        let vec: Vec<A> = self.value.clone();
        let shrunk: Box<Iterator<Item = Vec<A>>> = vec.shrink();

        Box::new(shrunk.filter(|v| v.len() > 0).map(|v| NonEmptyVec { value: v }))
    }
}

#[test]
fn test_head_of_sorted_vec_is_smallest() {
    fn prop(vec: NonEmptyVec<u64>) -> bool {
        let mut sorted = vec.value.clone();
        sorted.sort();

        // NonEmptyVec must have an element.
        let head = sorted[0];

        vec.value.iter().all(|&n| head <= n)
    }
    quickcheck(prop as fn(NonEmptyVec<u64>) -> bool);
}

How three teenagers invented an app to police the cops


THE Christian siblings were doing their homework when the police arrived. Two officers entered the house, guns drawn, pursuing what was evidently a prank tip-off about a captive being held at their address. The guns stayed out even when the mistake became apparent; they ran the details of the children’s father—who, like them, is black—through the police system on the off chance of turning something up.

The family was traumatised. The incident (in 2013) brought home to Ima Christian, now 18, that Americans could be vulnerable to rough policing “no matter where you live, or who you are”; her sister Asha, who is 16, says it is “not until you are face to face with an officer that you realise what the deal is.” The sisters—from Stone Mountain, just outside Atlanta—didn’t get even, exactly. Instead, with their brother Caleb (now 15), they developed an app, called Five-O, intended to help improve police behaviour and community relations. It lets citizens rate their experiences with officers, record both parties’ race and sex and the purpose of the interaction, and find aggregate scores for county forces.

Five-O (a slang term for cops) launched in 2014, but will get a boost this spring from the €20,000 ($22,000) prize it won at an international contest for justice-related initiatives, organised by a think-tank in the Netherlands. The money will go towards marketing the tool in Baltimore and Chicago: attracting input from broad cross-sections of such communities is one of the ways the Christians believe they can neutralise an obvious potential bias—ie, that the ratings will be skewed by the aggrieved (legitimate as those grievances may sometimes be). That composite picture, combining good and bad feedback, is, they reckon, one of the ways their product differs from other police-related apps, which concentrate on uploading video. They also want to extend their coverage from Android to iPhones. The long-term plan is to include Britain, Brazil, Canada and Russia, making Five-O, as Asha puts it, “a global repository of unbiased police data”.

That is an ambitious goal for teenagers who mostly taught themselves to code (their parents used to work for an internet start-up and, Caleb recalls, noticed youngsters “getting paid insane amounts of money” for programming). In 2016 they aim to launch another app through their firm, Pinetart Inc: this one, Coily, lets women rate hair-care products, and so avoid shower-stall accumulations of half-empty bottles. Studies permitting, that is: Ima is a freshman at Stanford University; Asha—who is finishing high school online, to free up time for enterprise—hopes to join her or go to Columbia. “I’m very proud of them,” says their mother Karen. 


MIT Mathlets: Interactive mathematics visualizations

Stupid Patent of the Month: Microsoft’s Design Patent on a Slider


For the first time ever, this month’s Stupid Patent of the Month is being awarded to a design patent. Microsoft recently sued Corel for, among other things, infringing its patent on a slider, D554,140, claiming that Corel Home Office has infringed Microsoft’s design.

The design patent, as detailed by Microsoft in its complaint, is titled “User Interface for a Portion of a Display Screen” and entitles Microsoft to own this:

More specifically, Microsoft claims to own this design of a slider.

Design patents aren’t like the utility patents that most people think of when they think of patents. Unlike utility patents, which are meant for new and useful inventions, design patents are meant for new, non-functional, ornamental aspects of articles. They have only one claim, little to no written description, and usually a series of images detailing what exactly is being claimed. (A note about design patents: solid lines are used to show what is claimed; broken or dotted lines show the unclaimed “environment related to the design” or define the boundary of the design.)

As Professor Sarah Burstein points out on her fantastic Tumblr, design patents are often issued on a small part of a product, and often for things that seem unoriginal, not ornamental, or just ridiculous.

Microsoft’s patent claims against Corel are unsurprising in light of how much money is potentially at stake. If Corel is found to infringe even one of Microsoft’s design patents through even the smallest part of Corel Home Office, current Federal Circuit law entitles Microsoft to all of Corel’s profits for the entire product. Not the profits that can be attributed to the design. Not the value that the design adds to a product. All of the profit from Corel Home Office.

The well-known Apple v. Samsung dispute addressed the issue of whether an infringer should be required to pay all of its profits for infringing a design patent that applies only to a portion of a product. Samsung had asked the Court of Appeals for the Federal Circuit to reject this reading, but the court disagreed in a May 2015 opinion.

Samsung has now asked the Supreme Court to weigh in. In its petition for certiorari, Samsung points out the absurd results of this rule. For example, Samsung explains that under the Federal Circuit’s ruling, “profits on an entire car—or even an eighteen-wheel tractor trailer—must be awarded based on an undetachable infringing cup-holder.” In addition, given that many products will include multiple ornamental features that could be covered by design patents, this raises the possibility that a company could get hit for multiple judgments for all its profits.

That sounds pretty crazy to us. But that’s exactly what might happen if Microsoft prevails against Corel. Putting aside whether Microsoft’s design was actually new and not obvious in 2006 (when Microsoft filed its application), whether Microsoft needed the patent incentive in order to come up with this design, and whether it is even desirable to grant a company a government-backed monopoly on a graphical slider (we don't think so, that's why this is a stupid patent), the scope of damages for design patent infringement has the potential to become a powerful tool to shut down legitimate competition based on the mere threat of a lawsuit.

The future of the Crystal language


The Programming Language

24 Dec 2015 by asterite

(This post is part of Crystal Advent Calendar 2015)

This Christmas eve something curious happened: we were happily coding in Crystal when, the moment we took our eyes away from the screen, a translucent figure appeared nearby. The entity approached and said: “I’m the Ghost of Christmas Past. Come with me.”

We saw ourselves coding a new language that would resemble Ruby but be compiled and type safe. At that moment the language was really like Ruby: to create an empty Array you would write [], or Set.new to create an empty set. And we were happy, until we realized compile times were huge, exponential, unbearable, and got sad.

We spent an awful amount of time trying to make it work, to no avail. Finally, we decided to make a change: specify the types of empty generic types, for example [] of Int32 or Set(Int32).new. Compile times were back to normal. And we were kind of happy again, but at the same time felt that we were leaving behind some of Ruby’s feeling. The language diverged.

We looked back at the Ghost of Christmas Past to ask him what all of that meant, but we found a similar but different figure in its place. She said: “I’m the Ghost of Christmas Present. Join me.”

Around us, a small but vibrant community was programming in Crystal. They were happy. There was no mention of the annoyance of having to specify types for generic types. Everyone was feeling that Ruby’s spirit was somehow still present: in the familiar API and classes, in the syntax, in the powerful blocks. Additionally, the increased performance, both in terms of CPU and concurrency, coupled with better type safety, really paid off, so having to specify a type now and then didn’t feel bothersome.

Again, we looked at the Ghost to ask for a meaning: it seemed we had made a good decision in the past, right? But, just as before, there was something else in its place, a mechanical crystalline figure. It spoke: “I’m the Ghost of Christmas Yet to Come. Follow me.”

The small community was still programming in Crystal, though most didn’t seem to be as happy as before. We tried to ask them why, but nobody noticed our presence. We tried to use a computer and search for Crystal on the internet, but our hands couldn’t touch anything. We turned to the Ghost with an inquisitive face, and noticed it had a keyboard and a small screen in its chest. We searched “Crystal sucks”, which would hopefully show posts of complaints about the language. And indeed there were quite a few of them. Most were about huge compile times and memory usage. “Huge compile times?”, we thought. “We solved that years ago!”, we shouted to the Ghost. The only reply we got from it was “compiling…”, the vision faded and we were back at the office, alone.

Back to the present

“Let’s do some math”, we said. The biggest program we have in Crystal right now is the compiler, which has about 40K lines of code. It takes about 10 seconds to compile, and it takes 940MB of memory to do so. One of our Rails apps, counting the total number of lines in its gems and the code in the “app” directory, has about 320K lines of code, 8 times bigger than the compiler. If we rewrite it in Crystal, or at least build an application with similar functionality, it would take 80 seconds to compile it, each time, and 8GB of memory to do so. That’s a lot of time to wait after each change, and an awful lot of memory too.

Can we improve this situation, with the current language? Can we introduce incremental compilation? We spent some time thinking about how to cache a previous compilation’s semantic result (inferred types) and use that for the next compilation. An observation is that a method’s type depends exclusively on the type of the arguments, the type of the methods it invokes, and the types of instance, class, and global variables.

So, one idea is to cache the inferred instance variables types of all the types in a program, together with the types of method instantiations and their dependencies (which types each method depends on, and specifically which other methods it calls). If instance variables types remain the same, a method’s code didn’t change, and the dependencies (invoked methods) didn’t change, we can safely reuse the result (types and generated code) from the previous compilation.

Note that the above “if” starts with “if instance variables types remain the same”. But how can we know that? The problem is that the compiler determines their type by traversing the program, instantiating methods, and checking what gets assigned to them. So we can’t really reuse the cache because we can’t know the final types until we type the whole program! It’s a chicken and egg problem.

The solution seems to be having to specify types of instance, class and global variables. With this, once we type a method its type can never change (because everything that’s non-local to a method, like instance variables, can’t change anymore). We would be able to cache that information and reuse it for next compilations. Not only that, but type inference becomes much simpler and faster, even without a cache.
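Purely to illustrate the shape of such a cache (my own sketch, written in Rust rather than Crystal, and not actual compiler code), an entry per typed method instantiation might carry something like the following:

// Hypothetical cache entry for one typed method instantiation.
struct CachedMethod {
    receiver_type: String,        // type of `self`
    argument_types: Vec<String>,  // types of the arguments
    inferred_type: String,        // result of the previous type inference
    body_hash: u64,               // detects edits to the method body
    callee_keys: Vec<u64>,        // cache keys of the methods it invokes
}

An entry would be reusable only if the body hash, the callee entries, and the declared instance, class and global variable types are all unchanged.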

Is this the right thing to do? We will once again diverge a bit more from Ruby. What future do we want? Do we want to stick with the current approach at the cost of having to wait a lot of time between each compilation? Or is it better to specify some more types but have a more agile development cycle?

What we really want is a language that’s fun to use, and efficient. Having to wait a long time for compilation to finish isn’t fun at all, even less fun than having to annotate a few types now and then. And these types are just for generic, instance, class and global variables: no type annotations are required for local variables and method arguments. Considering how rarely these types change, compared to how many times you are going to be writing new methods and compiling your program, it feels like something worth the change.

We already started working on this new compiler, because we want to do this as soon as possible, since a lot of code out there will break. While the current compiler works directly on the AST, in the new compiler we work with a flow graph, which will allow us to have a simpler compiler (one that anyone could understand, jump right into and contribute to) and code that is easier to understand and optimize. It will also make it possible to introduce new features like Ruby’s retry with minimal effort, because the flow graph allows for cycles and “goto”-like jumps.

If you’d like to know more about this change, there’s a tracking issue about it.

Questions and Answers

  • When will you finish the new compiler? We don’t know yet. We are working on it slowly but steadily, writing it with readability, extensibility and efficiency in mind, and focusing on the hardest parts first. Now that we know most of the features the language supports, it’s easier. Remember that the current compiler started as an experiment, and as a port of a compiler written in Ruby, so its code is not the best Crystal code out there.
  • Will you continue working on the current compiler? Yes and no. We will fix bugs if they are easy to fix, and we will continue extending and improving the standard library.
  • Will all my code stop compiling? Probably. However, you can use the current compiler’s tool hierarchy to ask it the types of instance variables to make the upgrade easier. In fact, we might even include a tool to do the upgrade automatically; it’s really that simple.
  • Will the new compiler include other features? We hope so! With this change we also plan to support forwarding blocks with the usual &block syntax. Right now this is possible, but it always ends up creating a closure; this can be done much better. We also plan to allow recursive calls with blocks, something that you can do in Ruby but not in Crystal. We also want to be able to have Array(Object) or Array(T) with any kind of T, something that, again, is not quite possible with the current version of the language. So these new type annotations will bring a lot more power to the language in compensation.
  • Will there be more breaking changes like this in the future? We are pretty sure the answer is no. If we know the types of instance, class, and global variables, then, given a method, the type of self and the types of its arguments, we can infer its type by analyzing just that method and the methods it calls. Right now this is not possible because the type of some methods depends on how you use a class (what you assign to it). So this change will be the last big breaking change.

The Ultimate Amiga 500 Talk [video]

os.js: JavaScript Cloud/Web Desktop Platform


OS.js is a JavaScript web desktop implementation for your browser with a fully-fledged window manager, Application APIs, GUI toolkits and filesystem abstraction.

Please note that OS.js is under development and features are not complete

Open-source

OS.js is completely free and open-source which means you can contribute to the development or use the code as you like. View License

Works on any modern browser and can be deployed on all platforms.

Web Desktop

Features a fully customizable Web Desktop and Window Manager inspired by Linux desktop environments, built to be blazingly fast and feel like the real thing.

With the Virtual Filesystem you can upload, download and modify your files across several cloud storage solutions, including Google Drive, Dropbox and OneDrive.

Comes translated in the following languages thanks to the community: Norwegian, French, German, Russian, Dutch, Polish, Vietnamese and Chinese.

Applications

Includes a default suite of Applications: File Manager, Music player, Video player, Picture viewer and editor, Calculator and text editors.

You can easily add more applications and features using the official repositories or community contributions.

Some of the extra applications include: PDF viewer, XMPP Chat, Google Mail, Google Contacts, Tetris and Wolfenstein3D.

Extendable

OS.js features simple, modularized and flexible JavaScript APIs so you can easily make changes, extend functionality and create applications.

You can also make your own style-, audio- and icon themes with ease.

Comes with a build-system for easy deployment and configuration.

Documentation

All the documentation, manuals and tutorials are gathered right here.

Installation

Just run the automated installer and you will be ready to go.

Make sure to read the installation documentation before you begin. There you will also find instructions on how to download and install manually.

curl -sS http://os.js.org/installer | sh

Windows users can use this installer: http://os.js.org/installer.exe

Full list of features

  • Free and Open-source
  • Very simple installation
  • Works in any modern browser
  • Server is deployable on any platform
  • Can be built to run entirely in-browser without any server
  • Dependency-free JavaScript frontend
  • Easy to use APIs
  • Customizable and easy to extend with custom code and modules etc.
  • Supports multi-user environments and authentication
  • Virtual File System - Store your files across many different storage/cloud providers
  • Desktop and Window Manager built to feel familiar to most users
  • Drag-and-drop between applications
  • Supports sessions so you can reload workspaces on any computer
  • Localization and translations
  • Comes with a small application suite
  • Supports adding of packages via external repositories
  • Comes with all the tools necessary to build your own applications
  • Client is written in Strict Mode JavaScript and uses ECMAScript 5.1 standards
  • Follows industry standard style guides
  • Comes with Google API Javascript Support
  • Comes with Windows Live API Javascript Support
  • Google Drive support
  • Dropbox support
  • OneDrive support
  • Run native GTK+ 3.x Applications via Broadway (very experimental)
  • Can be deployed using X11 to work as a full-fledged desktop solution


Introduction to the Math of Computer Graphics


Three-dimensional computer graphics is an exciting aspect of computing because of the amazing visual effects that can be created for display. All of this is created from an enormous number of calculations that manipulate virtual models, which are constructed from some form of geometric definition. While the math involved for some aspects of computer graphics and animation can become quite complex, the fundamental mathematics that is required is very accessible. Beyond learning some new mathematical notation, a basic three-dimensional view can be constructed with algorithms that only require basic arithmetic and a little bit of trigonometry.

I demonstrate how to perform some basic operations with two key constructs for 3D graphics, without a rigorous mathematical introduction. Hopefully the level of detail that I use is enough for anyone who doesn't have a strong background in math but would very much like to play with 3D APIs. I introduce the math that you must be familiar with in this entry, and in two future posts I demonstrate how to manipulate geometric models and apply the math towards the display of a 3D image.

Linear Algebra

The type of math that forms the basis of 3D graphics is called linear algebra. In general, linear algebra is quite useful for solving systems of linear equations. Rather than go into the details of linear algebra and all of its capabilities, we are going to focus simply on two related constructs that are used heavily within the topic: the matrix and the vector.

Matrix

A matrix is a rectangular array of values arranged in rows and columns; each entry is addressed by its row and column position.

These are all examples of valid matrices:

\( \left\lbrack \matrix{a & b & c \cr d & e & f} \right\rbrack, \left\lbrack \matrix{10 & -27 \cr 0 & 13 \cr 7 & 17} \right\rbrack, \left\lbrack \matrix{x^2 & 2x \cr 0 & e^x} \right\rbrack\)

Square matrices, that is, matrices with the same number of rows and columns, are particularly important. There are some operations that can only be performed on a square matrix, which I will introduce shortly. The notation for matrices, in general, uses capital letters as the variable names. A shorthand notation specifies the matrix dimensions and can also identify an indexed position within the matrix. As far as I am aware, row-major indexing is always used for consistency; that is, the first index represents the row, and the second represents the column.

\( A= [a_{ij}]= \left\lbrack \matrix{a_{11} & a_{12} & \ldots & a_{1n} \cr a_{21} & a_{22} & \ldots & a_{2n} \cr \vdots & \vdots & \ddots & \vdots \cr a_{m1} & a_{m2} & \ldots & a_{mn} } \right\rbrack \)

Vector

The vector is a special case of a matrix, where there is only a single row or a single column; also a common practice is to use lowercase letters to represent vectors:

\( u= \left\lbrack \matrix{u_1 & u_2 & u_3} \right\rbrack \), \( v= \left\lbrack \matrix{v_1\cr v_2\cr v_3} \right\rbrack\)

Operations

The mathematical shorthand notation for matrices is very elegant. It simplifies equations that would otherwise be quite cumbersome. There are only a few operations that we are concerned with to allow us to get started. Each operation has a basic algorithm to follow to perform a calculation, and the algorithm easily scales with the dimensions of a matrix.

Furthermore, the relationship between math and programming is not always clear. I have a number of colleagues that are excellent programmers and yet they do not consider their math skills very strong. I think that it could be helpful to many to see the conversion of the matrix operations that I just described from mathematical notation into code form. Once I demonstrate how to perform an operation with the mathematical notation and algorithmic steps, I will also show a basic C++ implementation of the concept to help you understand how these concepts map to actual code.

The operations that I implement below assume a Matrix class with the following interface:

C++

class Matrix
{
public:
  // Ctor and Dtor omitted
 
  // Calculate and return a reference to the specified element.
  double& element(size_t row, size_t column);
 
  // Resizes this Matrix to have the specified size.
  void resize(size_t row, size_t column);
 
  // Returns the number of rows.
  size_t rows();
 
  // Returns the number of columns.
  size_t columns();
private:
  std::vector<double>    data;
};

Transpose

Transpose is a unary operation that is performed on a single matrix, and it is represented by adding a superscript T to the target matrix.

For example, the transpose of a matrix A is represented with AT.

The transpose "flips" the orientation of the matrix so that each row becomes a column, and each original column is transformed into a row. I think that a few examples will make this concept more clear.

\( A= \left\lbrack \matrix{a & b & c \cr d & e & f \cr g & h & i} \right\rbrack \)

\(A^T= \left\lbrack \matrix{a & d & g \cr b & e & h \cr c & f & i} \right\rbrack \)

The resulting matrix will contain the same set of values as the original matrix, only their position in the matrix changes.

\( B= \left\lbrack \matrix{1 & 25 & 75 & 100\cr 0 & 5 & -50 & -25\cr 0 & 0 & 10 & 22} \right\rbrack \)

\(B^T= \left\lbrack \matrix{1 & 0 & 0 \cr 25 & 5 & 0 \cr 75 & -50 & 10 \cr 100 & -25 & 22} \right\rbrack \)

It is very common to transpose a matrix, including vectors, before performing other operations. The reason will become clear when we reach matrix multiplication.

\( u= \left\lbrack \matrix{u_1 & u_2 & u_3} \right\rbrack \)

\(u^T= \left\lbrack \matrix{u_1 \cr u_2 \cr u_3} \right\rbrack \)
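In index notation (added here for reference), the transpose simply swaps the row and column indices of each entry:

\( \left( A^T \right)_{ij} = a_{ji} \)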

Matrix Addition

Addition can only be performed between two matrices that are the same size. By size, we mean that the number of rows and columns are the same for each matrix. Addition is performed by adding the values of the corresponding positions of both matrices. The sum of the values creates a new matrix that is the same size as the original two matrices provided to the add operation.
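Element-wise (added for reference), each entry of the sum is just the sum of the corresponding entries:

\( (A+B)_{ij} = a_{ij} + b_{ij} \)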

If

\( A= \left\lbrack \matrix{0 & -2 & 4 \cr -6 & 8 & 0 \cr 2 & -4 & 6} \right\rbrack, B= \left\lbrack \matrix{1 & 3 & 5 \cr 7 & 9 & 11 \cr 13 & 15 & 17} \right\rbrack \)

Then

\(A+B=\left\lbrack \matrix{1 & 1 & 9 \cr 1 & 17 & 11 \cr 15 & 11 & 23} \right\rbrack \)

The sizes of these matrices do not match in their current form.

\(U= \left\lbrack \matrix{4 & -8 & 5}\right\rbrack, V= \left\lbrack \matrix{-1 \\ 5 \\ 4}\right\rbrack \)

However, if we take the transpose of one of them, their sizes will match and they can then be added together. The size of the result matrix depends upon which matrix from the original expression we transpose:

\(U^T+V= \left\lbrack \matrix{3 \\ -3 \\ 9} \right\rbrack \)

Or

\(U+V^T= \left\lbrack \matrix{3 & -3 & 9} \right\rbrack \)

Matrix addition has the same algebraic properties as with the addition of two scalar values:

Commutative Property:

\( A+B=B+A \)

Associative Property:

\( (U+V)+W=U+(V+W) \)

Identity Property:

\( A+0=A \)

Inverse Property:

\( A+(-A)=0 \)

The code required to implement matrix addition is relatively simple. Here is an example for the Matrix class definition that I presented earlier:

C++

void Matrix::operator+=(const Matrix& rhs)
{
  if (rhs.data.size() == data.size())
  {
    // We can simply add each corresponding element
    // in the matrix element data array.
    for (size_t index = 0; index < data.size(); ++index)
    {
      data[index] += rhs.data[index];
    }
  }
}
 
// Return by value; returning a reference to the local
// result would leave the caller with a dangling reference.
Matrix operator+(const Matrix& lhs,
                 const Matrix& rhs)
{
  Matrix result(lhs);
  result += rhs;
 
  return result;
}

Scalar Multiplication

Scalar multiplication allows a single scalar value to be multiplied with every entry within a matrix. The result matrix is the same size as the matrix provided to the scalar multiplication expression:

If

\( A= \left\lbrack \matrix{3 & 6 & -9 \cr 12 & -15 & -18} \right\rbrack \)

Then

\( \frac{1}3 A= \left\lbrack \matrix{1 & 2 & -3 \cr 4 & -5 & -6} \right\rbrack, 0A= \left\lbrack \matrix{0 & 0 & 0 \cr 0 & 0 & 0} \right\rbrack, -A= \left\lbrack \matrix{-3 & -6 & 9 \cr -12 & 15 & 18} \right\rbrack, \)

Scalar multiplication with a matrix exhibits these properties, where c and d are scalar values:

Distributive Property:

\( c(A+B)=cA+cB \)

Identity Property:

\( 1A=A \)

The implementation for scalar multiplication is even simpler than addition.
Note: this implementation only allows the scalar value to appear before the Matrix object in multiplication expressions, which is how the operation is represented in math notation:

C++

void Matrix::operator*=(const double scalar)
{
  for (size_t index = 0; index < data.size(); ++index)
  {
    data[index] *= scalar;
  }
}
 
// Return by value to avoid returning a reference to a local.
Matrix operator*(const double scalar,
                 const Matrix& rhs)
{
  Matrix result(rhs);
  result *= scalar;
 
  return result;
}

Matrix Multiplication

Everything seems very simple with matrices, at least once you get used to the new structure. Then you are introduced to matrix multiplication. The algorithm for multiplication is not difficult, however, it is much more labor intensive compared to the other operations that I have introduced to you. There are also a few more restrictions on the parameters for multiplication to be valid. Finally, unlike the addition operator, the matrix multiplication operator does not have all of the same properties as the multiplication operator for scalar values; specifically, the order of parameters matters.

Input / Output

Let's first address what you need to be able to multiply matrices, and what type of matrix you can expect as output. Once we have addressed the structure, we will move on to the process.

Given the following two matrices:

\( A= \left\lbrack \matrix{a_{11} & \ldots & a_{1n} \cr \vdots & \ddots & \vdots \cr a_{m1} & \ldots & a_{mn} } \right\rbrack, B= \left\lbrack \matrix{b_{11} & \ldots & b_{1v} \cr \vdots & \ddots & \vdots \cr b_{u1} & \ldots & b_{uv} } \right\rbrack \)

A valid product for \( AB=C \) is only possible if number of columns \( n \) in \( A \) is equal to the number of rows \( u \) in \( B \). The resulting matrix \( C \) will have the dimensions \( m \times v \).

\( AB=C= \left\lbrack \matrix{c_{11} & \ldots & c_{1v} \cr \vdots & \ddots & \vdots \cr c_{m1} & \ldots & c_{mv} } \right\rbrack, \)

Let's summarize this in a different way; hopefully this arrangement will make the concept more intuitive:

[Figure: matrix multiplication dimensions]

One last form of the rules for matrix multiplication:

[Figure: rules for matrix multiplication]

How to Multiply

To calculate a single entry in the output matrix, we multiply each element along a row of the first matrix with the element at the corresponding position down a column of the second matrix, and add all of these products together. We use the row in the first matrix, \(A\), that matches the row of the element we are calculating in \(C\). Similarly, we use the column in the second matrix, \(B\), that matches the column of that element in \(C\).

More succinctly, we can say we are multiplying rows into columns.
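Written as a formula (added here for reference), the entry in row \(i\) and column \(j\) of the product is:

\( c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj} \)

where \(n\) is the number of columns in \(A\), which equals the number of rows in \(B\).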

For example:

\( A= \left\lbrack \matrix{a_{11} & a_{12} & a_{13} \cr a_{21} & a_{22} & a_{23}} \right\rbrack, B= \left\lbrack \matrix{b_{11} & b_{12} & b_{13} & b_{14} \cr b_{21} & b_{22} & b_{23} & b_{24} \cr b_{31} & b_{32} & b_{33} & b_{34} } \right\rbrack \)

The number of columns in \(A\) is \(3\) and the number of rows in \(B\) is \(3\), therefore, we can perform this operation. The size of the output matrix will be \(2 \times 4\).

This is the formula to calculate the element \(c_{11}\) in \(C\) and the marked rows used from \(A\) and the columns from \(B\):

\( \left\lbrack \matrix{\color{#B11D0A}{a_{11}} & \color{#B11D0A}{a_{12}} & \color{#B11D0A}{a_{13}} \cr a_{21} & a_{22} & a_{23}} \right\rbrack \times \left\lbrack \matrix{\color{#B11D0A}{b_{11}} & b_{12} & b_{13} & b_{14} \cr \color{#B11D0A}{b_{21}} & b_{22} & b_{23} & b_{24} \cr \color{#B11D0A}{b_{31}} & b_{32} & b_{33} & b_{34} } \right\rbrack = \left\lbrack \matrix{\color{#B11D0A}{c_{11}} & c_{12} & c_{13} & c_{14}\cr c_{21} & c_{22} & c_{23} & c_{24} } \right\rbrack \)

\( c_{11}= (a_{11}\times b_{11}) + (a_{12}\times b_{21}) + (a_{13}\times b_{31}) \)

To complete the multiplication, we need to calculate these other seven values \( c_{12}, c_{13}, c_{14}, c_{21}, c_{22}, c_{23}, c_{24}\). Here is another example for the element \(c_{23}\):

\( \left\lbrack \matrix{ a_{11} & a_{12} & a_{13} \cr \color{#B11D0A}{a_{21}} & \color{#B11D0A}{a_{22}} & \color{#B11D0A}{a_{23}} } \right\rbrack \times \left\lbrack \matrix{b_{11} & b_{12} & \color{#B11D0A}{b_{13}} & b_{14} \cr b_{21} & b_{22} & \color{#B11D0A}{b_{23}} & b_{24} \cr b_{31} & b_{32} & \color{#B11D0A}{b_{33}} & b_{34} } \right\rbrack = \left\lbrack \matrix{c_{11} & c_{12} & c_{13} & c_{14}\cr c_{21} & c_{22} & \color{#B11D0A}{c_{23}} & c_{24} } \right\rbrack \)

\( c_{23}= (a_{21}\times b_{13}) + (a_{22}\times b_{23}) + (a_{23}\times b_{33}) \)

Notice how the size of the output matrix changes. Based on this and the size of the input matrices you can end up with some interesting results:

\( \left\lbrack \matrix{a_{11} \cr a_{21} \cr a_{31} } \right\rbrack \times \left\lbrack \matrix{ b_{11} & b_{12} & b_{13} } \right\rbrack = \left\lbrack \matrix{c_{11} & c_{12} & c_{13} \cr c_{21} & c_{22} & c_{23} \cr c_{31} & c_{32} & c_{33} } \right\rbrack \)

\( \left\lbrack \matrix{ a_{11} & a_{12} & a_{13} } \right\rbrack \times \left\lbrack \matrix{b_{11} \cr b_{21} \cr b_{31} } \right\rbrack = \left\lbrack \matrix{c_{11} } \right\rbrack \)

Tip:

To help you keep track of which row to use from the first matrix and which column from the second matrix, create your result matrix of the proper size, then methodically calculate the value for each individual element.

The algebraic properties for the matrix multiplication operator do not match those of the scalar multiplication operator. These are the most notable:

Not Commutative:

The order of the factor matrices definitely matters.

\( AB \ne BA \) in general

I think it is very important to illustrate this fact. Here is a simple \(2 \times 2 \) multiplication performed two times with the order of the input matrices switched. I have highlighted the only two terms that the two resulting answers have in common:

\( \left\lbrack \matrix{a & b \cr c & d } \right\rbrack \times \left\lbrack \matrix{w & x\cr y & z } \right\rbrack = \left\lbrack \matrix{(\color{red}{aw}+by) & (ax+bz)\cr (cw+dy) & (cx+\color{red}{dz}) } \right\rbrack \)

\( \left\lbrack \matrix{w & x\cr y & z } \right\rbrack \times \left\lbrack \matrix{a & b \cr c & d } \right\rbrack = \left\lbrack \matrix{(\color{red}{aw}+cx) & (bw+dx)\cr (ay+cz) & (by+\color{red}{dz}) } \right\rbrack \)

Product of Zero:

\( AB=0 \) does not necessarily imply \( A=0 \) or \( B=0 \)

Scalar Multiplication is Commutative:

\( (kA)B = k(AB) = A(kB) \)

Associative Property:

Multiplication is associative, however, take note that the relative order for all of the matrices remains the same.

\( (AB)C=A(BC) \)

Transpose of a Product:

The transpose of a product is equivalent to the product of the transposed factors, multiplied in reverse order.

\( (AB)^T=B^TA^T \)


Code

We are going to use a two-step solution to create a general purpose matrix multiplication solution. The first step is to create a function that properly calculates a single element in the output matrix:

C++

double Matrix::multiply_element(
  const Matrix& lhs,
  const Matrix& rhs,
  const size_t i,
  const size_t j
)
{
  double product = 0;
 
  // Multiply across the specified row, i, of the left matrix
  // and down the specified column, j, of the right matrix.
  // Accumulate the total of the products
  // to return as the calculated result.
  // The shared dimension is lhs.columns(), which equals rhs.rows().
  for (size_t index = 1; index <= lhs.columns(); ++index)
  {
    product += lhs.element(i, index)
             * rhs.element(index, j);
  }
 
  return product;
}

Now create the outer function that performs the multiplication to populate each field of the output matrix:

C++

// Because we may end up creating a matrix with
// an entirely new size, it does not make sense
// to have a *= operator for this general-purpose solution.
// Return the result by value rather than a reference.
Matrix operator*(const Matrix& lhs,
                 const Matrix& rhs)
{
  Matrix result;
 
  if (lhs.columns() == rhs.rows())
  {
    // Resize the result matrix to the proper size.
    result.resize(lhs.rows(), rhs.columns());
 
    // Calculate the value for each element
    // in the result matrix.
    for (size_t i = 1; i <= result.rows(); ++i)
    {
      for (size_t j = 1; j <= result.columns(); ++j)
      {
        result.element(i, j) = result.multiply_element(lhs, rhs, i, j);
      }
    }
  }
 
  return result;
}

Summary

3D computer graphics relies heavily on the concepts found in the branch of math called Linear Algebra. I have introduced two basic constructs from Linear Algebra that we will need to move forward and perform the fundamental calculations for rendering a three-dimensional display. At this point I have only scratched the surface of what is possible, and I have only demonstrated the how. I will provide context and demonstrate the what and the why, to a degree, on the path to helping you begin to work with three-dimensional graphics libraries, even if math is not one of your strongest skills.

References

Kreyszig, Erwin; "Chapter 7" from Advanced Engineering Mathematics, 7th ed., New York: John Wiley & Sons, 1993

No Brainer


For decades now, I have been haunted by the grainy, black-and-white x-ray of a human skull.

It is alive but empty, with a cavernous fluid-filled space where the brain should be. A thin layer of brain tissue lines that cavity like an amniotic sac. The image hails from a 1980 review article in Science: Roger Lewin, the author, reports that the patient in question had “virtually no brain”. But that’s not what scared me; hydrocephalus is nothing new, and it takes more to creep out this ex-biologist than a picture of Ventricles Gone Wild.

The stuff of nightmares. (From Oliveira et al 2012)

What scared me was the fact that this virtually brain-free patient had an IQ of 126.

He had a first-class honors degree in mathematics. He presented normally along all social and cognitive axes. He didn’t even realize there was anything wrong with him until he went to the doctor for some unrelated malady, only to be referred to a specialist because his head seemed a bit too large.

It happens occasionally. Someone grows up to become a construction worker or a schoolteacher, before learning that they should have been a rutabaga instead. Lewin’s paper reports that one out of ten hydrocephalus cases are so extreme that cerebrospinal fluid fills 95% of the cranium. Anyone whose brain fits into the remaining 5% should be nothing short of vegetative; yet apparently, fully half have IQs over 100. (Why, here’s another example from 2007; and yet another.) Let’s call them VNBs, or “Virtual No-Brainers”.

The paper is titled “Is Your Brain Really Necessary?”, and it seems to contradict pretty much everything we think we know about neurobiology. This Forsdyke guy over in Biological Theory argues that such cases open the possibility that the brain might utilize some kind of extracorporeal storage, which sounds awfully woo both to me and to the anonymous neuroskeptic over at Discovery.com; but even Neuroskeptic, while dismissing Forsdyke’s wilder speculations, doesn’t really argue with the neurological facts on the ground. (I myself haven’t yet had a chance to more than glance at the Forsdyke paper, which might warrant its own post if it turns out to be sufficiently substantive. If not, I’ll probably just pretend it is and incorporate it into Omniscience.)

On a somewhat less peer-reviewed note, VNBs also get routinely trotted out by religious nut jobs who cite them as evidence that a God-given soul must be doing all those things the uppity scientists keep attributing to the brain. Every now and then I see them linking to an off-hand reference I made way back in 2007 (apparently rifters.com is the only place to find Lewin’s paper online without having to pay a wall) and I roll my eyes.

And yet, 126 IQ. Virtually no brain. In my darkest moments of doubt, I wondered if they might be right.

So on and off for the past twenty years, I’ve lain awake at night wondering how a brain the size of a poodle’s could kick my ass at advanced mathematics. I’ve wondered if these miracle freaks might actually have the same brain mass as the rest of us, but squeezed into a smaller, high-density volume by the pressure of all that cerebrospinal fluid (apparently the answer is: no). While I was writing Blindsight— having learned that cortical modules in the brains of autistic savants are relatively underconnected, forcing each to become more efficient— I wondered if some kind of network-isolation effect might be in play.

Now, it turns out the answer to that is: Maybe.

Three decades after Lewin’s paper, we have “Revisiting hydrocephalus as a model to study brain resilience” by de Oliveira et al. (actually published in 2012, although I didn’t read it until last spring). It’s a “Mini Review Article”: only four pages, no new methodologies or original findings— just a bit of background, a hypothesis, a brief “Discussion” and a conclusion calling for further research. In fact, it’s not so much a review as a challenge to the neuro community to get off its ass and study this fascinating phenomenon— so that soon, hopefully, there’ll be enough new research out there warrant a real review.

The authors advocate research into “Computational models such as the small-world and scale-free network”— networks whose nodes are clustered into highly-interconnected “cliques”, while the cliques themselves are more sparsely connected one to another. De Oliveira et al suggest that they hold the secret to the resilience of the hydrocephalic brain. Such networks result in “higher dynamical complexity, lower wiring costs, and resilience to tissue insults.” This also seems reminiscent of those isolated hyper-efficient modules of autistic savants, which is unlikely to be a coincidence: networks from social to genetic to neural have all been described as “small-world”. (You might wonder— as I did— why de Oliveira et al. would credit such networks for the normal intelligence of some hydrocephalics when the same configuration is presumably ubiquitous in vegetative and normal brains as well. I can only assume they meant to suggest that small-world networking is especially well-developed among high-functioning hydrocephalics.) (In all honesty, it’s not the best-written paper I’ve ever read. Which seems to be kind of a trend on the ‘crawl lately.)

The point, though, is that under the right conditions, brain damage may paradoxically result in brain enhancement. Small-world, scale-free networking (focused, intensified, overclocked) might turbocharge a fragment of a brain into acting like the whole thing.

Can you imagine what would happen if we applied that trick to a normal brain?

If you’ve read Echopraxia, you’ll remember the Bicameral Order: the way they used tailored cancer genes to build extra connections in their brains, the way they linked whole brains together into a hive mind that could rewrite the laws of physics in an afternoon. It was mostly bullshit, of course: neurological speculation, stretched eight unpredictable decades into the future for the sake of a story.

But maybe the reality is simpler than the fiction. Maybe you don’t have to tweak genes or interface brains with computers to make the next great leap in cognitive evolution. Right now, right here in the real world, the cognitive function of brain tissue can be boosted— without engineering, without augmentation— by literal orders of magnitude. All it takes, apparently, is the right kind of stress. And if the neuroscience community heeds de Oliveira et al‘s clarion call, we may soon know how to apply that stress to order. The singularity might be a lot closer than we think.

Also a lot squishier.

Wouldn’t it be awesome if things turned out to be that easy?


Profiling Python in Production


We recently reduced CPU usage across our fleet by 80%. One key technique that made this possible was a lightweight profiling strategy that we could run in production. This post is about the ways we approached instrumentation, the tradeoffs involved, and some tools you can use to optimize your own apps (including code!).

Background

Nylas is a developer platform that provides APIs to integrate with email, contacts, and calendar. At the core of this is a technology we call the Sync Engine. It’s a large Python application (~30k LOC) which handles syncing via IMAP, SMTP, ActiveSync, and other protocols. The code is open source on GitHub with a solid community.

We run this as managed infrastructure for developers and companies around the world. Their products depend on our uptime and responsiveness, so performance of the system is a huge priority.

If You Can’t Measure It, You Can’t Manage It

Optimization starts with measurement and instrumentation, but there’s more than one way to profile. It’s important to be able to profile both at small scale using test benchmarks, and at large scale in a live environment.

For example, when a new mailbox is added to the Nylas platform, it’s essential that messages are synced as quickly as possible. To optimize this, we need to understand how different sync strategies and optimizations affect the first few seconds of a sync.

By setting up a local test and adding a bit of custom instrumentation, we can build up a call graph of the program. This is simply a matter of using sys.setprofile() to intercept function calls. To better see exactly what’s happening, we can export this call graph to a specific JSON format, and then load this into the powerful visualizer built into the Chrome Developer Tools. This lets us inspect the precise timeline of execution:

chrome developer tools profile
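As a rough illustration (a hypothetical sketch under stated assumptions, not the Nylas instrumentation itself), a sys.setprofile() hook can record call and return events in Chrome's trace-event JSON format; the function names and output path below are just illustrative:

import json
import os
import sys
import time

_events = []

def _tracer(frame, event, arg):
    # Record Python-level call/return events as Chrome trace "B"/"E" pairs.
    if event not in ('call', 'return'):
        return
    _events.append({
        'name': frame.f_code.co_name,
        'cat': frame.f_globals.get('__name__', ''),
        'ph': 'B' if event == 'call' else 'E',
        'ts': time.time() * 1e6,   # trace timestamps are in microseconds
        'pid': os.getpid(),
        'tid': 0,
    })

def start_tracing():
    sys.setprofile(_tracer)

def stop_tracing(path='trace.json'):
    sys.setprofile(None)
    with open(path, 'w') as f:
        json.dump(_events, f)   # load the resulting file into the Chrome tracing UI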

Doing it live

This strategy works well for detailed benchmarking of specific parts of an application, but is poorly suited to analyzing the aggregate performance of a large-scale system. Function call instrumentation introduces significant slowdown and generates a huge amount of data, so we can’t just directly run this profiler in production.

However, it’s difficult to accurately recreate production slowness in artificial benchmarks. The sync engine’s workload is heterogeneous: we sync accounts that are large and small, with different levels of activity, using different strategies depending on mail provider. If the tests aren’t actually representative of real-world workload, we’ll end up with ineffective optimizations.

Our answer to these shortcomings was to add lightweight instrumentation that we could continuously run in our full production cluster, and design a system to roll up the resulting data into a manageable format and size.

At the heart of this strategy is a simple statistical profiler – code that periodically samples the application call stack, and records what it’s doing. This approach loses some granularity and is non-deterministic. But its overhead is low and controllable (just choose the sampling interval). Coarse sampling is fine, because we’re trying to identify the biggest areas of slowness.

A number of libraries implement variants of this, but in Python, we can write a stack sampler in less than 20 lines:

import collections
import signal


class Sampler(object):
    def __init__(self, interval=0.001):
        self.stack_counts = collections.defaultdict(int)
        self.interval = interval

    def _sample(self, signum, frame):
        # Walk the stack from the current frame outwards and record it as a
        # semicolon-separated string of "function(module)" frames.
        stack = []
        while frame is not None:
            formatted_frame = '{}({})'.format(frame.f_code.co_name,
                                              frame.f_globals.get('__name__'))
            stack.append(formatted_frame)
            frame = frame.f_back

        formatted_stack = ';'.join(reversed(stack))
        self.stack_counts[formatted_stack] += 1
        # Re-arm the timer for the next sample.
        signal.setitimer(signal.ITIMER_VIRTUAL, self.interval, 0)

    def start(self):
        signal.signal(signal.SIGVTALRM, self._sample)
        signal.setitimer(signal.ITIMER_VIRTUAL, self.interval, 0)

Calling Sampler.start() registers a handler for SIGVTALRM and arms the ITIMER_VIRTUAL timer, which fires after the process has consumed interval seconds of CPU time. Because the handler re-arms the timer each time it runs, this essentially creates a repeating alarm that runs the _sample method.

When the signal fires, this function saves the application’s stack, and keeps track of how many times we’ve sampled that same stack. Frequently sampled stacks correspond to code paths where the application is spending a lot of time.

The memory overhead associated with maintaining these stack counts stays reasonable, since the application only executes so many different frames. It’s also possible to bound the memory usage, if necessary, by periodically pruning infrequent stacks. In our application, the actual CPU overhead is demonstrably negligible:

overhead of instrumentation
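Pruning, mentioned above, could be as simple as the following hypothetical helper (not part of the actual codebase), which drops stacks sampled only a handful of times:

import collections

def prune_stacks(sampler, min_count=2):
    # Copy first (a single C-level call), so the signal handler can keep
    # adding stacks while we filter; then keep only frequently seen stacks.
    counts = dict(sampler.stack_counts)
    kept = {stack: count for stack, count in counts.items()
            if count >= min_count}
    sampler.stack_counts = collections.defaultdict(int, kept)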

Now that we’ve added instrumentation in the application, we have each worker process expose its profiling data via a simple HTTP interface (see code). This lets us take a production worker process and generate a flamegraph that concisely illustrates where the worker is spending time:

curl $host:$port | flamegraph.pl > profile.svg

single-worker flamegraph

This visualization makes it easy to quickly spot where CPU time is being spent in the actual process. For example, around 15% of runtime is being spent in the get() method highlighted above, executing a database load that turns out to normally be unnecessary. This wasn’t evident in local testing, but now it’s easy to identify and fix.
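For a sense of what that HTTP interface involves, here is a hypothetical, stripped-down sketch (not the code linked above): a worker serves the sampler's folded stacks in the one-line-per-stack format that flamegraph.pl consumes. The port number is an arbitrary placeholder.

from wsgiref.simple_server import make_server

def profile_app(sampler):
    # WSGI app that dumps "frame;frame;frame count" lines for flamegraph.pl.
    def app(environ, start_response):
        counts = dict(sampler.stack_counts)   # snapshot before iterating
        body = '\n'.join('{} {}'.format(stack, count)
                         for stack, count in counts.items())
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [body.encode('utf-8')]
    return app

if __name__ == '__main__':
    sampler = Sampler()   # the Sampler class sketched earlier
    sampler.start()
    make_server('0.0.0.0', 16384, profile_app(sampler)).serve_forever()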

However, the load on any single worker process isn’t necessarily representative of the aggregate workload across all processes and instances. We want to be able to aggregate stacktraces from multiple processes. We also need a way to save historical data, as the profiler only stores traces for the current lifetime of the process.

To do this, we run a collector agent that periodically polls all sync engine processes (across multiple machines), and persists the aggregated profiling data to its own local store. Because we’re vending data over HTTP, this is really straightforward; there’s no need to tail and rotate any files on production instances.
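A minimal collector could look something like the sketch below (hypothetical; it assumes the requests library and made-up worker URLs). Since the counts each worker reports are cumulative over its lifetime, a real agent would persist timestamped snapshots and difference them to render a profile for a given window:

import collections
import time

import requests

WORKER_ENDPOINTS = [                     # hypothetical worker profiling URLs
    'http://sync1.internal:16384',
    'http://sync2.internal:16384',
]

def poll_workers():
    # Merge the folded stacks from every reachable worker into one snapshot.
    snapshot = collections.Counter()
    for url in WORKER_ENDPOINTS:
        try:
            text = requests.get(url, timeout=5).text
        except requests.RequestException:
            continue                     # a dead worker shouldn't stop collection
        for line in text.splitlines():
            stack, _, count = line.rpartition(' ')
            if stack and count.isdigit():
                snapshot[stack] += int(count)
    return snapshot

if __name__ == '__main__':
    history = {}                         # timestamp -> aggregated snapshot
    while True:
        history[time.time()] = poll_workers()
        # Persist `history` (e.g. to a local datastore) for the web app to query.
        time.sleep(60)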

Finally, a lightweight web app can visualize this data on demand. Answering the question, “Where is our application spending CPU time?” is now as simple as visiting an internal URL:

single-worker flamegraph

Because we can render profiles for any given time interval, it’s easy to track down the cause of any regressions and the moment they were introduced.

The code for all this is open source on GitHub – take a look and try it yourself!

Results

Having deployed this instrumentation, it was easy to identify slow parts of our managed sync engine and apply a variety of optimizations to speed things up. The net result was an 80% reduction in CPU load. The graph shows the effects after two successive sets of optimization patches were shipped.

observed speedup

Being able to measure and introspect our services in a variety of ways is crucial to keeping them stable and performant. The simple tooling presented here is just one part of the larger monitoring infrastructure at Nylas. We hope to cover more of that in future posts.

Want to work with us?

Our team is hiring engineers and designers to build the future of email. Check out our jobs page for more details.

Bank of America trying to load up on patents for the technology behind Bitcoin


Bankers may not think bitcoin will ever go fully mainstream, but they clearly believe there is value in the technology that powers such cryptocurrencies, known as blockchain.

On Dec. 17, the US Patent office published 10 blockchain-related patents filed by Bank of America in July 2014. The patents haven’t been granted yet, but the filings demonstrate the bank’s interest in using blockchain technology to revamp its backend operations, which, like other financial institutions, are largely paper-based.

The wide-ranging patents cover everything from a “cryptocurrency transaction payment system” which would let users make transactions using cryptocurrency, to risk detection, storing cryptocurrencies offline, and using the blockchain to measure fraudulent activity. (The blockchain is essentially a publicly available ledger that’s distributed to everyone within a network.)

Bank of America had no comment.

Financial institutions are quickly ramping up their research efforts around blockchain technology. Last week, IBM, JPMorgan, the London Stock Exchange, and Wells Fargo announced the Open Ledger Project, a new consortium that will focus on allowing businesses to easily build their own blockchain technology. Bank of America is a part of a consortium led by blockchain startup R3 that’s developing blockchain technology to be used in financial markets. The Aite Group estimates that banks have invested $75 million this year on blockchain tech, a figure the research firm forecasts will grow to $400 million in 2019.

Other financial institutions have been building up their blockchain-related intellectual property. Goldman Sachs filed a patent for its own cryptocurrency, SETLCoin, that would allow traders to execute and clear trades in real time.

It’s hard to say what Bank of America will do with these patents, if anything. But Coindesk notes these patents could be hinting that BofA is working on a complete network based on the blockchain.

Governments deterring businessmen and tourists with cumbersome visa requirements


The rise of big emerging economies like China and India, and the steady march of globalisation, have led to a surge in the numbers of people wanting to travel abroad for business or tourism. As a result, demand for visas is at unprecedented levels. In the fiscal year to the end of September 2014 the United States granted just under 10m visas—up from around 6m in 1997, despite blips in the wake of the terrorist attacks of September 11th 2001 and the global financial crisis of 2007-08 (see chart 1).

Citizens of America, Britain and some other rich countries can travel to most places without a visa. Chinese and Indian travellers are far more likely to have to apply for them. And citizens of a few benighted places, such as Iraq and Afghanistan, have to submit to the cost and bureaucracy—and often the humiliation—of the visa-application process to get to most places (see chart 2).

The most sensible response to this surge in demand for short-term visas would be for governments to streamline the application process and scrap the most onerous requirements. But governments are often not sensible about such things. The 26 European countries with a common visa policy—the “Schengen group”—require tourists from India and other developing countries to provide several months’ worth of bank statements and pay slips. Visitors to Britain often have to fill in a ten-page application form, including details of every trip abroad for the past ten years. Business travellers to India must provide two references. Mexico has scrapped a rule requiring visa applicants (including women) to submit a description of their moustaches. But in 2016 America will start requiring visas for some travellers who currently do not need them—if, for example, they have visited Iran, Iraq, Syria or Sudan in the previous five years.


In many cases, instead of simplifying the visa process, governments have offloaded it to private contractors. Travellers may now have to pay a service fee to the company handling their application on top of the standard visa fee. The biggest firm in this growing business is VFS Global, which is part of Kuoni, a Swiss tourism company. Starting from a single premises in Mumbai in 2001, handling applications for American visas, VFS now has more than 1,900 visa centres in 124 countries, processing paperwork for 48 governments.

Of the 113m visa applications made worldwide in 2013, one in three went through a contractor, reckons VFS, which has about half the market. Its main rivals are CSC, with around 10% of the market, and TLScontact, with around 7%. Dozens of smaller firms make up the remainder of the market. The private contractors collect and verify the applicant’s paperwork, ensure that forms are filled in properly, take fingerprints and other biometric information and collect the fees. The consular staff of the destination country simply decide whether to grant the visa, and slap a sticker in the passport of successful applicants.

For the contractors, it is a nice little earner. VFS probably enjoys operating margins of 20%, reckons Kathleen Gailliot, an analyst at Natixis, a French bank. The companies are given a free hand to pad their earnings with pricey “premium” services. In Mumbai, for example, VFS offers Indians applying for British visas a text on their mobile phones to notify them that their passports are ready for collection, at 128 rupees ($2) a shot. For an extra 2,548 rupees, applicants can use a special “lounge” area while submitting their documents, and have their passports posted back to them.

VFS accounts for just 5% of Kuoni’s revenues but more than 60% of its operating profits. So bright are the division’s prospects that its parent company is getting out of the tour-operator business, which it has been in since 1906, to concentrate on visa-processing and a few other specialist travel services.

Until VFS opened its Mumbai office, applicants had to queue for an average of five hours in the sweltering heat outside the American consulate. After the job was handed to the contractor, the typical waiting time fell to one hour. However, applicants still have no choice but to submit to whatever petty demands contractors make—such as, say, banning them from using mobile phones while they sit waiting for their appointments. If the staff are rude, the queues are badly managed or the “extras” extravagantly priced, travellers can hardly take their business elsewhere.

The application-processing firms are profiting both from travellers’ lack of choice and from governments’ failure to consider the economic damage caused by their visa requirements. There is scant evidence that making all travellers submit the same documents every time they want to travel, or provide extensive financial details, protects countries from terrorists or illegal immigrants. In contrast, there is evidence of how liberal visa regimes bring in the bucks. A report in 2014 from the European Parliament, “A Smarter Visa Policy for Economic Growth”, estimated that over-strict visa rules probably cost the EU economy 250,000 jobs and €12.6 billion ($13.8 billion) a year in lost output. It recommended requiring fewer documents from applicants, handing out longer visas and simplifying the whole process.

Since Britain is not part of the Schengen group, Chinese people taking a tour of Europe have to apply for a second visa to cross the Channel. Only 6% of them do so, says Euromonitor, a research firm. The British Tourist Authority has complained that the country’s visa policies cost it £2.8 billion ($4.1 billion) a year in lost revenue.

However, amid worries about the wave of asylum-seekers from Syria and elsewhere, governments in Europe and beyond will face pressure to keep making life hard for tourists and business travellers—even as other departments of those same governments spend heavily on promoting tourism and foreign investment.

Incorporating and accessing binary data into a C program

The other day I needed to incorporate a large blob of binary data in a C program. One simple way is to use xxd, for example, on the binary data in file "blob", one can do:
xxd --include blob 

 unsigned char blob[] = {  
  0xc8, 0xe5, 0x54, 0xee, 0x8f, 0xd7, 0x9f, 0x18, 0x9a, 0x63, 0x87, 0xbb,  
  0x12, 0xe4, 0x04, 0x0f, 0xa7, 0xb6, 0x16, 0xd0, 0x70, 0x06, 0xbc, 0x57,  
  0x4b, 0xaf, 0xae, 0xa2, 0xf2, 0x6b, 0xf4, 0xc6, 0xb1, 0xaa, 0x93, 0xf2,  
  0x12, 0x39, 0x19, 0xee, 0x7c, 0x59, 0x03, 0x81, 0xae, 0xd3, 0x28, 0x89,  
  0x05, 0x7c, 0x4e, 0x8b, 0xe5, 0x98, 0x35, 0xe8, 0xab, 0x2c, 0x7b, 0xd7,  
  0xf9, 0x2e, 0xba, 0x01, 0xd4, 0xd9, 0x2e, 0x86, 0xb8, 0xef, 0x41, 0xf8,  
  0x8e, 0x10, 0x36, 0x46, 0x82, 0xc4, 0x38, 0x17, 0x2e, 0x1c, 0xc9, 0x1f,  
  0x3d, 0x1c, 0x51, 0x0b, 0xc9, 0x5f, 0xa7, 0xa4, 0xdc, 0x95, 0x35, 0xaa,  
  0xdb, 0x51, 0xf6, 0x75, 0x52, 0xc3, 0x4e, 0x92, 0x27, 0x01, 0x69, 0x4c,  
  0xc1, 0xf0, 0x70, 0x32, 0xf2, 0xb1, 0x87, 0x69, 0xb4, 0xf3, 0x7f, 0x3b,  
  0x53, 0xfd, 0xc9, 0xd7, 0x8b, 0xc3, 0x08, 0x8f  
 };  
 unsigned int blob_len = 128;  

...and redirecting the output from xxd into a C source file and compiling it is simple and easy to do.

However, for large binary blobs, the C source can be huge, so an alternative way is to use the linker ld as follows:

ld -s -r -b binary -o blob.o blob  

...and this generates the blob.o object code. To reference the data in a program one needs to determine the symbol names of the start, end and perhaps the length too. One can use objdump to find this as follows:
 objdump -t blob.o  
 blob.o:   file format elf64-x86-64  
 SYMBOL TABLE:  
 0000000000000000 l  d .data        0000000000000000 .data  
 0000000000000080 g    .data        0000000000000000 _binary_blob_end  
 0000000000000000 g    .data        0000000000000000 _binary_blob_start  
 0000000000000080 g    *ABS*        0000000000000000 _binary_blob_size  

To access the data in C, use something like the following:
 cat test.c  
 
 #include <stdio.h>
 
 int main(void)
 {
         /* Symbols created by ld; their addresses mark the bounds of the blob. */
         extern char _binary_blob_start[], _binary_blob_end[];
         char *start = _binary_blob_start;
         char *end = _binary_blob_end;
 
         printf("Data: %p..%p (%zu bytes)\n",
                 (void *)start, (void *)end, (size_t)(end - start));
         return 0;
 }

...and link and run as follows:
 gcc test.c blob.o -o test  
 ./test   
 Data: 0x601038..0x6010b8 (128 bytes)  

So for large blobs, I personally favour using ld to do the hard work for me, since I don't need another tool (such as xxd) and it removes the need to convert the blob into C source and then compile it.

WebSockets: caution required


When developers hear that WebSockets are going to land in the near future in Rails they get all giddy with excitement.

But your users don't care if you use WebSockets:

  • Users want "delightful realtime web apps".

  • Developers want "delightfully easy to build realtime web apps".

  • Operations want "delightfully easy to deploy, scale and manage realtime web apps".

If WebSockets get us there, great, but it is an implementation detail that comes at high cost.

Do we really need ultra high performance, full duplex Client-Server communication?

WebSockets provides simple APIs to broadcast information to clients and simple APIs to ship information from the clients to the web server.

A realtime channel to send information from the server to the client is very welcome. In fact it is a part of HTTP 1.1.

However, a brand new API for shipping information to the server from web browsers introduces a new decision point for developers:

  • When a user posts a message on chat, do I make a RESTful call and POST a message or do I bypass REST and use WebSockets?

  • If I use the new backchannel, how do I debug it? How do I log what is going on? How do I profile it? How do I ensure it does not slow down other traffic to my site? Do I also expose this endpoint in a controller action? How do I rate limit this? How do I ensure my background WebSocket thread does not exhaust my db connection limit?

:warning: If an API allows hundreds of different connections concurrent access to the database, bad stuff will happen.

Introducing this backchannel is not a clear win and comes with many caveats.

I do not think the majority of web applications need a new backchannel into the web server. On a technical level you would opt for such a construct if you were managing 10k interactive console sessions on the web. You can transport data more efficiently to the server, in that the web server no longer needs to parse HTTP headers, Rails does not need to do a middleware crawl and so on.

But the majority of web applications out there are predominantly read applications. Lots of users benefit from live updates, but very few change information. It is incredibly rare to be in a situation where the HTTP header parsing optimisation is warranted; this work is done sub millisecond. Bypassing Rack middleware on the other hand can be significant, especially when full stack middleware crawls are a 10-20ms affair. That however is an implementation detail we can optimise and not a reason to rule out REST for client to server communications.

For realtime web applications we need simple APIs to broadcast information reliably and quickly to clients. We do not need new mechanisms for shipping information to the server.

What's wrong with WebSockets?

WebSockets had a very tumultuous ride with a super duper unstable spec during the journey. The side effects of this joyride show in quite a few spots. Take a look at Ilya Grigorik's very complete implementation. 5 framing protocols, 3 handshake protocols and so on.

At last, today, this is all stable and we have RFC6455, which is implemented ubiquitously across all major modern browsers. However, there was some collateral damage along the way.

I am confident the collateral damage will, in time, be healed. That said, even the most perfect implementation comes with significant technical drawbacks.

1) Proxy servers can wreak havoc with WebSockets running over unsecured HTTP

The proxy server issue is quite widespread. Our initial release of Discourse used WebSockets; however, reports kept coming in of "missing updates on topics" and so on. Amongst the various proxy pariahs was my mobile phone network, Telstra, which basically let you have an open socket but did not let any data through.

To work around the "WebSocket is dead but still appears open problem" WebSocket implementers usually introduce a ping/pong message. This solution works fine provided you are running over HTTPS, but over HTTP all bets are off and rogue proxies will break you.

That said, "... but you must have HTTPS" is the weakest argument against WebSocket adoption; I want all of the web to be HTTPS, it is the future and it is getting cheaper every day. But you should know that weird stuff will definitely happen if you deploy WebSockets over unsecured HTTP. Unfortunately for us at Discourse, dropping support for HTTP is not an option quite yet, as it would hurt adoption.

2) Web browsers allow huge numbers of open WebSockets

The infamous 6 connections per host limit does not apply to WebSockets. Instead a far bigger limit holds (255 in Chrome and 200 in Firefox). This blessing is also a curse. It means that end users opening lots of tabs can cause large amounts of load and consume large amounts of continuous server resources. Open 20 tabs with a WebSocket based application and you are risking 20 connections unless the client/server mitigates.

There are quite a few ways to mitigate:

  • If we have a reliable queue driving stuff, we can shut down sockets after a while (or when in a background tab) and reconnect later on and catch up.

  • If we have a reliable queue driving stuff, we can throttle and turn back high numbers of TCP connections at our proxy or even iptables, but it is hard to guess if we are turning away the right connections.

  • On Firefox and Chrome we can share a connection by using a shared web worker, which is unlikely to be supported on mobile and is absent from Microsoft's offerings. I noticed Facebook are experimenting with shared workers (Gmail and Twitter are not).

  • MessageBus uses browser visibility APIs to slow down communication on out-of-focus tabs, falling back to a 2 minute poll on background tabs.

3) WebSockets and HTTP/2 transport are not unified

HTTP/2 is able to cope with the multiple tab problem much more efficiently than WebSockets. A single HTTP/2 connection can be multiplexed across tabs, which makes loading pages in new tabs much faster and significantly reduces the cost of polling or long polling from a networking point of view. Unfortunately, HTTP/2 does not play nice with WebSockets. There is no way to tunnel a WebSocket over HTTP/2, they are separate protocols.

There is an expired draft to unify the two protocols, but no momentum around it.

HTTP/2 has the ability to stream data to clients by sending multiple DATA frames, meaning that streaming data from the server to the client is fully supported.

Unlike running a Socket server, which includes a fair amount of complex Ruby code, running a HTTP/2 server is super easy. HTTP/2 is now in NGINX mainline, you can simply enable the protocol and you are done.

4) Implementing WebSockets efficiently on the server side requires epoll, kqueue or I/O Completion ports.

Efficient long polling, HTTP streaming (or Server Sent Events) is fairly simple to implement in pure Ruby, since we do not need to repeatedly run IO.select. The most complicated structure we need to deal with is a TimerThread.

Efficient Socket servers on the other hand are rather complicated in the Ruby world. We need to keep track of potentially thousands of connections dispatching Ruby methods when new data is available on any of the sockets.

Ruby ships with IO.select, which allows you to watch an array of sockets for new data, however it is fairly inefficient because it forces the kernel to keep walking big arrays to figure out if you have any new data. Additionally, it has a hard limit of 1024 entries (depending on how your system was compiled); you can not select on longer lists. EventMachine solves this limitation by implementing epoll (and kqueue for BSD).

Implementing epoll correctly is not easy.

5) Load balancing WebSockets is complicated

If you decide to run a farm of WebSocket servers, proper load balancing is complicated. If you find that your socket servers are overloaded and decide to quickly add a few servers to the mix, you have no clean way of re-balancing current traffic, because connections are held open indefinitely; you have to terminate connections on the overloaded servers. At that point you are exposing yourself to a flood of reconnections (which can be somewhat mitigated by clients). Furthermore, if clients refresh the page on reconnect, restarting a socket server will flood your web servers.

With WebSockets you are forced to run TCP proxies as opposed to HTTP proxies. TCP proxies can not inject headers, rewrite URLs or perform many of the roles we traditionally let our HTTP proxies take care of.

Denial of service attacks that are usually mitigated by front end HTTP proxies can not be handled by TCP proxies; what happens if someone connects to a socket and starts pumping messages into it that cause database reads in your Rails app? A single connection can wreak enormous amounts of havoc.

6) Sophisticated WebSocket implementations end up re-inventing HTTP

Say we need to be subscribed to 10 channels on the web browser (a chat channel, a notifications channel and so on), clearly we will not want to make 10 different WebSocket connections. We end up multiplexing commands to multiple channels on a single WebSocket.

Posting "Sam said hello" to the "/chat" channel ends up looking very much like HTTP. We have "routing" which specifies the channel we are posting on, this looks very much like HTTP headers. We have a payload, that looks like HTTP body. Unlike HTTP/2 we are unlikely to get header compression or even body compression.

7) WebSockets give you the illusion of reliability

WebSockets ship with a very appealing API.

  • You can connect
  • You have a reliable connection to the server due to TCP
  • You can send and receive messages

But... the Internet is a very unreliable place. Laptops go offline, you turn on airplane mode when you've had it with all the annoying marketing calls, and the Internet sometimes just does not work.

This means that this appealing API still needs to be backed by reliable messaging, you need to be able to catch up with a backlog of messages and so on.

When implementing WebSockets you need to treat them just as though they are simple HTTP calls that can go missing, be processed at the server out-of-order, and so on. They only provide the illusion of reliability.

WebSockets are an implementation detail, not a feature

At best, WebSockets are a value add. They provide yet another transport mechanism.

There are very valid technical reasons many of the biggest sites on the Internet have not adopted them. Twitter use HTTP/2 + polling, Facebook and Gmail use long polling. Saying WebSockets are the only way, and the way of the future, is wrongheaded. HTTP/2 may end up winning this battle due to the huge number of WebSocket connections web browsers allow, and HTTP/3 may unify the protocols.

  • You may want to avoid running dedicated socket servers (which at scale you are likely to want to run, so sockets do not interfere with standard HTTP traffic). At Discourse we run no dedicated long polling servers; adding capacity is trivial. Capacity is always balanced.

  • You may be happy with a 30 second delay and be fine with polling.

  • You may prefer the consolidated transport HTTP/2 offers and go for long polling + streaming on HTTP/2.

Messaging reliability is far more important than WebSockets

MessageBus is backed by a reliable pub/sub channel. Messages are globally sequenced. Messages are locally sequenced to a channel. This means that at any point you can "catch up" with old messages (capped). API wise it means that when a client subscribes it has the option to tell the server what position the channel is:

// subscribe to the chat channel at position 7
MessageBus.subscribe('/chat', function(msg){ alert(msg); }, 7);

Due to the reliable underpinnings of MessageBus it is immune to a class of issues that affect pure WebSocket implementations.

This underpinning makes it trivial to write very efficient cross process caches amongst many other uses.

Reliable messaging is a well understood concept. You can use Erlang, RabbitMQ, ZeroMQ, Redis, PostgreSQL or even MySQL to implement reliable messaging.

With reliable messaging implemented, multiple transport mechanisms can be implemented with ease. This "unlocks" the ability to do long-polling, long-polling with chunked encoding, EventSource, polling, forever iframes etc in your framework.

:warning: When picking a realtime framework, prefer reliable underpinnings to WebSockets.

Where do I stand?

Discourse does not use WebSockets. Discourse docker ships with HTTP/2 templates.

We have a realtime web application. I can make a realtime chat room just fine in 200 lines of code. I can run it just fine in Rails 3 or 4 today by simply including the gem. We handle millions of long polls a day for our hosted customers. As soon as someone posts a reply to a topic in Discourse it pops up on the screen.

We regard MessageBus as a fundamental and integral part of our stack. It enables reliable server/server live communication and reliable server/client live communication. It is transport agnostic. It has one dependency on rack and one dependency on redis; that is all.

When I hear people getting excited about WebSockets, this is the picture I see in my mind.

In a world that already had HTTP/2 it is very unlikely we would see WebSockets being ratified, as it covers the vast majority of use cases WebSockets solves.

Special thank you to Ilya, Evan, Matt, Jeff, Richard and Jeremy for reviewing this article.
