Friday, March 27, 2015

Setting up a private Bower repository in an offline network

Previously I blogged about setting up an offline NPM repository (in a disconnected-from-the-internet network). This gave web devs in my organization a solution for working comfortably with nodejs packages. But what about front-end packages?

If you are not living on Mars, you've probably heard of Bower, a nodejs package that helps you manage your front-end dependencies. It is currently the most popular front-end package manager, so I decided to introduce it to my organization and let developers use it.

Like I said, Bower is a regular nodejs package installed on your machine. When you want to install angular in your project, you just go to your project folder and run bower install angular.
How does this magic work? Where does Bower find the angular source code? There must be some service that Bower queries to resolve the name "angular". Maybe it resolves to a git repository or something.
Well, this is exactly what happens. Bower is pre-configured with the address of a mapping service, which receives a package name and responds with a repository URL, most commonly a GitHub repository. Your Bower client then goes to that URL, fetches the code and stores it in a folder called bower_components in your current directory. Cool.

To make this work in a closed offline network I needed 3 things:
1. Have a package repository.
2. Create a similar mapping service that maps package names to our repository.
3. Have Bower reconfigured to communicate with our mapping service (a registry, in Bower terms).

We already had a package repository: we were using Atlassian Stash as a git repository host (like GitHub). So I created a project named "bower libraries" and pushed some common libraries to it.
A colleague and I wrote the mapping service in a couple of hours. We found a large mapping file on bower.io with the original mappings to GitHub, and we simply changed it to point to Stash. The service loads this file and answers lookup requests from it.
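For illustration, here is a minimal Node.js sketch of such a mapping service; it is not the code we actually wrote. It assumes the mapping file is a JSON array of { name, url } entries and that the Bower client looks packages up with GET /packages/<name>, as the public registry did. The Stash URL prefix is a made-up placeholder.

```javascript
const http = require('http');

// Hypothetical clone-URL prefix of the "bower libraries" project in Stash.
const STASH_BASE = 'http://stash.example.local/scm/bowerlib';

// Rewrite a GitHub URL such as git://github.com/angular/bower-angular.git
// so it points at the repository with the same name inside Stash.
function toStashUrl(githubUrl) {
  const repoName = githubUrl.split('/').pop(); // e.g. "bower-angular.git"
  return `${STASH_BASE}/${repoName}`;
}

// Turn the mapping file (an array of { name, url } entries) into a
// name -> { name, url } lookup table whose URLs point at Stash.
function buildIndex(entries) {
  const index = new Map();
  for (const { name, url } of entries) {
    index.set(name, { name, url: toStashUrl(url) });
  }
  return index;
}

// Answer GET /packages/<name> with the package's JSON entry, or a 404.
function createServer(index) {
  return http.createServer((req, res) => {
    const match = /^\/packages\/([^/?]+)$/.exec(req.url);
    const entry = match ? index.get(decodeURIComponent(match[1])) : undefined;
    res.writeHead(entry ? 200 : 404, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify(entry || { error: 'package not found' }));
  });
}
```

To run it, call something like createServer(buildIndex(require('./packages.json'))).listen(8080) and use that host and port as the registry address in .bowerrc. A Map keeps each lookup a constant-time operation, even with the large bower.io mapping file.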
To reconfigure bower you simply create a .bowerrc file in your profile folder or project folder, containing this:
{
  "registry": {
    "search": [
      "ADDRESS_OF_REGISTRY_HERE"
    ]
  }
}

Currently I'm helping teams to migrate from chaos to Bower :)

If you have any questions or need help - I'm here :)


Friday, February 13, 2015

Setting up a private NPM repository in an offline network

The organization I work in is disconnected from the internet. Yes, it is terrible. That's the reason I came up with this.

Effective web development requires good tools: tools that help you lint, precompile CSS, test and run customized tasks. These tools exist, and the comfortable way to install them is npm (Node Package Manager, bundled with nodejs).

Many developers are not familiar with these tools, and it is my job to acquaint them with better methods of developing software. One thing I learned is that the best way to ease people into a new technology is to show them that it is easy to use. And installing these tools any other way than simply running npm install karma is hard and annoying. The problem is, of course, that npm install karma will not work for us, because we are not connected to the internet and the central npm registry is not available to us.

So I looked for a way to set up our own private npm registry, just as we have our own Nexus repository for Java artifacts.
I found several solutions.
One required setting up a CouchDB instance. Too complicated.
I tried npm_lazy and reggie but ran into too many bugs, and honestly they did not look like serious projects, nor like they would be maintained much longer.
Finally I stumbled upon Sinopia, a package that seemed far more mature than the others.

Sinopia now runs and works, and some teams are already using it. There were some small issues that I had to fix at the beginning, though.

So if you have a network behind a firewall and you need a private npm registry - use Sinopia! It's awesome! :)

Wednesday, November 12, 2014

When you need to be synchronous with NodeJS

NodeJS is awesome. Its asynchronous nature is a great fit for many applications.
But sometimes you want to do something synchronously, without callbacks.

An example of such a need is a little script I was working on lately. The script downloads files from some web services and saves them to disk.
This kind of task doesn't fit naively into the async model. Many requests were sent in a short time, each with a callback waiting for its response. Each pending request holds an open socket, and my machine would throw an error saying there were too many open sockets.

There are 3 possible solutions (that I see currently):
1. Restrict the number of open requests to N. When a request is done, its callback would notify the restricting resource and a new request that is waiting would take its place.
2. Just make it synchronous. No callbacks.
3. Use promises. A very elegant solution, but requires libraries to work.
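Option 1 does not need a library; it can be sketched as a small limiter. This is a minimal sketch, assuming each task is a function that takes a Node-style (err, result) callback; runLimited is a hypothetical helper name, not an existing package.

```javascript
// Run callback-style tasks with at most `limit` of them in flight at once.
// Calls done(results) once every task has finished; results[i] is
// { value } or { error } for tasks[i], in the original order.
function runLimited(tasks, limit, done) {
  const results = new Array(tasks.length);
  let next = 0;     // index of the next task to start
  let running = 0;  // tasks currently in flight
  let finished = 0; // tasks that have called back

  if (tasks.length === 0) return done(results);

  function launch() {
    while (running < limit && next < tasks.length) {
      const i = next++;
      running++;
      tasks[i]((err, result) => {
        running--;
        finished++;
        results[i] = err ? { error: err } : { value: result };
        if (finished === tasks.length) return done(results);
        launch(); // a slot was freed: start a waiting task
      });
    }
  }

  launch();
}
```

In the download script, each task would wrap a single http.get call, and a limit of around 10 keeps the number of open sockets bounded no matter how many files are queued.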

I took the 2nd option and in my case, I had to find an http library for node that allows synchronous methods. So I used urllib-sync. This went well and it solved my problems.

In a different case, I didn't have the privilege of a nice library that supports synchronous methods. I needed to use Git from Node, and the libraries I checked out only offered async methods.

So I needed to work around this and have each callback trigger the next request.

This is the basic idea:

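var items = [/* whatever needs to be processed */];

function nextAsyncCall(i) {
    if (i >= items.length) return;       // everything has been processed
    asyncMethod(items[i], function (err, result) {
        // Some logic maybe (check err, use result)..
        nextAsyncCall(i + 1);            // start the next call only after this one is done
    });
}

nextAsyncCall(0);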

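This skeleton runs the calls strictly one after another: the next call starts only after the previous one has completed. Technically it is still asynchronous (nothing blocks the event loop), but it behaves sequentially, which is what I needed.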
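For comparison, option 3 (promises) can express the same one-at-a-time chain. When this post was written that meant a library such as Q or Bluebird. Current Node.js versions ship Promise, async/await and util.promisify natively. This is a minimal sketch, where fakeAsyncMethod is a hypothetical stand-in for a callback-style API such as the Git libraries mentioned above:

```javascript
const { promisify } = require('util');

// Hypothetical callback-style API standing in for e.g. a Git library call:
// it "processes" an item after a short delay and reports back via callback.
function fakeAsyncMethod(item, callback) {
  setTimeout(() => callback(null, `done:${item}`), 10);
}

// util.promisify turns a Node-style (err, result) function into one returning a Promise.
const asyncMethodP = promisify(fakeAsyncMethod);

// Process items strictly one at a time: each await waits for the previous call.
async function processSequentially(items) {
  const results = [];
  for (const item of items) {
    results.push(await asyncMethodP(item));
  }
  return results;
}
```

Calling processSequentially(['a', 'b']).then(console.log) logs the results in order, and a rejected call stops the loop at the first error.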

So if you are sure you need synchronous behavior - first look for a library that can help you (if you need a sync version of the X library it's usually X-sync or something similar). If there is no such library you can see which of the two solutions above fits you (pool or simply sync).

Thursday, October 30, 2014

An offline StackOverflow clone

My current organization operates in private networks (no connectivity with the internet AT ALL). Beyond the regular arguments of "I don't have Facebook!" / "I can't read the news every 5 minutes!", there are some other, more serious problems: developing software is really hard with no access to the internet. Just think about it: when you write code, how many times a day do you search for info or problem solutions on the web? My guess is A LOT.

Developers in my organization struggle with this issue, and I've seen the pain in their eyes when they are forced to look for an unoccupied internet computer. At best they have one internet computer per team, and even those lose connectivity from time to time.

This issue is a major productivity killer that no one had seriously addressed before. So, a few months ago I woke up in the morning and thought to myself: "Why shouldn't I bring the internet to them?". I figured that the most cost-effective thing to do was to bring some kind of StackOverflow clone into the network. It is a single data source that every developer uses heavily.

Luckily, as it turns out, StackOverflow publishes its data as XML dumps every 3 months! Getting the data in was a real pain (it's 14GB compressed), but I finally got it into the network.
Now I had the data, but I was still missing a GUI to display it. I couldn't just download StackOverflow's site.

So I built a nice little GUI using Play Framework 2, AngularJS and Twitter Bootstrap. That took about a week (maybe some day I will publish it, although it's not hard to build yourself).

I still had to find a solution for data storage. The final architecture of the app was a web interface talking directly to an Elasticsearch node holding all the data. Getting the data in was not much fun: I wrote some Python scripts that took the XMLs, transformed them to JSON (Elasticsearch is a JSON document store) and sent them with cURL to the Elasticsearch node. Because of the large data volume, the upload took a while.
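The original import used Python and cURL. The same idea is sketched here in Node.js instead, so this is a swapped-in illustration and not the original scripts. It assumes the dump's layout of one `<row ... />` element per line (as in Posts.xml) and builds a request body for Elasticsearch's _bulk endpoint. The index and type names are made up:

```javascript
// Decode the XML entities that appear in the dump's attribute values.
function decodeEntities(s) {
  return s
    .replace(/&#x([0-9a-fA-F]+);/g, (_, hex) => String.fromCodePoint(parseInt(hex, 16)))
    .replace(/&#(\d+);/g, (_, dec) => String.fromCodePoint(parseInt(dec, 10)))
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, '&'); // last, so "&amp;lt;" becomes "&lt;" and not "<"
}

// Parse one `<row ... />` line into a plain object of its attributes.
// Returns null for lines that are not rows (XML header, <posts>, </posts>).
function parseRow(line) {
  const trimmed = line.trim();
  if (!trimmed.startsWith('<row ')) return null;
  const doc = {};
  for (const [, key, value] of trimmed.matchAll(/(\w+)="([^"]*)"/g)) {
    doc[key] = decodeEntities(value);
  }
  return doc;
}

// Build a _bulk body: an action line followed by the document, each newline-terminated.
function toBulk(docs, index, type) {
  let body = '';
  for (const doc of docs) {
    body += JSON.stringify({ index: { _index: index, _type: type, _id: doc.Id } }) + '\n';
    body += JSON.stringify(doc) + '\n';
  }
  return body;
}
```

Batches of a few thousand parsed rows can then be POSTed to http://<es-host>:9200/_bulk. The _type field matches ES 1.x. Later Elasticsearch versions removed mapping types.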

The application has now been running for several months, and my organization has slightly happier developers (~600). :)

The project (named XXXOverflow, XXX being the name of the organization) apparently inspired other developers, who suggested all kinds of interesting ideas for the application. In the future we plan to add more search sources and make it a highly customized little Google for the devs in my organization.

Other organizations, which are in the same position (disconnected from the internet), have asked me to give them the code of the app and help them implement it in their own networks.

Some technical notes
Elasticsearch is an open source search engine. I used it to store the data for the app. ES is really great and it made my life so much easier. Its default search algorithm searches through 70GB of textual data in a split second, which is pretty amazing to me. However, when I tried to customize the ranking algorithm using its Query DSL, searches slowed down considerably (I used ES 1.0.0).
Also, I had (and still have) issues with failing shards. It's probably the amount of data, but occasional searches just bring shards down, and not that rarely. I hope these issues will be addressed in future releases.

Tuesday, August 19, 2014

Cesium in action

Cesium is a WebGL virtual globe and map engine.
You can use it to build time-aware GIS applications. What do I mean by "time-aware"?
Suppose you want to let the user playback scenarios on the map. For instance, meteor hits over the years or satellite/airplane tracking.
Most map engines don't "understand" time. In ArcGIS for JS, for example, the way to implement this kind of functionality is to use a separate layer for each discrete moment. This is very heavy, hard to implement, and shows only discrete moments.
Cesium, on the other hand, supports continuous playback (and even comes with a clock and timeline built-in!). Cesium offers a data format, called CZML, which is basically a regular JSON array. In this array, you can specify elements. These elements could be points, polylines, polygons, text labels, images or even 3D models. For each element you can specify properties that will determine the element's life span, position, size, color, texture and a bunch of other things.
For example, you could say that a blue point with a black border will appear for 5 seconds at 2014-05-05 18:00:00 at longitude 34, latitude 35, height 0. This is represented as the following CZML:
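Here is a sketch of that example as a JavaScript array literal (CZML is plain JSON, so the same structure can go into a .czml file). The id values, pixel size and outline width are assumptions, the time is assumed to be UTC, and the leading document packet with a version is what Cesium expects as the first CZML packet:

```javascript
// CZML is an array of packets; the first packet describes the document itself.
const czml = [
  { id: 'document', version: '1.0' },
  {
    id: 'point-1', // hypothetical id
    // Visible for 5 seconds starting at 2014-05-05 18:00:00 (UTC assumed).
    availability: '2014-05-05T18:00:00Z/2014-05-05T18:00:05Z',
    // Longitude and latitude in degrees, height in meters.
    position: { cartographicDegrees: [34, 35, 0] },
    point: {
      color: { rgba: [0, 0, 255, 255] },      // blue fill
      outlineColor: { rgba: [0, 0, 0, 255] }, // black border
      outlineWidth: 2,
      pixelSize: 10,
    },
  },
];
```

In a viewer, this could be loaded with viewer.dataSources.add(Cesium.CzmlDataSource.load(czml)).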


Seems simple, right? It is. CZML provides a way to paint a scenario for Cesium to play, and you can do amazing things with it; just look at the Cesium Samples Page.
The rest of this post covers some issues and best practices with Cesium.
I have used Cesium for the past few months to build a small analysis tool for some clients. They load Excel files that specify the times and coordinates of events, and the application plays back those events and does some advanced processing on them.
Our application needed to create CZML on the fly, so we wrapped CZML creation in a small JS API. Writing raw CZML JSON by hand all over the code is a bad idea: it leads to code duplication and can cause performance issues.
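Here is a sketch of what such a wrapper might look like. pointPacket and czmlDocument are hypothetical names, not the API we built, and the default pixel size and outline width are arbitrary:

```javascript
// Hypothetical helper: build a CZML point packet from plain parameters,
// so call sites never hand-write the JSON structure.
function pointPacket({ id, start, durationSeconds, lon, lat, height = 0, rgba, outlineRgba = [0, 0, 0, 255] }) {
  const startDate = new Date(start);
  const endDate = new Date(startDate.getTime() + durationSeconds * 1000);
  return {
    id,
    // ISO 8601 interval "start/end" during which the point exists.
    availability: `${startDate.toISOString()}/${endDate.toISOString()}`,
    position: { cartographicDegrees: [lon, lat, height] },
    point: {
      color: { rgba },
      outlineColor: { rgba: outlineRgba },
      outlineWidth: 2,
      pixelSize: 10,
    },
  };
}

// Wrap packets into a full CZML document (the document packet must come first).
function czmlDocument(packets) {
  return [{ id: 'document', version: '1.0' }, ...packets];
}
```

Keeping the packet layout in one place means a change such as a new default style or a different time format touches a single function.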
Beware of interpolation! Cesium supports interpolation of some properties, which means that if the color of a point is yellow at 5PM and green at 6PM, during playback you will see the color change gradually from yellow to green instead of switching at 6PM. That is awesome, sometimes. For us it was mostly annoying, because we usually didn't need it, so we used a work-around: keep the old color until 5 seconds before the change time. That way the interpolation happens only during those 5 seconds, which is short enough to be unnoticeable.
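The work-around amounts to a sampled CZML color with one extra sample. Here is a minimal sketch. It assumes the CZML convention for sampled properties, which is an epoch plus a flat [time, r, g, b, a, ...] array with times in seconds from the epoch. colorChange is a hypothetical helper name:

```javascript
// Build a sampled CZML color that stays `fromRgba` until `holdSeconds`
// before `changeIso`, then interpolates to `toRgba` only in that short window.
// Assumes the change happens more than `holdSeconds` after `startIso`.
function colorChange(startIso, changeIso, fromRgba, toRgba, holdSeconds = 5) {
  const offset = (Date.parse(changeIso) - Date.parse(startIso)) / 1000;
  return {
    epoch: startIso,
    rgba: [
      0, ...fromRgba,                     // old color at the start
      offset - holdSeconds, ...fromRgba,  // still the old color just before the change
      offset, ...toRgba,                  // new color at the change time
    ],
  };
}
```

The result goes in place of a fixed color, e.g. point: { color: colorChange(...) }.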
Interpolation sometimes causes acute performance issues. We ran into such a problem in the evening, while loading the application with real data, which was large enough to make the playback hideously slow and then the engine would collapse and Cesium would throw a weird error and just stop working. We spent 5 hours fixing this problem... It turned out to be interpolation again. Luckily, we found out that the show property wasn't interpolatable and we used it to show a point in specific times, instead of changing the alpha of the color to make it disappear.
Also, we noticed that using the show property sometimes causes performance issues, if you put more than one interval in the show array. The workaround is to tell it when not to appear (negative infinity to start time 1, end time 1 to start time 2, end time 2 to positive infinity). Weird but works :)
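The "tell it when not to appear" work-around can be generated from the list of intervals in which a point should be visible. This sketch assumes that a point's show defaults to true outside any listed interval. It uses 0000-01-01T00:00:00Z and 9999-12-31T24:00:00Z as the infinite bounds, which are the values Cesium uses for its minimum and maximum ISO 8601 times. hiddenIntervals is a hypothetical name:

```javascript
// CZML's conventional "infinite" time bounds.
const MIN_TIME = '0000-01-01T00:00:00Z';
const MAX_TIME = '9999-12-31T24:00:00Z';

// Given the intervals in which an entity should be visible (sorted,
// non-overlapping [startIso, endIso] pairs), build a `show` property that
// lists only the gaps between them, i.e. the intervals where it is hidden.
function hiddenIntervals(visible) {
  const bounds = [MIN_TIME];
  for (const [start, end] of visible) {
    bounds.push(start, end);
  }
  bounds.push(MAX_TIME);

  // bounds is [MIN, s1, e1, s2, e2, ..., MAX]; consecutive pairs are the gaps.
  const show = [];
  for (let i = 0; i < bounds.length; i += 2) {
    show.push({ interval: `${bounds[i]}/${bounds[i + 1]}`, boolean: false });
  }
  return show;
}
```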
Loading a Cesium data source can be a heavy action, because Cesium reads and processes your CZML. If you have several data sources that get updated, use multiple data sources in the viewer (yes, Cesium supports that: viewer.dataSources) and reload only the relevant ones. We used to reload all data sources even when just one of them was updated. While looking for performance bottlenecks in our application we noticed this inefficiency, so we made the process modular, reloading only dirty data sources.
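Here is a sketch of the "reload only dirty data sources" idea. It assumes each registered object exposes load(czml), which is what Cesium's CzmlDataSource offers (it replaces the source's current contents). DataSourceRegistry is a hypothetical name, not our actual code:

```javascript
// Keeps one data source per logical layer and reloads only the layers
// whose CZML changed since the last flush.
class DataSourceRegistry {
  constructor() {
    this.entries = new Map(); // name -> { dataSource, czml, dirty }
  }

  register(name, dataSource) {
    this.entries.set(name, { dataSource, czml: null, dirty: false });
  }

  // Record new CZML for a layer without touching Cesium yet.
  update(name, czml) {
    const entry = this.entries.get(name);
    if (!entry) throw new Error(`Unknown data source: ${name}`);
    entry.czml = czml;
    entry.dirty = true;
  }

  // Reload only dirty layers; the returned Promise settles when all loads finish.
  flush() {
    const loads = [];
    for (const entry of this.entries.values()) {
      if (entry.dirty) {
        entry.dirty = false;
        loads.push(entry.dataSource.load(entry.czml));
      }
    }
    return Promise.all(loads);
  }
}
```

In a viewer, each layer would be created once, e.g. const ds = new Cesium.CzmlDataSource('events'); viewer.dataSources.add(ds); registry.register('events', ds);. After that, updates only mark layers dirty, and a single flush() reloads just those.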

Hope this helps someone!

Sunday, June 29, 2014

FoodBetter - a simplistic recipe management application using MeteorJS

Recently I decided to look into MeteorJS, an open source, full-stack web framework that uses JavaScript, MongoDB and NodeJS to provide a simple way of creating reactive web applications.
Like all things in life, to really get to know something, it's not enough to just read and talk about it - you have to get dirty and actually use it for something. So I built a simple application for something that I was missing in my personal life - a recipe management app.
The app is here, and the code is here.
The app was built for learning purposes and probably has bugs. As I said, it is very simple, and I would love to hear your ideas and feature requests here. You can fork it or even send me pull requests :)

What is awesome about MeteorJS?

  • Meteor supports 3-way binding. This means that when a client changes some data in the browser, another client looking at the same data on a different computer sees the change immediately. You get that for free, with no extra infrastructure code, so you can focus on your business logic. This is pretty amazing in my eyes, and it makes Meteor an ideal choice if you need to implement a real-time game or something like Google Docs collaborative editing.
  • You can deploy to Meteor's test servers with a single command. That's right: you can create a new application (meteor create myApp) and immediately publish it to the world (meteor deploy myapp.meteor.com). This sends your application to Meteor's test servers and makes it available for everyone to use, free of charge. Of course, if you are creating something real, you should spend a few dollars and host it somewhere. Because it all runs on NodeJS, you can package the app with meteor bundle and publish it through Heroku or Nodejitsu. Anyway, meteor deploy is great if you want to get your app up and running in no time.
  • Javascript everywhere. You write JS in the server-side too, making it very simple to transfer and manipulate data.
Being a framework, Meteor takes away some of the control you usually have. For instance, you don't use the script tag anymore, since Meteor automatically loads all files that reside in certain directories. That may be fine, but you still have to learn Meteor's loading process. To use Bower, for instance, you have to install a special Meteor package.

Meteor's documentation and community seems really great, and I will continue to follow MeteorJS, and develop FoodBetter with it. 


Monday, June 9, 2014

index in #each expression (Meteor & Spacebars)

Suppose you want to write an #each expression in a Meteor.js template. And suppose that you want to print the index of an item. Something like:

{{#each steps}}
    {{index}} : {{stepDescription}}
{{/each}}

This is a legitimate need, and even available in Handlebars, using {{@index}}, as you can see in this stackoverflow thread or this issue.

Meteor does not support this option yet (as of 0.8). They even mention this issue in their wiki:
Syntax extensions. Handlebars syntax is extremely minimal, and we foresee adding some additional well-chosen extensions over time. (We will also implement the top features of current Handlebars that are missing from Meteor, like #each that supports objects and lets you access the current index or key.)
This looks promising. But for people who need a solution right now, you can just create a helper that adds an _index property to each item in the array: