Union Find

From wiki:
In computer science, a disjoint-set data structure, also called a union–find data structure or merge–find set, is a data structure that keeps track of a set of elements partitioned into a number of disjoint (nonoverlapping) subsets. It supports two useful operations:
Find: Determine which subset a particular element is in. Find typically returns an item from this set that serves as its “representative”; by comparing the result of two Find operations, one can determine whether two elements are in the same subset.
Union: Join two subsets into a single subset.

Example:
261. Graph Valid Tree
Given n nodes labeled from 0 to n - 1 and a list of undirected edges (each edge is a pair of nodes), write a function to check whether these edges make up a valid tree.

For example:
Given n = 5 and edges = [[0, 1], [0, 2], [0, 3], [1, 4]], return true.
Given n = 5 and edges = [[0, 1], [1, 2], [2, 3], [1, 3], [1, 4]], return false.

Hint:
Given n = 5 and edges = [[0, 1], [1, 2], [3, 4]], what should your return be? Is this case a valid tree?
According to the definition of tree on Wikipedia: “a tree is an undirected graph in which any two vertices are connected by exactly one path. In other words, any connected graph without simple cycles is a tree.”

Note: you can assume that no duplicate edges will appear in edges. Since all edges are undirected, [0, 1] is the same as [1, 0] and thus will not appear together in edges.

public class Solution {
    public boolean validTree(int n, int[][] edges) {
        UnionFind uf = new UnionFind(n);
        for (int i = 0; i < edges.length; i++) {
            // If the two endpoints are already connected, this edge closes a cycle.
            if (!uf.union(edges[i][0], edges[i][1])) {
                return false;
            }
        }
        // A tree must also be connected, i.e. exactly one component must remain.
        return uf.size == 1;
    }

    private class UnionFind {
        int size;     // number of disjoint components
        int[] nodes;  // nodes[i] is the label of the component containing node i

        UnionFind(int size) {
            this.size = size;
            this.nodes = new int[size];
            for (int i = 0; i < size; i++) {
                nodes[i] = i;  // every node starts in its own component
            }
        }

        // Merges the components of a and b; returns false if they were already the same.
        boolean union(int a, int b) {
            int label_a = nodes[a];
            int label_b = nodes[b];
            if (label_a == label_b) {
                return false;
            }
            // Quick-find style merge: relabel every node in a's component with b's label.
            for (int i = 0; i < nodes.length; i++) {
                if (nodes[i] == label_a) {
                    nodes[i] = label_b;
                }
            }
            size--;
            return true;
        }
    }
}
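
One caveat of the solution above: each union relabels the whole nodes array, so a single call costs O(n). A common refinement stores parent pointers and applies path compression plus union by rank, which makes each operation run in near-constant amortized time. Here is a minimal sketch of that variant (the class and field names are my own, not part of the original solution):

// A sketch of union-find with path compression and union by rank.
class FastUnionFind {
    private final int[] parent;
    private final int[] rank;
    int size;  // number of disjoint components, as in the solution above

    FastUnionFind(int n) {
        parent = new int[n];
        rank = new int[n];
        size = n;
        for (int i = 0; i < n; i++) {
            parent[i] = i;  // each node is initially its own root
        }
    }

    // Find the representative of x, flattening the path along the way (path halving).
    int find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    // Returns false if a and b were already connected (the new edge would close a cycle).
    boolean union(int a, int b) {
        int rootA = find(a);
        int rootB = find(b);
        if (rootA == rootB) {
            return false;
        }
        // Union by rank: attach the shallower tree under the deeper one.
        if (rank[rootA] < rank[rootB]) {
            parent[rootA] = rootB;
        } else {
            if (rank[rootA] == rank[rootB]) {
                rank[rootA]++;
            }
            parent[rootB] = rootA;
        }
        size--;
        return true;
    }
}

Dropping this in place of the inner UnionFind class would keep validTree unchanged, since it exposes the same size field and union method.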


A bad system design can lead to a lot of hard work. In order to increase unit test coverage, I recently started writing unit tests for some classes. In one case, I want to test a method like the following:

public final ReturnType getMethod(SomeRequest someRequest) {
    AnotherRequest anotherRequest = new AnotherRequest(someRequest);
    SomeResponse someResponse = SomeService.getInstance().someMethod(anotherRequest);
    SomeValue someValue = someResponse.getValue();
    /**
     * Some processes with someValue..
     */
}

The main purpose of this test is to test the processing of someValue, so I should just mock the .getValue() method. But it is not that easy. Let me put the related classes here:
SomeService.class

public final class SomeService {
    private static SomeService instance = new SomeService();
    static {
        // A static block
    }
    protected SomeService() {}
    public static SomeService getInstance() {
        return instance;
    }
    public SomeResponse someMethod(AnotherRequest anotherRequest) {
        // Implementation of the method..
    }
}

SomeResponse.class

public class SomeResponse {
    public SomeValue getValue() {
        // Implementation of getValue()
    }
}

SomeValue.class

public class SomeValue {
    private String name;
    private void populateValue(PreDefinedType preDefinedType) {
        // Generate name from a preDefinedType, basically a black box.
    }
    public String getName() {
        return name;
    }
}
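
For reference, here is a rough sketch of how the call chain might be stubbed, assuming a recent Mockito with the inline mock maker so that the static getInstance() can be mocked (older setups typically reached for PowerMock instead); the test class, variable names, and assertions are only illustrative, not part of the original code:

import static org.mockito.ArgumentMatchers.any;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.mockStatic;
import static org.mockito.Mockito.when;

import org.junit.jupiter.api.Test;
import org.mockito.MockedStatic;

public class GetMethodTest {

    @Test
    public void processesSomeValue() {
        SomeService mockService = mock(SomeService.class);
        SomeResponse mockResponse = mock(SomeResponse.class);
        SomeValue stubValue = mock(SomeValue.class);

        // Stub the chain: SomeService.getInstance().someMethod(...).getValue()
        when(mockService.someMethod(any(AnotherRequest.class))).thenReturn(mockResponse);
        when(mockResponse.getValue()).thenReturn(stubValue);
        when(stubValue.getName()).thenReturn("expected-name");

        // Redirect the static getInstance() to our mock for the duration of the test.
        try (MockedStatic<SomeService> mocked = mockStatic(SomeService.class)) {
            mocked.when(SomeService::getInstance).thenReturn(mockService);

            // Call the method under test and assert on how it processes someValue, e.g.:
            // ReturnType result = classUnderTest.getMethod(someRequest);
            // assertEquals(..., result);
        }
    }
}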


Since we recently started using Docker for product releases, I've been dealing with a lot of Docker-related problems. One headache is that an application running in a Docker container sometimes behaves differently from the same application running locally without Docker. A specific example for me: my web app, run via Gradle with the Jetty 9 plugin, finishes our load test perfectly on my Ubuntu machine, but hits memory leak errors when it runs in a Docker container. So I decided to look into it with a Java profiler.

It's quite easy to use a GUI profiler to monitor how an application uses CPU and memory at runtime, but it's a little more complicated to attach one to a process inside a container. First I tried YourKit with its remote profiling option. The basic steps are:

  1. Modify the Dockerfile to get YourKit installed.
    RUN apt-get update
    RUN apt-get install -y wget
    RUN wget https://www.yourkit.com/download/yjp-2016.02-b29-linux.tar.bz2
    RUN tar xfj yjp-2016.02-b29-linux.tar.bz2

    # add the agent to the JVM options of the container's Java start command:
    "-agentpath:/usr/share/jetty/yjp-2016.02/bin/linux-x86-64/libyjpagent.so=disablestacktelemetry,disableexceptiontelemetry,delay=10000,sessionname=Jetty", \

    # expose the agent port used for remote profiling
    EXPOSE 10001
  2. After building the image, go into the Docker container and attach to the JVM:
    docker exec -i -t <containerID> /bin/bash
    <directory with unpacked content>/bin/yjp.sh -attach
  3. In the YourKit GUI, connect to the host:port.
    But every time I try to attach to the JVM (step 2), I get the error below:
    com.sun.tools.attach.AttachNotSupportedException:
    Unable to open socket file:
    target process not responding or HotSpot VM not loaded.
    This should be solved by simply switching to the user that created the container, but that didn't work for me. So I tried JProfiler instead.
    Here are the steps:
  4. Modify the Dockerfile to get JProfiler installed:
    RUN \
    apt-get update && \
    apt-get install -y wget && \
    wget http://download-keycdn.ej-technologies.com/jprofiler/jprofiler_linux_9_1_1.tar.gz && \
    tar -xzf jprofiler_linux_9_1_1.tar.gz -C /usr/local

    ENV JPAGENT_PATH="-agentpath:/usr/local/jprofiler9/bin/linux-x64/libjprofilerti.so=nowait"
    EXPOSE 8849
    Be careful with the version: the one installed in the Docker container should match the version of the JProfiler GUI you use to monitor.
  5. Execute the following commands to enable the agent in the Docker container:
    docker exec -i -t <containerID> /bin/bash
    /usr/local/jprofiler9/bin/jpenable
    That's it. Now we can connect to the remote JVM from anywhere via host:port; the default port is 8849.

But the sad part is that after looking into the profiler results, I still don't know where the memory leak comes from. Here is a discussion about a similar Docker problem, so maybe I'm not the only one.
I'll update once I figure it out…

I spent a whole night updating Hexo. It was much more complicated than I expected. For anyone's convenience, here are some key points for migrating from Hexo 2.x to 3.0, in addition to the official documentation.

First things first: uninstall Hexo

Yes. Though the official wiki says we can migrate directly from 2.x to 3.0, that didn't work for me. Rather than wasting time figuring out what went wrong during the update, reinstalling and configuring Hexo 3.0 from scratch seemed acceptable to me. You can try updating directly, but I didn't manage to do it successfully.
Before uninstalling, back up your posts and any other custom settings files (e.g. _config.yml, your own theme files, etc.).
To uninstall Hexo, use the Node package manager (npm):

npm uninstall hexo 

Now, install Hexo 3.0

Run

$ sudo npm install -g hexo-cli

The Hexo wiki doesn't mention administrator permissions, but in my experience we have to add sudo here.

Then we need to set up the Hexo folder in your desired location (the <folder> below):

$ hexo init <folder>
$ cd <folder>
$ sudo npm install

Again, try sudo whenever a command fails to run.

For future convenience, we install all the necessary packages here:

npm install hexo-generator-index --save
npm install hexo-generator-archive --save
npm install hexo-generator-category --save
npm install hexo-generator-tag --save
npm install hexo-server --save
npm install hexo-deployer-git --save
npm install hexo-deployer-heroku --save
npm install hexo-deployer-rsync --save
npm install hexo-deployer-openshift --save

and any plugins you might need:

npm install hexo-renderer-marked@0.2 --save
npm install hexo-renderer-stylus@0.2 --save
npm install hexo-generator-feed@1 --save
npm install hexo-generator-sitemap@1 --save

All in sudo mode (or you can do all of these directly in a sudo -i shell).
Now the installation is finished.


Introduction

ASCII

A single-byte encoding that uses only the bottom 7 bits (0-127). The top bit is always 0.

In English, 128 symbols are enough to represent every character, but in other languages, French for example, they are not. So the top bit was used to represent accented letters, giving up to 256 characters; ‘é’ is encoded as 1000 0010 (130).
But here comes another problem: in different languages the same binary value represents different characters. 130 is é in French, but in Hebrew it is Gimel (ג), not to mention Chinese characters (more than 100 thousand of them). So another encoding system was introduced: Unicode.

Unicode

“Unicode encoding” is more properly known as UTF-16: 2 bytes per “code point”. This is the native format of strings in .NET. Values outside the Basic Multilingual Plane(BMP) are encoded as surrogate pairs. These are relatively rarely used - which is a good job, as very few developers get them right, I suspect. “Unicode” is really the character set - it is unfortunate that the term is also used as a synonym for UTF-16 in .NET and various Windows applications.
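
Java strings are also sequences of UTF-16 code units, so the same surrogate-pair behaviour shows up there; here is a quick sketch (the class name is just for illustration):

public class SurrogatePairDemo {
    public static void main(String[] args) {
        // U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP,
        // so UTF-16 stores it as a surrogate pair of two code units.
        String clef = "\uD834\uDD1E";

        System.out.println(clef.length());                          // 2 code units
        System.out.println(clef.codePointCount(0, clef.length()));  // 1 code point
        System.out.printf("U+%X%n", clef.codePointAt(0));           // U+1D11E
    }
}
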
Unicode can be implemented by different character encodings. Besides UTF-16, other encodings include UTF-8, UTF-7 and UTF-32:
UTF-8: Variable-length encoding; 1-4 bytes cover every current character. ASCII values are encoded as ASCII.
UTF-7: Usually used for mail encoding. Chances are if you think you need it and you’re not doing mail, you’re wrong. (not widely used at all.)
UTF-32: Fixed width encoding using 4 bytes per code point. This isn’t very efficient, but makes life easier outside the BMP.

UTF-8

UTF-8 has become the dominant character encoding for the World Wide Web. The rules of UTF-8 are:
(1) For ASCII characters, the UTF-8 encoding is identical to ASCII (a single byte).
(2) For a character encoded in n bytes (n > 1), the first n bits of the first byte are set to 1, the (n + 1)-th bit is set to 0, and the first two bits of every following byte are 10. The remaining bits hold the character's Unicode code point.

Bytes   Code point range (hex)     Binary layout
1       0000 0000 to 0000 007F     0xxxxxxx
2       0000 0080 to 0000 07FF     110xxxxx 10xxxxxx
3       0000 0800 to 0000 FFFF     1110xxxx 10xxxxxx 10xxxxxx
4       0001 0000 to 0010 FFFF     11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

For example, the Unicode code point of “严” is 4E25 (binary 100111000100101). According to the table above, this character falls in the third row, so its UTF-8 encoding is E4B8A5 (11100100 10111000 10100101).
We can see that a character's Unicode code point is different from its UTF-8 byte sequence, but standard libraries can convert between them.
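
As a quick check of the 严 example, here is a small Java sketch (the class name is just for illustration) that packs the code point by hand following the 3-byte row of the table and compares it with what the standard library produces:

import java.nio.charset.StandardCharsets;

public class Utf8Demo {
    public static void main(String[] args) {
        String s = "严";            // code point U+4E25, the 3-byte row of the table
        int cp = s.codePointAt(0);

        // Pack the 16 code-point bits into the 1110xxxx 10xxxxxx 10xxxxxx layout by hand.
        int b1 = 0xE0 | (cp >> 12);          // 1110xxxx
        int b2 = 0x80 | ((cp >> 6) & 0x3F);  // 10xxxxxx
        int b3 = 0x80 | (cp & 0x3F);         // 10xxxxxx
        System.out.printf("manual : %X %X %X%n", b1, b2, b3);  // E4 B8 A5

        // The standard library agrees.
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            System.out.printf("%02X ", b & 0xFF);               // E4 B8 A5
        }
        System.out.println();
    }
}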
