Stratosphere's programming model provides six parallelizable operator functions:
- Map
- Reduce
- Join
- Cross
- CoGroup
- Union
JTS is a standard Java library for geographical analysis with special data types such as points, paths and polygons.
In this spatial data processing library, we are going to implement various geographical analyses, such as intersections and joins, on big map data in standard formats like OSM (OpenStreetMap).
There is some effort here to create new data types in Stratosphere that are compatible with the JTS library.
Here is an introduction to the Stratosphere Java API using the WordCount example. You can simply download stratosphere-examples and try it on your own.
If you need to get familiar with Stratosphere, there are exercises on the bigdataclass website.
You can find those exercises implemented in Java.
WordCount example in Stratosphere
Simply put, it takes an input file and counts the number of occurrences of each word in that file.
Basically it uses a mapper and a reducer.
The mapper extends MapFunction and the reducer extends ReduceFunction.
Those classes (MapFunction, ReduceFunction) belong to the API (eu.stratosphere.api).
We do not need to worry about those classes too much because they make our life easier. We only need to implement the pre-defined functions in these classes.
Mapper
It initially gets the first field of the record as a String value (java.lang.String).
Then it normalizes the string. After normalizing, the string contains only
- alphanumeric words
- lower-case characters
e.g.:
1,Big Hello to Stratosphere! :-) -> 1 big hello to stratosphere
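The normalization step can be tried out in plain Java, without any Stratosphere classes. This is only a minimal sketch: the class name NormalizeSketch and the regular expression are assumptions made for illustration, and the trailing trim() is added here so that the result matches the example above exactly.

```java
import java.util.Locale;

class NormalizeSketch {
    // Replace every run of non-word characters with a single space,
    // lower-case the result and drop leading/trailing spaces.
    // Note: \W treats the underscore as a word character, so "_" is kept.
    static String normalize(String line) {
        return line.replaceAll("\\W+", " ").toLowerCase(Locale.ROOT).trim();
    }

    public static void main(String[] args) {
        System.out.println(normalize("1,Big Hello to Stratosphere! :-)"));
    }
}
```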
Using a tokenizer, it splits the string into single words.
Then the collector collects each split word as a new Record.
collector.collect(new Record(new StringValue(word), new IntValue(1)));
The first field represents the word and the second field represents the occurrence count, so each time it should be 1.
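To see what the mapper emits, here is a rough plain-Java sketch of the tokenize-and-emit step. It does not use the Stratosphere Record or Collector classes; a list of (word, 1) pairs stands in for the collected records, and the class name MapSketch is hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

class MapSketch {
    // Split an already normalized line into words and emit one
    // (word, 1) pair per word, like the mapper does with Records.
    static List<Map.Entry<String, Integer>> map(String normalizedLine) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(normalizedLine);
        while (tokenizer.hasMoreTokens()) {
            String word = tokenizer.nextToken();
            out.add(new SimpleEntry<String, Integer>(word, 1));
        }
        return out;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> pair : map("1 big hello to stratosphere")) {
            System.out.println(pair.getKey() + " " + pair.getValue());
        }
    }
}
```

A word that occurs twice in the line is emitted twice, each time with the count 1; adding the counts up is left to the reducer.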
Reducer
The reducer comes into action to add up the number of occurrences of each word. It gets the records emitted by the mapper for each word and sums their counts.
After computing the sum, it updates the second field of the Record with the sum.
element.setField(1, new IntValue(sum));
Example :
1 big hello to big stratosphere
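For the example line above, the grouping and summing can be sketched in plain Java as follows. This is not the Stratosphere ReduceFunction itself: the grouping by word, which the framework would do between the mapper and the reducer, is imitated here with a map, and the names ReduceSketch, reduce and wordCount are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ReduceSketch {
    // Sum all counts that belong to one word, like the reducer does per key.
    static int reduce(Iterator<Integer> counts) {
        int sum = 0;
        while (counts.hasNext()) {
            sum += counts.next();
        }
        return sum;
    }

    // Imitate the whole job on one normalized line:
    // emit (word, 1), group by word, then reduce each group.
    static Map<String, Integer> wordCount(String normalizedLine) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (String word : normalizedLine.split("\\s+")) {
            groups.computeIfAbsent(word, k -> new ArrayList<Integer>()).add(1);
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> group : groups.entrySet()) {
            result.put(group.getKey(), reduce(group.getValue().iterator()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {1=1, big=2, hello=1, to=1, stratosphere=1}
        System.out.println(wordCount("1 big hello to big stratosphere"));
    }
}
```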