bigdata - Data schema for Cassandra using various data types -


I am currently developing a solution in the field of time-series data. Within these data I have: an id, a value, and a timestamp. Here it comes: the value might be of type boolean, float, or string. I am considering 3 approaches:

a) A distinct table for every data type: sensor values of type boolean in one table, sensor values of type string in another. The obvious disadvantage is that you have to know each sensor's type to find its table.

b) A meta-column describing the data type, plus all values stored as type string. The obvious disadvantage is data conversion, e.g. when calculating max, avg, and so on. (Approaches a) and b) are sketched in CQL below.)

c) Having 3 columns of different types and 1 value per record. The disadvantage: with 500000 sensors firing every 100ms ... plenty of unused space.
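For illustration only (the table and column names here are placeholders, not taken from the original post), approaches a) and b) might look something like this in CQL:

-- a) one table per value type
CREATE TABLE sensor_bool (id uuid, ts timestamp, value boolean, PRIMARY KEY (id, ts));
CREATE TABLE sensor_float (id uuid, ts timestamp, value float, PRIMARY KEY (id, ts));
CREATE TABLE sensor_string (id uuid, ts timestamp, value text, PRIMARY KEY (id, ts));

-- b) one table with a meta-column for the type, every value stored as text
CREATE TABLE sensor_data (id uuid, ts timestamp, valuetype text, value text, PRIMARY KEY (id, ts));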

As my knowledge of Cassandra is limited, any help is appreciated.

500000 sensors firing every 100ms

First thing: make sure you partition properly, so that you don't exceed the limit of 2 billion columns per partition.

CREATE TABLE sensordata (
    stationid uuid,
    datebucket text,
    recorded timeuuid,
    intvalue bigint,
    strvalue text,
    blnvalue boolean,
    PRIMARY KEY ((stationid, datebucket), recorded)
);

With a half-million sensors firing every 100ms, that's 5 million rows per second. You'll want to set your datebucket to be very granular...maybe even down to the second. Next, I'll insert some data:
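A sketch of what those inserts (and the query that produced the output below) might look like; the station UUIDs and values match the output, while now() simply generates the timeuuid:

-- each insert sets only the value column that matches the reading's type
INSERT INTO sensordata (stationid, datebucket, recorded, intvalue)
  VALUES (8b466f1d-8d6b-46fa-9f5b-8c4eb51aa40c, '2015-04-22T14:54:29', now(), 59);
INSERT INTO sensordata (stationid, datebucket, recorded, strvalue)
  VALUES (8b466f1d-8d6b-46fa-9f5b-8c4eb51aa40c, '2015-04-22T14:54:29', now(), 'cd');
INSERT INTO sensordata (stationid, datebucket, recorded, blnvalue)
  VALUES (8b466f1d-8d6b-46fa-9f5b-8c4eb51aa40c, '2015-04-22T14:54:29', now(), true);
INSERT INTO sensordata (stationid, datebucket, recorded, blnvalue)
  VALUES (3221b1d7-13b4-40d4-b41c-8d885c63494f, '2015-04-22T14:56:19', now(), false);

SELECT * FROM sensordata;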

 stationid                            | datebucket          | recorded                             | blnvalue | intvalue | strvalue
--------------------------------------+---------------------+--------------------------------------+----------+----------+----------
 8b466f1d-8d6b-46fa-9f5b-8c4eb51aa40c | 2015-04-22T14:54:29 | 6338df40-e929-11e4-88c8-21b264d4c94d |     null |       59 |     null
 8b466f1d-8d6b-46fa-9f5b-8c4eb51aa40c | 2015-04-22T14:54:29 | 633e0f60-e929-11e4-88c8-21b264d4c94d |     null |     null |       cd
 8b466f1d-8d6b-46fa-9f5b-8c4eb51aa40c | 2015-04-22T14:54:29 | 6342f160-e929-11e4-88c8-21b264d4c94d |     True |     null |     null
 3221b1d7-13b4-40d4-b41c-8d885c63494f | 2015-04-22T14:56:19 | a48bbdf0-e929-11e4-88c8-21b264d4c94d |    False |     null |     null

...plenty of unused space.

You might be surprised. In the CQL output of the SELECT * above, it appears that there are null values all over the place. But watch what happens when we use the cassandra-cli tool to see how the data is stored "under the hood":

RowKey: 3221b1d7-13b4-40d4-b41c-8d885c63494f:2015-04-22T14\:56\:19
=> (name=a48bbdf0-e929-11e4-88c8-21b264d4c94d:, value=, timestamp=1429733297352000)
=> (name=a48bbdf0-e929-11e4-88c8-21b264d4c94d:blnvalue, value=00, timestamp=1429733297352000)

As you can see, the data (above) stored for the CQL row where stationid=3221b1d7-13b4-40d4-b41c-8d885c63494f and datebucket='2015-04-22T14:56:19' shows that blnvalue has a value of 00 (false). Notice that intvalue and strvalue are not present at all. Cassandra doesn't force a null value the way an RDBMS does.

The obvious disadvantage is data conversion, e.g. when calculating max, avg, and so on.

Perhaps you already know this, but I did want to mention that Cassandra CQL does not contain definitions for MAX, AVG, or any other data aggregation functions. You'll either need to do that client-side, or implement Apache Spark to perform those OLAP-type queries.
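As a rough sketch of the client-side route (assuming the sensordata table above), you pull one partition's values with CQL and compute the max/avg in your application code:

-- fetch the integer values for one station and one time bucket;
-- the aggregation itself then happens in the client, not in Cassandra
SELECT recorded, intvalue
FROM sensordata
WHERE stationid = 8b466f1d-8d6b-46fa-9f5b-8c4eb51aa40c
  AND datebucket = '2015-04-22T14:54:29';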

Be sure to read through Patrick McFadin's Getting Started with Time Series Data Modeling. It contains suggestions on how to solve time series problems like this one.

