node.js - MongoDB: can I trigger secondary replication only at a given time or manually?


I'm not a MongoDB expert, so I'm a little unsure about my server setup right now.

I have a single instance running MongoDB 3.0.2 with WiredTiger, accepting both read and write ops. It collects logs from a client, so the write load is decent. Once a day I want to process these logs and calculate some metrics using the aggregation framework. The data set to process is all the logs from the last month, and the whole calculation takes about 5-6 hours. I'm thinking about splitting writes and reads to avoid locks on my collections (the server continues to write logs while I'm reading; newly written logs may match my queries, but I can skip them, because I don't need 100% accuracy).

In other words, I want a setup with a secondary for reads, where replication does not run continuously but starts at a configured time or, better, is triggered just before the read operations start.

I'm doing the processing in Node.js, so one option I see is to export the data created in the period [yesterday, today], import it into the read instance myself, and run the calculations after the import is done. I looked at replica sets and master/slave replication as possible setups, but I didn't understand how to configure them to achieve the scenario described above. Maybe I'm wrong and am missing something here? Are there other options to achieve this?

Your idea of using a replica set is flawed for several reasons.

First, a replica set always replicates the whole mongod instance. You can't enable it for individual collections, and certainly not for specific documents of a collection.

Second, deactivating replication and enabling it shortly before you start your report generation is not a good idea either. When you enable replication, the new slave will not be up to date immediately. It will take a while until it has processed all the changes since its last contact with the master. There is no way to tell in advance how long this will take (you can check how far a secondary is behind the primary by calling rs.status() and comparing the secondary's optimeDate with its lastHeartbeat date).
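To make the lag check concrete, here is a minimal sketch that takes a document shaped like the output of rs.status() and reports, for each secondary, the difference between its lastHeartbeat and its optimeDate. The function name secondaryLagSeconds and the sample document are made up for illustration; only the member fields name, stateStr, optimeDate and lastHeartbeat are taken from rs.status().

```javascript
// Sketch: estimate how far each secondary is behind, given an rs.status()-like document.
// secondaryLagSeconds is a hypothetical helper name, not part of MongoDB.
function secondaryLagSeconds(status) {
  return status.members
    // Only secondaries that report both dates can be evaluated.
    .filter((m) => m.stateStr === "SECONDARY" && m.optimeDate && m.lastHeartbeat)
    .map((m) => ({
      name: m.name,
      // Time between the last applied operation and the last heartbeat, in seconds.
      lagSeconds: (m.lastHeartbeat.getTime() - m.optimeDate.getTime()) / 1000,
    }));
}

// Hand-written sample with made-up hosts and dates, not real server output.
const sampleStatus = {
  set: "rs0",
  members: [
    { name: "db1.example.com:27017", stateStr: "PRIMARY",
      optimeDate: new Date("2015-05-01T10:00:00Z") },
    { name: "db2.example.com:27017", stateStr: "SECONDARY",
      optimeDate: new Date("2015-05-01T09:58:30Z"),
      lastHeartbeat: new Date("2015-05-01T10:00:00Z") },
  ],
};

const lagReport = secondaryLagSeconds(sampleStatus);
console.log(lagReport);
```

In the mongo shell you would pass rs.status() itself instead of the sample document. A lag that keeps growing while the report is being prepared is a sign that the secondary has not caught up yet.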

But when you want to perform data mining on a subset of documents from a selected timespan, there is another solution.

Transfer the documents you want to analyze to a new collection. You can do this with an aggregation pipeline consisting of a $match stage, which matches the documents from the last month, followed by an $out stage. The $out operator specifies that the results of the aggregation are not sent to the application/shell, but are instead written to a new collection (which is automatically emptied before this happens). You can then perform your reporting on the new collection without locking the actual one. This also has the advantage that you are operating on a much smaller collection, so queries will be faster, especially those that can't use indexes. In addition, your data won't change between aggregations, so your reports won't have inconsistencies caused by the data changing between them.
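A minimal sketch of such a pipeline, assuming each log document has a date field. The names buildLastMonthPipeline, createdAt, logs and logs_last_month are placeholders I made up, and "last month" is approximated here as the last 30 days.

```javascript
// Sketch: build a $match + $out pipeline that copies recent documents to another collection.
// All collection and field names used here are hypothetical.
const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;

function buildLastMonthPipeline(now, dateField, targetCollection) {
  // "Last month" is approximated as the 30 days before `now`.
  const since = new Date(now.getTime() - THIRTY_DAYS_MS);
  return [
    // Keep only documents whose date falls inside [since, now).
    { $match: { [dateField]: { $gte: since, $lt: now } } },
    // Write the matching documents to the target collection instead of returning them.
    { $out: targetCollection },
  ];
}

const pipeline = buildLastMonthPipeline(
  new Date("2015-05-31T00:00:00Z"),
  "createdAt",
  "logs_last_month"
);
console.log(JSON.stringify(pipeline, null, 2));

// In the mongo shell this could then be run as:
//   db.logs.aggregate(pipeline)
// and the reports would query db.logs_last_month afterwards.
```

Keeping $out as the final stage matters, because it has to be the last stage of the pipeline. An index on the date field of the source collection should help the $match stage avoid scanning every log document.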

When you need a second server for report generation, you can still use replication and perform the aggregation on the secondary. However, I would recommend that you build a proper replica set (consisting of a primary, a secondary and an arbiter) and leave replication active at all times. That will not only make sure that your data isn't outdated when you generate your reports, it also gives you the important benefit of automatic failover should your primary go down for some reason.
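For reference, a replica set with two data-bearing members and an arbiter could be described by a configuration document like the one built below. This is only a sketch: the helper name buildReplicaSetConfig, the set name rs0 and the host names are placeholders, and which data-bearing member becomes primary is decided by election, not by the order in this document.

```javascript
// Sketch: configuration document for a three-member replica set with one arbiter.
// The set name and host names passed in below are hypothetical.
function buildReplicaSetConfig(setName, dataHostA, dataHostB, arbiterHost) {
  return {
    _id: setName,
    members: [
      { _id: 0, host: dataHostA },
      { _id: 1, host: dataHostB },
      // The arbiter only votes in elections and stores no data.
      { _id: 2, host: arbiterHost, arbiterOnly: true },
    ],
  };
}

const rsConfig = buildReplicaSetConfig(
  "rs0",
  "db1.example.com:27017",
  "db2.example.com:27017",
  "arbiter.example.com:27017"
);
console.log(JSON.stringify(rsConfig, null, 2));

// In the mongo shell, connected to one of the data-bearing members:
//   rs.initiate(rsConfig)
```

Each mongod has to be started with the same replica set name (the --replSet option) before rs.initiate() is called.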

