ALMA Integrated Computing Team
Coordination & Planning Meeting #1
Santiago, 17-19 April 2013
Evaluation of MongoDB for Persistent
Storage of Monitoring Data
Tzu-Chiang Shen
Leonel Peña
Monitoring Storage Requirements
• Expected data rate with 66 antennas:
  • ~6,000 - 7,000 clobs/s, ~25 - 30 GB/day
  • Equivalent to ~310 KByte/s, or ~2.485 Mbit/s
  • ~130,000 - 150,000 monitor points
• Monitoring data characteristics:
  • Simple data structure: [timestamp, value]
  • But a huge amount of data
  • Read-only data
  • Data is sorted at the moment of insertion
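As a quick cross-check of the rates quoted above, the sketch below converts a sustained byte rate into a daily volume and a bit rate. It assumes decimal units (1 KByte = 1,000 bytes, 1 GB = 10^9 bytes), which the slides do not state, and the function names are illustrative only.

```javascript
// Assumption: decimal units (1 KByte = 1,000 bytes, 1 GB = 1e9 bytes).
const SECONDS_PER_DAY = 24 * 60 * 60; // 86,400

// Daily volume in GB produced by a sustained rate given in KByte/s.
function dailyVolumeGB(kbytePerSecond) {
  return (kbytePerSecond * 1000 * SECONDS_PER_DAY) / 1e9;
}

// The same rate expressed in Mbit/s.
function mbitPerSecond(kbytePerSecond) {
  return (kbytePerSecond * 8) / 1000;
}

console.log(dailyVolumeGB(310).toFixed(1)); // 26.8, inside the 25 - 30 GB/day range
console.log(mbitPerSecond(310).toFixed(2)); // 2.48
```

Under these assumptions, 310 KByte/s falls inside the quoted 25 - 30 GB/day band and corresponds to roughly 2.5 Mbit/s.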
ICT-CPM1 17-19 April 2013
Very Brief Introduction of
MongoDB
• NoSQL and document-oriented.
• The storage format is BSON, a binary-encoded variation of JSON.
    SQL        MongoDB
    Database   Database
    Table      Collection
    Row        Document
    Field      Field
    Index      Index
• Documents within a collection are not required to have the
same fields.
• Other features: Sharding, Replication, Aggregation
(Map/Reduce)
Very Brief Introduction of
MongoDB …
A document in MongoDB:
{
  _id: ObjectId("509a8fb2f3f4948bd2f983a0"),
  user_id: "abc123",
  age: 55,
  status: "A"
}
Alternatives of Schema for
Monitoring Data
• One monitoring point per document
• A clob per document
• A monitor point per day per document
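The schema diagrams for these alternatives did not survive in the transcript, so the following is only a sketch of how individual samples might be folded into one document per monitor point per day. The `metadata.*` and `hourly.<hour>.<minute>.<second>` field names are taken from the queries shown later in this deck. The shape of the per-sample input, the unpadded keys, and the function name `toDailyDocument` are assumptions made for illustration.

```javascript
// Fold per-sample records into one document per monitor point per day.
// Assumption: all samples belong to the same antenna, component,
// monitor point and day; the input record shape is hypothetical.
function toDailyDocument(samples) {
  const first = samples[0];
  const [y, m, d] = first.timestamp.slice(0, 10).split("-").map(Number);
  const doc = {
    metadata: {
      date: `${y}-${m}-${d}`, // unpadded, like "2012-9-15" in the deck's queries
      antenna: first.antenna,
      component: first.component,
      monitorPoint: first.monitorPoint,
    },
    hourly: {},
  };
  for (const s of samples) {
    const [hh, mm, ss] = s.timestamp.slice(11, 19).split(":").map(Number);
    doc.hourly[hh] = doc.hourly[hh] || {};
    doc.hourly[hh][mm] = doc.hourly[hh][mm] || {};
    doc.hourly[hh][mm][ss] = s.value; // leaf: the sampled value
  }
  return doc;
}

// Made-up sample values, for illustration only.
const exampleSamples = [
  { antenna: "DV10", component: "FrontEnd/Cryostat",
    monitorPoint: "GATE_VALVE_STATE", timestamp: "2012-09-15T15:29:18", value: 1 },
  { antenna: "DV10", component: "FrontEnd/Cryostat",
    monitorPoint: "GATE_VALVE_STATE", timestamp: "2012-09-15T15:29:19", value: 0 },
];
console.log(JSON.stringify(toDailyDocument(exampleSamples), null, 2));
```

With this layout the identifying metadata is stored once per day instead of once per sample.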
Analysis
• Advantages of one monitor point per day per document:
  • The number of documents within a collection is bounded
    • There will be ~150,000 documents per day
    • The number of index entries will be bounded as well
  • No data fragmentation problem
  • Once a specific document is identified (O(log n) through the
    index), access to a specific range or a single value can be done
    in O(1)
  • Smaller ratio of metadata to data
How would a query look
like?
• Query to retrieve a value with second-level
granularity:
  • E.g., to get the value of
    FrontEnd/Cryostat/GATE_VALVE_STATE on antenna DV10 at
    2012-09-15T15:29:18:
db.monitorData_[MONTH].findOne(
  { "metadata.date": "2012-9-15",
    "metadata.monitorPoint": "GATE_VALVE_STATE",
    "metadata.antenna": "DV10",
    "metadata.component": "FrontEnd/Cryostat" },
  { "hourly.15.29.18": 1 }
);
How would a query look
like …
• Query to retrieve a range of values
  • E.g., to get the values of
    FrontEnd/Cryostat/GATE_VALVE_STATE on antenna DV10 during
    minute 29 (at 2012-09-15T15:29):
db.monitorData_[MONTH].findOne(
  { "metadata.date": "2012-9-15",
    "metadata.monitorPoint": "GATE_VALVE_STATE",
    "metadata.antenna": "DV10",
    "metadata.component": "FrontEnd/Cryostat" },
  { "hourly.15.29": 1 }
);
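The two queries differ only in the depth of the dotted projection path. The sketch below illustrates this in plain JavaScript, without a database: a hypothetical helper, `resolvePath`, walks a dotted path through a nested document. The sample document and its values are made up. A real `findOne` projection returns the matching document trimmed to the projected path (plus `_id`) rather than the bare leaf, so this only illustrates path resolution.

```javascript
// Walk a dotted path such as "hourly.15.29.18" through a nested object.
// Returns undefined if any step along the path is missing.
function resolvePath(doc, path) {
  return path
    .split(".")
    .reduce((node, key) => (node == null ? undefined : node[key]), doc);
}

// Hypothetical per-day document; values are illustrative only.
const dayDocument = {
  metadata: { date: "2012-9-15", antenna: "DV10",
              component: "FrontEnd/Cryostat", monitorPoint: "GATE_VALVE_STATE" },
  hourly: { 15: { 29: { 17: 0, 18: 1, 19: 1 } } },
};

// Second-level granularity: a single value.
console.log(resolvePath(dayDocument, "hourly.15.29.18")); // 1

// Minute-level granularity: every sample recorded in minute 29.
console.log(Object.keys(resolvePath(dayDocument, "hourly.15.29")).length); // 3
```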
Indexes
• A typical query is restricted by:
  • Antenna name
  • Component name
  • Monitor point
  • Date
db.monitorData_[MONTH].ensureIndex(
  { "metadata.antenna": 1, "metadata.component": 1,
    "metadata.monitorPoint": 1, "metadata.date": 1 }
);
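To illustrate why a compound index on these four fields makes document lookup logarithmic, the sketch below models the index as a sorted array of composite keys searched by binary search. This is a simplified stand-in, not MongoDB's actual index implementation; `indexKey`, `buildIndex`, and `lookup` are hypothetical names.

```javascript
// Composite key in the same field order as the index definition.
function indexKey(metadata) {
  return [metadata.antenna, metadata.component,
          metadata.monitorPoint, metadata.date].join("\u0000");
}

// Build a sorted list of { key, pos } entries; pos is the document's
// position in the original array.
function buildIndex(docs) {
  return docs
    .map((doc, pos) => ({ key: indexKey(doc.metadata), pos }))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Binary search: O(log n) comparisons to find the matching document.
function lookup(index, metadata) {
  const target = indexKey(metadata);
  let lo = 0;
  let hi = index.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].key === target) return index[mid].pos;
    if (index[mid].key < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return -1; // no document matches all four fields
}
```

A query that supplies all four fields, as the examples in this deck do, maps to exactly one composite key in this model.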
Testing Hardware / Software
• A cluster of two nodes was created
  • CPU: Intel Xeon quad-core X5410
  • RAM: 16 GByte
  • SWAP: 16 GByte
• OS:
  • RHEL 6.0
  • Kernel 2.6.32-279.14.1.el6.x86_64
• MongoDB
  • v2.2.1
Testing Data
• Real data from Sep-Nov 2012 was used initially,
but:
• A tool to generate random data was implemented:
    Month: 1 (February)
    Number of days: 11
    Number of antennas: 70
    Number of components per antenna: 41
    Monitoring points per component: 35
    Total daily documents: 100,450
    Total documents: 1,104,950
    Average size per document: 1.3 MB
    Size of the collection: 1,375.23 GB
• Total index size: 193 MB
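The document counts above follow directly from the generator parameters, as the sketch below shows. It also derives the average document size implied by the reported collection size under both decimal and binary unit conventions. Which convention the original statistics used is not stated, so both results are shown as assumptions.

```javascript
// Generator parameters as reported on the slide.
const antennas = 70;
const componentsPerAntenna = 41;
const pointsPerComponent = 35;
const days = 11;

// One document per monitor point per day.
const dailyDocuments = antennas * componentsPerAntenna * pointsPerComponent;
const totalDocuments = dailyDocuments * days;

console.log(dailyDocuments); // 100450
console.log(totalDocuments); // 1104950

// Average document size implied by the reported collection size.
const collectionSizeGB = 1375.23;
const avgDecimalMB = (collectionSizeGB * 1000) / totalDocuments; // if 1 GB = 1000 MB
const avgBinaryMB = (collectionSizeGB * 1024) / totalDocuments;  // if 1 GB = 1024 MB

console.log(avgDecimalMB.toFixed(2)); // 1.24
console.log(avgBinaryMB.toFixed(2));  // 1.27
```

The binary interpretation rounds to the reported 1.3 MB, which suggests, but does not confirm, that the statistics were reported in binary units.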
[Figure-only slides: Database Statistics; Data Sets (3 slides); Schema 1: One Sample of Monitoring Data per Document; Proposed Schema]
More tests
• For more tests, see
https://adcwiki.alma.cl/bin/view/Software/HighVolumeDataTestingUsingMongoDB
Pending
• Test the performance of aggregations and combined
queries
• Use Map/Reduce to create statistics (max, min,
avg, etc.) over ranges of data, to improve the performance
of queries such as:
  • E.g., search for monitoring points whose values are >= 10
• Test performance with a year's worth of data
• Run stress tests with a large number of concurrent queries
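The pre-computed statistics idea can be sketched in plain JavaScript as a map step (one summary per minute) followed by a filter on the summaries. This is not MongoDB's `mapReduce` command, only an illustration of the approach. It assumes the `hour -> minute -> second -> value` nesting used by the queries in this deck; the function names and sample values are hypothetical.

```javascript
// Reduce one minute of samples ({ second: value }) to min/max/avg.
function summarizeMinute(secondsToValues) {
  const values = Object.values(secondsToValues);
  const acc = values.reduce(
    (a, v) => ({
      min: Math.min(a.min, v),
      max: Math.max(a.max, v),
      sum: a.sum + v,
    }),
    { min: Infinity, max: -Infinity, sum: 0 }
  );
  return { min: acc.min, max: acc.max, avg: acc.sum / values.length };
}

// Map step: one summary per minute of an hour ({ minute: { second: value } }).
function summarizeHour(minutes) {
  const summary = {};
  for (const [minute, seconds] of Object.entries(minutes)) {
    summary[minute] = summarizeMinute(seconds);
  }
  return summary;
}

// A "values >= threshold" search only needs to inspect the stored maxima.
function minutesWithValueAtLeast(hourSummary, threshold) {
  return Object.keys(hourSummary).filter((m) => hourSummary[m].max >= threshold);
}

// Made-up sample values, for illustration only.
const exampleHour = { 29: { 17: 4, 18: 12, 19: 8 }, 30: { 0: 2, 1: 3 } };
const exampleSummary = summarizeHour(exampleHour);
console.log(exampleSummary[29]); // { min: 4, max: 12, avg: 8 }
console.log(minutesWithValueAtLeast(exampleSummary, 10)); // [ '29' ]
```

With summaries stored alongside the raw samples, a threshold query could skip any minute whose maximum is below the threshold without reading its individual values.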
Conclusion
• MongoDB is suitable as an alternative for
permanent storage of monitoring data
• The schema and indexes are fundamental to achieving
millisecond-level response times