permanent storage alternative - TWiki
ALMA Integrated Computing Team
Coordination & Planning Meeting #1
Santiago, 17-19 April 2013
Evaluation of mongoDB for Persistent
Storage of Monitoring Data
Tzu-Chiang Shen
Leonel Peña
Monitoring Storage Requirement
Expected data rate with 66 antennas:
~ 6000 - 7000 clobs/s ~ 25 - 30 GB/day
~ equivalent to 310 KByte/s or 2.485 Mbit/s
~ 130,000 - 150,000 monitor points
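The throughput figures above follow from the daily volume. A minimal sketch of the conversion, assuming decimal units (1 GB = 1e9 bytes, 1 KByte = 1e3 bytes); the function name is illustrative only:

```javascript
// Convert a daily data volume into a sustained throughput.
// Assumption: decimal units (1 GB = 1e9 bytes, 1 KByte = 1e3 bytes).
function dailyVolumeToRates(gbPerDay) {
  const bytesPerSecond = (gbPerDay * 1e9) / 86400; // 86,400 seconds per day
  return {
    kBytePerSecond: bytesPerSecond / 1e3,
    mBitPerSecond: (bytesPerSecond * 8) / 1e6,
  };
}

// Print the rates for both ends of the 25 - 30 GB/day range.
for (const gb of [25, 30]) {
  const r = dailyVolumeToRates(gb);
  console.log(
    `${gb} GB/day = ${r.kBytePerSecond.toFixed(0)} KByte/s = ${r.mBitPerSecond.toFixed(2)} Mbit/s`
  );
}
```

The quoted 310 KByte/s falls inside the range this produces for 25 - 30 GB/day.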
Monitoring data characteristics
Simple data structure: [timestamp, value]
But huge amount of data
Read-only data
Data is sorted at the moment of insertion
ICT-CPM1 17-19 April 2013
Very Brief Introduction to
MongoDB
NoSQL and document-oriented.
The storage format is BSON, a binary-encoded variant of JSON.
SQL         mongoDB
Database    Database
Table       Collection
Row         Document
Field       Field
Index       Index
Documents within a collection are not required to have the
same fields.
Other features: Sharding, Replication, Aggregation
(Map/Reduce)
Very Brief Introduction to
MongoDB …
A document in mongoDB:
{
_id: ObjectId("509a8fb2f3f4948bd2f983a0"),
user_id: "abc123",
age: 55,
status: 'A'
}
Alternatives of Schema for
Monitoring Data
One monitoring point per document
Alternatives of Schema …
A clob per document
Alternatives of Schema …
A monitor point per day per document
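The original schema diagram is not available in this transcript. As a rough sketch only, a document of this kind could be built as follows. The field names under "metadata" and the nested "hourly" map are inferred from the queries shown later in these slides; the helper names, the non-padded keys and the sample value are assumptions.

```javascript
// Hypothetical shape of a "one monitor point per day" document.
// "metadata.*" and "hourly.<hour>.<minute>.<second>" are inferred from the
// queries in these slides; everything else here is illustrative.
function emptyDayDocument(antenna, component, monitorPoint, date) {
  return {
    metadata: { antenna, component, monitorPoint, date },
    hourly: {}, // hourly[hour][minute][second] = value
  };
}

// Store one sample under hourly.<hour>.<minute>.<second>.
function addSample(doc, hour, minute, second, value) {
  const h = (doc.hourly[hour] = doc.hourly[hour] || {});
  const m = (h[minute] = h[minute] || {});
  m[second] = value;
  return doc;
}

const dayDoc = emptyDayDocument("DV10", "FrontEnd/Cryostat", "GATE_VALVE_STATE", "2012-9-15");
addSample(dayDoc, 15, 29, 18, 1); // made-up sample value
console.log(JSON.stringify(dayDoc, null, 2));
```

With this layout, each (antenna, component, monitor point, day) combination maps to exactly one document, which is what keeps the document count bounded.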
Analysis
Advantages:
The number of documents within a collection is bounded
• There will be ~150,000 documents per day
• The size of the indexes will be bounded as well.
No data fragmentation problem
Once a specific document is located through the index
( O(log n) ), a specific range or a single value can be
accessed in O(1)
Smaller ratio of metadata / data
What would a query look
like?
Query to retrieve a value with seconds-level
granularity:
E.g.: To get the value of
FrontEnd/Cryostat/GATE_VALVE_STATE at 2012-09-15T15:29:18.
db.monitorData_[MONTH].findOne(
{"metadata.date": "2012-9-15",
"metadata.monitorPoint": "GATE_VALVE_STATE",
"metadata.antenna": "DV10",
"metadata.component": "FrontEnd/Cryostat"},
{ 'hourly.15.29.18': 1 }
);
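A minimal sketch of how the two findOne() arguments could be derived from a timestamp. The helper name buildPointQuery is hypothetical, and the sketch assumes that the date string and the hour/minute/second keys are not zero-padded, following the "2012-9-15" literal on the slide.

```javascript
// Build the filter and projection for one monitor point at one second.
// Assumptions: "metadata.date" is not zero-padded ("2012-9-15", as on the
// slide) and the keys under "hourly" are not zero-padded either.
function buildPointQuery(antenna, component, monitorPoint, isoTimestamp) {
  const match = /^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})$/.exec(isoTimestamp);
  if (!match) throw new Error("expected YYYY-MM-DDTHH:MM:SS");
  const [year, month, day, hour, minute, second] = match.slice(1).map(Number);
  const filter = {
    "metadata.date": `${year}-${month}-${day}`,
    "metadata.monitorPoint": monitorPoint,
    "metadata.antenna": antenna,
    "metadata.component": component,
  };
  // Dot notation selects only the requested leaf of the nested document.
  const projection = { [`hourly.${hour}.${minute}.${second}`]: 1 };
  return { filter, projection };
}

const pointQuery = buildPointQuery(
  "DV10", "FrontEnd/Cryostat", "GATE_VALVE_STATE", "2012-09-15T15:29:18"
);
console.log(JSON.stringify(pointQuery, null, 2));
```

In the mongo shell, the two parts would be passed as findOne(pointQuery.filter, pointQuery.projection) on the monthly collection.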
What would a query look
like …
Query to retrieve a range of values
E.g.: To get the values of
FrontEnd/Cryostat/GATE_VALVE_STATE at minute 29
(at 2012-09-15T15:29)
db.monitorData_[MONTH].findOne(
{"metadata.date": "2012-9-15",
"metadata.monitorPoint": "GATE_VALVE_STATE",
"metadata.antenna": "DV10",
"metadata.component": "FrontEnd/Cryostat"},
{ 'hourly.15.29': 1 }
);
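A sketch of how a client might turn the minute-level result into [second, value] pairs. It assumes the returned document has the shape { hourly: { "15": { "29": { "<second>": value } } } }; the function name and the sample values are made up for illustration.

```javascript
// Flatten the sub-document returned for one minute into [second, value]
// pairs sorted by second.
// Assumption: result.hourly[hour][minute] maps second -> value.
function minuteSamples(result, hour, minute) {
  const seconds = ((result.hourly || {})[hour] || {})[minute] || {};
  return Object.keys(seconds)
    .map((s) => [Number(s), seconds[s]])
    .sort((a, b) => a[0] - b[0]);
}

// Made-up result for minute 15:29 with three samples.
const sampleResult = { hourly: { 15: { 29: { 18: 1, 3: 0, 47: 1 } } } };
console.log(minuteSamples(sampleResult, 15, 29));
```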
Indexes
A typical query is restricted by:
Antenna name
Component name
Monitor point
Date
db.monitorData_[MONTH].ensureIndex(
{ "metadata.antenna": 1, "metadata.component": 1,
"metadata.monitorPoint": 1, "metadata.date": 1
}
);
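A compound index can narrow a query most effectively when the filter constrains a leading prefix of the index keys. The sketch below only counts how many leading keys of the index above appear in a filter; it is a simplification for illustration and does not model MongoDB's query planner. The function name is hypothetical.

```javascript
// Keys of the compound index from the slide, in index order.
const indexKeys = [
  "metadata.antenna",
  "metadata.component",
  "metadata.monitorPoint",
  "metadata.date",
];

// Count how many leading index keys a filter constrains.
// Simplification: only checks whether the field is present in the filter.
function leadingKeysConstrained(keys, filter) {
  let count = 0;
  for (const key of keys) {
    if (!Object.prototype.hasOwnProperty.call(filter, key)) break;
    count += 1;
  }
  return count;
}

const fullFilter = {
  "metadata.antenna": "DV10",
  "metadata.component": "FrontEnd/Cryostat",
  "metadata.monitorPoint": "GATE_VALVE_STATE",
  "metadata.date": "2012-9-15",
};
const partialFilter = { "metadata.antenna": "DV10", "metadata.date": "2012-9-15" };

console.log(leadingKeysConstrained(indexKeys, fullFilter));    // 4
console.log(leadingKeysConstrained(indexKeys, partialFilter)); // 1
```

The typical query on the slide constrains all four keys, so the whole index key is usable.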
Testing Hardware / Software
A cluster of two nodes was created
CPU: Intel Xeon Quad core X5410.
RAM: 16 GByte
SWAP: 16 GByte
OS:
RHEL 6.0
2.6.32-279.14.1.el6.x86_64
MongoDB
V2.2.1
Testing Data
Real data from Sep-Nov 2012 was used initially,
but:
A tool to generate random data was implemented:
Month: 1 (February)
Number of days: 11
Number of antennas: 70
Number of components per antenna: 41
Monitoring points per component: 35
Total daily documents: 100,450
Total documents: 1,104,950
Average size per document: 1.3 MB
Size of the collection: 1,375.23 GB
Total index size: 193 MB
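The document counts above can be reproduced from the generator parameters. A small sketch of that arithmetic follows; the variable names are illustrative, and the per-document average is shown for both decimal and binary interpretations of GB because the slides do not say which was used.

```javascript
// Generator parameters from the slide.
const antennas = 70;
const componentsPerAntenna = 41;
const pointsPerComponent = 35;
const days = 11;

// One document per monitor point per day.
const docsPerDay = antennas * componentsPerAntenna * pointsPerComponent;
const totalDocs = docsPerDay * days;

// Implied average document size from the reported collection size.
const collectionGB = 1375.23;
const avgMBDecimal = (collectionGB * 1000) / totalDocs; // assuming 1 GB = 1000 MB
const avgMBBinary = (collectionGB * 1024) / totalDocs;  // assuming 1 GB = 1024 MB

console.log({ docsPerDay, totalDocs });
console.log(avgMBDecimal.toFixed(2), avgMBBinary.toFixed(2));
```

Both interpretations give an average between 1.2 and 1.3 MB per document, in line with the rounded figure quoted above.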
Database Statistics
Data Sets
Data Sets …
Data Sets
Schema 1: One Sample of
Monitoring Data per Document
Proposed Schema:
More tests
For more tests, see
https://adcwiki.alma.cl/bin/view/Software/HighVolumeDataTestingUsingMongoDB
Pending
Test performance of aggregations/combined
queries
Use Map/Reduce to create statistics (max, min,
avg, etc.) over ranges of data to improve the performance
of queries such as:
Search for monitoring points whose values are >= 10
Test performance under a year's worth of data
Stress tests with a large number of concurrent queries
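To illustrate the kind of per-document statistics mentioned above, here is a plain JavaScript sketch that computes min, max and average over one day document. It is not the planned Map/Reduce job; it assumes the nested hourly.<hour>.<minute>.<second> layout implied by the queries in these slides, and the function name and sample values are made up.

```javascript
// Compute min, max, average and sample count over one day document.
// Assumption: doc.hourly[hour][minute][second] holds a numeric value.
function dayStatistics(doc) {
  let min = Infinity;
  let max = -Infinity;
  let sum = 0;
  let count = 0;
  for (const hour of Object.values(doc.hourly || {})) {
    for (const minute of Object.values(hour)) {
      for (const value of Object.values(minute)) {
        if (value < min) min = value;
        if (value > max) max = value;
        sum += value;
        count += 1;
      }
    }
  }
  return count === 0 ? null : { min, max, avg: sum / count, count };
}

// Made-up samples: two in minute 15:29, one in minute 16:00.
const statsDoc = { hourly: { 15: { 29: { 18: 4, 19: 12 } }, 16: { 0: { 0: 8 } } } };
const stats = dayStatistics(statsDoc);
console.log(stats);

// With a stored maximum, a search for values >= 10 could skip any
// document whose max is below 10 without scanning its samples.
console.log(stats.max >= 10);
```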
Conclusion
MongoDB is suitable as an alternative for
permanent storage of monitoring data
The schema and indexes are fundamental to achieving
millisecond-level response times