Stock Trading
with Scala, Ruby and Mongo
The idea
RU05 is a group of three HKUST CS classmates.
In Jan 2010, we wanted to build something:
- for fun
- for money
- within a short time
The idea
At a high level, the program is simple:
- download data
- analyze data
- generate results
- run simulations
- UI / command line for all the above
What I am NOT going to talk about …
- the secret part
- how I beat the market
What I am going to talk about …
- architecture overview
- tools I used
- Mongo model and memory
- Mongo performance tuning tips
Architecture
Our First Prototype
Hardware
Home Desktop
OS
Windows
Software
C, Ruby and plain text
After Revamp - Hardware
Production
Ubuntu, 2 GB RAM, 1.6 GHz CPU
Development
Windows and Macbooks
Database
Mongo (32-bit and 64-bit)
After Revamp - Software
Crawler
Ruby, Mechanize
Analyzer
Scala, Swing, Scaml, Mongo (through Casbah)
Web UI
Rails, jQuery, DataTables, Mongo (through Mongoid)
Chart
jQuery, jqPlot
Mongo
Model and Memory
Mongo: Embed vs Relational
Embed (de-normalized)
company model:
{
_id: "897132", name: "Thought Sauce",
employee: {_id: "12345", name: "Eddie" }
}
Relational (normalized)
company model:
{
_id: "897132", name: "Thought Sauce"
}
employee model:
{
_id: "12345", name: "Eddie", company_id: "897132"
}
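A minimal sketch of how the two models differ when reading data, using plain Ruby hashes as stand-ins for Mongo documents. No database or driver is involved, and the variable names are illustrative assumptions rather than code from the project.

```ruby
# Plain Ruby hashes standing in for Mongo documents (no database involved).

# Embedded: the employee lives inside the company document.
embedded_company = {
  "_id" => "897132", "name" => "Thought Sauce",
  "employee" => { "_id" => "12345", "name" => "Eddie" }
}

# Relational: two separate collections linked by company_id.
companies = [{ "_id" => "897132", "name" => "Thought Sauce" }]
employees = [{ "_id" => "12345", "name" => "Eddie", "company_id" => "897132" }]

# Embedded lookup: one document already contains both company and employee.
embedded_employee_name = embedded_company["employee"]["name"]

# Relational lookup: find the company, then do a second lookup by company_id.
company = companies.find { |c| c["name"] == "Thought Sauce" }
relational_employee = employees.find { |e| e["company_id"] == company["_id"] }
relational_employee_name = relational_employee["name"]
```

Both paths reach the same employee; the embedded model needs one lookup, the relational model needs two.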
Mongo: Embed vs Relational
The recommendation is Embed
“If performance is an issue, embed.”
but I didn’t
Mongo: Embed
Pros
- Faster (if the company is in memory, the employee is in memory too)
- Less index space
Cons
- a single object can grow too big (4MB limit)
- global queries on subdocuments are complicated (e.g. map-reduce is needed to grab all employees)
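The last point can be illustrated with a rough sketch in plain Ruby. The second company, the `employees` array field and the added `company_id` key are hypothetical; the point is only that a query over all employees has to unwrap every parent document first.

```ruby
# Hypothetical embedded documents; storing "employees" as an array is an assumption.
companies = [
  { "_id" => "897132", "name" => "Thought Sauce",
    "employees" => [{ "_id" => "12345", "name" => "Eddie" }] },
  { "_id" => "555555", "name" => "Example Co",
    "employees" => [{ "_id" => "67890", "name" => "Alice" }] }
]

# A query across all employees must visit every company, pull out its
# subdocuments, and remember which company each one came from.
all_employees = companies.flat_map do |company|
  company["employees"].map { |e| e.merge("company_id" => company["_id"]) }
end

# Only after flattening can the employees be filtered directly.
eddie = all_employees.find { |e| e["name"] == "Eddie" }
```

With a separate employee collection, the flattening step disappears and the filter runs on the collection directly.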
Keep things in memory
Mongo: Memory Handling
Memory Mapped
- virtual memory (OS virtual memory manager, LRU)
- cache index and data
- one cache, not a separate OS cache and database cache
- data from the same collection tends to stick together
- grows automatically => memory hungry
So, why am I using relational?
- a whole stock is too big to be a single object (4MB limit)
- I don’t query across multiple collections
- the memory overhead has little impact on me (~4KB per collection)
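A back-of-the-envelope sketch of the size argument. The quote fields, the `_id` value and the record counts are all hypothetical, and JSON length is used only as a rough stand-in for the real BSON size.

```ruby
require "json"

LIMIT_BYTES = 4 * 1024 * 1024 # the 4MB single-object limit mentioned above

# Hypothetical price record; the field names are illustrative only.
def sample_quote(i)
  { "date" => "2010-01-01", "seq" => i, "open" => 10.5, "high" => 11.0,
    "low" => 10.25, "close" => 10.75, "volume" => 1_000_000 }
end

# Rough size of one stock document that embeds `count` quotes.
# JSON length is only an approximation of the stored BSON size.
def embedded_size(count)
  doc = { "_id" => "0005", "quotes" => (1..count).map { |i| sample_quote(i) } }
  JSON.generate(doc).bytesize
end

# True when a stock with `count` embedded quotes stays under the limit.
def fits_in_one_document?(count)
  embedded_size(count) <= LIMIT_BYTES
end
```

Under these assumptions a few hundred records fit comfortably, while a stock with a hundred thousand records does not, which is the situation that pushes the data into its own collection.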
Performance Tuning
Measure and Tune
Mongo: Commands for measuring
DEMO
- db.setProfilingLevel(2, 100), see db.system.profile, then use explain()
- db.serverStatus()
- db.stats()
- db.collectionName.stats()
- mongostat: number of inserts per second, amount of memory mapped
Mac OS X: Commands for measuring
DEMO
- vm_stat
- iostat 1 10
- lsof | grep mongod | grep TCP
Tunings and Issues
http://www.mongodb.org/display/DOCS/Optimization
Tunings I did
- compare with Redis
- index every column used in sort and find (but watch the index size)
- shorter names and data
- query before updating
- Capped collection (be careful with updates)
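A small sketch of why shorter names help: Mongo stores the field names inside every document, so each saved character is saved once per document. The field names and values below are hypothetical, and JSON length only approximates the real BSON size.

```ruby
require "json"

# Hypothetical quote document with descriptive vs. abbreviated field names.
long_names  = { "stock_code" => "0005", "trading_date" => "2010-01-04",
                "closing_price" => 88.5, "trading_volume" => 1_000_000 }
short_names = { "c" => "0005", "d" => "2010-01-04", "p" => 88.5, "v" => 1_000_000 }

# The values are identical, so the difference comes from the field names alone.
long_bytes  = JSON.generate(long_names).bytesize
short_bytes = JSON.generate(short_names).bytesize
saved_per_doc = long_bytes - short_bytes

# Projected saving over a hypothetical collection of one million documents.
saved_total = saved_per_doc * 1_000_000
```

The trade-off is readability: abbreviated names are harder to work with, so a mapping layer that translates between long and short names is a common compromise.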
Issues
- 32-bit CPU
- Windows