Stock Trading

with Scala, Ruby and Mongo

The idea

RU05 is a group of three HKUST CS classmates.

In Jan 2010, we wanted to build something:

  • for fun
  • for money
  • within a short time

The idea

At a high level, the program is simple:

  • download data
  • analyze data
  • generate results
  • run simulations
  • UI / command line for all the above
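
As a rough illustration of how the stages above could chain together, here is a minimal Python sketch. Every function name, the sample prices and the moving-average rule are placeholders made up for this example; they are not the real crawler or analyzer, and not the secret part.

```python
from statistics import mean

def download_data():
    # Stand-in for the crawler: made-up closing prices, oldest first.
    return [10.0, 10.5, 11.0, 10.2, 11.4, 12.0]

def analyze(prices, window=3):
    # Toy analysis: moving average over the last `window` prices.
    return [mean(prices[i - window + 1:i + 1])
            for i in range(window - 1, len(prices))]

def generate_results(prices, averages, window=3):
    # Toy signal: "buy" when the close is above its moving average, else "hold".
    closes = prices[window - 1:]
    return ["buy" if close > avg else "hold"
            for close, avg in zip(closes, averages)]

def simulate(prices, signals, window=3):
    # Buy one share on each "buy" day, then value the holdings at the last price.
    closes = prices[window - 1:]
    cost = sum(c for c, s in zip(closes, signals) if s == "buy")
    return signals.count("buy") * closes[-1] - cost

def run_pipeline():
    prices = download_data()
    averages = analyze(prices)
    signals = generate_results(prices, averages)
    return signals, simulate(prices, signals)
```

Each stage only consumes the output of the previous one, so a UI or command line can call any stage on its own.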

What am I NOT going to talk about …

  • the secret part
  • how I beat the market

What am I going to talk about …

  • architecture overview
  • tools I used
  • Mongo model and memory
  • Mongo performance tuning tips

Architecture

Our First Prototype

Hardware

Home Desktop

OS

Windows

Software

C, Ruby and plain text

After Revamp - Hardware

Production

Ubuntu, 2GB RAM, 1.6GHz CPU

Development

Windows and Macbooks

Database

Mongo (32-bit and 64-bit)

After Revamp - Software

Crawler

Ruby, Mechanize

Analyzer

Scala, Swing, Scaml, Mongo (through Casbah)

Web UI

Rails, jQuery, DataTables, Mongo (through Mongoid)

Chart

jQuery, jqPlot

Mongo

Model and Memory

Mongo: Embed vs Relational

Embed (de-normalized)


company model:
{
  _id: "897132", name: "Thought Sauce", 
  employee: {_id: "12345", name: "Eddie" }
}

Relational (normalized)


company model:
{
  _id: "897132", name: "Thought Sauce"
}

employee model:
{
  _id: "12345", name: "Eddie", company_id: "897132" 
}
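
To make the difference concrete, here is a small sketch that uses plain Python dicts in place of Mongo collections. The documents are the ones from the slides; the helper function names are made up for the example.

```python
# Embedded (de-normalized): the employee lives inside the company document.
embedded_company = {
    "_id": "897132",
    "name": "Thought Sauce",
    "employee": {"_id": "12345", "name": "Eddie"},
}

# Relational (normalized): two collections, linked by company_id.
companies = {"897132": {"_id": "897132", "name": "Thought Sauce"}}
employees = {"12345": {"_id": "12345", "name": "Eddie", "company_id": "897132"}}

def employee_name_embedded(company):
    # One lookup: the company document already carries the employee.
    return company["employee"]["name"]

def employee_names_relational(company_id):
    # Second lookup: search the employee collection by company_id.
    return [e["name"] for e in employees.values()
            if e["company_id"] == company_id]
```

With the embedded model, loading the company is enough. With the relational model, the employee needs its own query.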

Mongo: Embed vs Relational

The recommendation is to embed

“If performance is an issue, embed.”

but I didn’t

Mongo: Embed

Pros

  • Faster
    (if company in memory, employee also in memory)
  • Less index space

Cons

  • a single object can get too big
    (4MB limit)
  • global queries on subdocuments are complicated
    (e.g. using map-reduce to grab all employees)
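
The second con can be sketched in plain Python, with a list standing in for the company collection. The second company and the function names are made up for the example. This shows only the "map" idea, not Mongo's actual map-reduce API.

```python
companies = [
    {"_id": "897132", "name": "Thought Sauce",
     "employee": {"_id": "12345", "name": "Eddie"}},
    # Second company is made up for the example.
    {"_id": "897133", "name": "Example Co",
     "employee": {"_id": "12346", "name": "Alice"}},
]

def all_employees(company_docs):
    # "Map" step: emit the embedded employee of every company document.
    # Every company has to be read even though only employees are wanted.
    return [dict(doc["employee"], company_id=doc["_id"])
            for doc in company_docs]

def find_employee(company_docs, name):
    # A global query on the subdocument becomes a scan over all parents.
    return [e for e in all_employees(company_docs) if e["name"] == name]
```

With a separate employee collection, the same question would be a direct find on that collection.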

Keep things in memory

Mongo: Memory Handling

Memory Mapped

  • virtual memory
    (OS virtual memory manager, LRU)
  • cache index and data
  • one cache
    (no separate OS cache and database cache)
  • a collection's data tends to stick together
  • auto grow => memory hungry
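
For readers new to memory-mapped files, here is a minimal sketch using Python's standard mmap module. It is not Mongo's code; the temporary file only plays the role of a data file, and the function name is made up.

```python
import mmap
import os
import tempfile

def read_through_mmap(payload: bytes, offset: int, length: int) -> bytes:
    # Write the payload to a temporary file that stands in for a data file.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
        with open(path, "rb") as f:
            # Map the whole file; the OS pages bytes in only when touched.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
                return mapped[offset:offset + length]
    finally:
        os.remove(path)
```

The program reads the file like an in-memory byte array, and the OS virtual memory manager decides which pages stay in RAM.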

So, why am I using relational?

  • a stock is too big to be a single object
    (4MB limit)
  • I don’t query across multiple collections
  • the per-collection memory overhead has little impact on me
    (~4KB per collection)
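
A back-of-the-envelope check of the 4MB point, as a sketch. The price record and its field names are made up, and compact JSON length is only a rough proxy for the real BSON size.

```python
import json

LIMIT_BYTES = 4 * 1024 * 1024  # the 4MB per-document limit from the slides

def approx_size(doc):
    # Rough proxy only: length of compact JSON, not the true BSON size.
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))

def max_embedded_records(record, limit=LIMIT_BYTES):
    # How many copies of `record` fit under the limit inside one document.
    return limit // approx_size(record)

def fits_in_one_document(record, count, limit=LIMIT_BYTES):
    return approx_size(record) * count <= limit

# Made-up price record with hypothetical field names.
sample = {"d": "2010-01-04", "o": 10.0, "h": 10.5, "l": 9.8,
          "c": 10.2, "v": 120000}
```

A few thousand such records fit in one document, but anything finer-grained than daily data would hit the limit quickly.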

Performance Tuning

Measure and Tune

Mongo: Commands for measuring

DEMO

  • db.setProfilingLevel(2, 100)
    check db.system.profile, then use explain()
  • db.serverStatus()
  • db.stats()
  • db.collectionName.stats()
  • mongostat
    number of inserts per second, amount of memory mapped
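
Once profiling is on, the entries in db.system.profile still have to be sifted. Here is a sketch of that step in plain Python. The sample entries and namespaces are made up, and the op, ns and millis field names are an assumption about the profile document layout, which varies between Mongo versions.

```python
def slow_operations(profile_entries, threshold_ms=100):
    # Keep entries at or above the threshold, slowest first.
    slow = [e for e in profile_entries if e.get("millis", 0) >= threshold_ms]
    return sorted(slow, key=lambda e: e["millis"], reverse=True)

# Made-up entries shaped like documents from db.system.profile.
sample_profile = [
    {"op": "query", "ns": "stock.quotes", "millis": 340},
    {"op": "update", "ns": "stock.quotes", "millis": 12},
    {"op": "query", "ns": "stock.results", "millis": 150},
]
```

The slowest queries at the top of the list are the first candidates for explain().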

Mac OS X: Commands for measuring

DEMO

  • vm_stat
  • iostat 1 10
  • lsof | grep mongod | grep TCP

Tunings and Issues

http://www.mongodb.org/display/DOCS/Optimization

Tunings I did

  • compare to Redis
  • index
    (every column used in sort and find; watch the index size)
  • shorter names and data
  • query before updating
  • capped collection (be careful with updates)
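
Why shorter names help: BSON stores every field name inside every document, so long names are paid for once per document. Here is a small worked example. The field names and the document count are made up for illustration.

```python
def name_bytes(field_names):
    # BSON stores each field name as a null-terminated string per document.
    return sum(len(name.encode("utf-8")) + 1 for name in field_names)

def bytes_saved(long_names, short_names, document_count):
    # Total saving from renaming the fields across the whole collection.
    per_document = name_bytes(long_names) - name_bytes(short_names)
    return per_document * document_count

# Hypothetical field names for a price record.
long_names = ["date", "open", "high", "low", "close", "volume"]
short_names = ["d", "o", "h", "l", "c", "v"]
```

Here the short names save 20 bytes per document, so a million documents save about 20MB of data that no longer competes for memory.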

Issues

  • 32-bit CPU
  • Windows

Join

codeaholics

and

hkror Google group!