Detecting the game being played on a live Twitch stream

This is part 3 in a series on detecting games on Twitch. You can read Part 1 and Part 2.

Now that we can detect a game quickly and fairly reliably from a single image, the next step is to see if we can apply this technique to a live stream. A few things need to happen for that: get the video stream, extract frames from it, and run a prediction on them. Since video is just a series of images, we can run a prediction on many frames and possibly get a more accurate overall result.

The whole lifecycle from video to prediction:

  1. Get stream playlist from Twitch
  2. Download each video from playlist
  3. Concatenate into a single video clip
  4. Extract one frame out of every 60 (every 2s or so, depending on the stream)
    1. Extract the histogram of the frame
    2. Run the histogram through our prediction model
  5. Count up what game was predicted the most and we have our result
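
The "every 2s or so" in step 4 follows from the stream's frame rate: taking one frame in 60 means one sample every 2 seconds at 30 fps, and one per second at 60 fps. A minimal sketch of that arithmetic, assuming a constant frame rate. The helper names `seconds_between_samples` and `sampled_indices` are made up for illustration and are not part of the script.

```ruby
# Illustrative helpers (hypothetical names, not from the original script).

# Seconds of video between two sampled frames at a given frame rate.
def seconds_between_samples(fps, step = 60)
  step.to_f / fps
end

# Indices of the frames that a `cnt % step == 0` check would pick.
def sampled_indices(total_frames, step = 60)
  (0...total_frames).step(step).to_a
end

puts seconds_between_samples(30)   # 2.0
puts seconds_between_samples(60)   # 1.0
p sampled_indices(300)             # [0, 60, 120, 180, 240]
```

So a 10 second clip at 30 fps (300 frames) yields five predictions.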

One additional gem is needed to accomplish this: ruby-ffmpeg. It is only available from its GitHub repo (it's not on RubyGems), and you'll need ffmpeg installed.

require 'rest-client'
require 'ruby-ffmpeg'
require 'stringio'
require 'opencv'
require 'json'
require 'liblinear'

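# Load the trained model (assumes liblinear-ruby's older Liblinear::Model.new(file) API)
$model = Liblinear::Model.new("games.model")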
games = {1 => "csgo", 2 => "dota2", 3 => "hearthstone", 4 => "minecraft", 5 => "starcraft"}

def histogram(data)
  iplimg = OpenCV::IplImage.decode_image(data)
  b, g, r = iplimg.split

  dim = 3
  sizes = [8,8,8]
  ranges = [[0, 255],[0, 255],[0, 255]]
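  # 3-channel histogram with 8 bins per channel (8 * 8 * 8 = 512 values)
  hist = OpenCV::CvHistogram.new(dim, sizes, OpenCV::CV_HIST_ARRAY, ranges, true)
  hist.calc_hist([r, g, b])
end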

url = ARGV[0]
channel = url.split("/")[-1]

token = JSON.parse(RestClient.get("{channel}/access_token?as3=t"))

playlist = RestClient.get("{channel}.m3u8?sig=#{token["sig"]}&token=#{token["token"]}")

url = playlist.split("\n")[4]
base = url.split("py-index")[0]
prediction = {}
video = ""

parts = RestClient.get(url)
parts.split("\n").select{|x| x[0] != "#"}.each do |p|
  u = "#{base}#{p}"

  puts "Downloading #{u}"
  resp = RestClient.get(u)
  video = video + resp.body
end

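cnt = 0

# Read the downloaded video from memory
# (FFMPEG::Reader.open on a StringIO and reader.streams are assumed from ruby-ffmpeg's API)
FFMPEG::Reader.open(StringIO.new(video)) do |reader|
  # Use the first video stream in the container
  stream = reader.streams.select { |s| s.type == :video }.first

  while frame = stream.decode do
    # Only predict on one frame out of every 60
    if cnt % 60 == 0
      puts "Predicting frame @ #{frame.timestamp}"

      h = histogram(frame.to_bmp)
      vals = []

      # Collect the 512 histogram bins as the feature vector
      (0..511).each do |i|
        vals << h[i]
      end

      # Tally the predicted label
      res = $model.predict(vals)
      if prediction[res]
        prediction[res] += 1
      else
        prediction[res] = 1
      end
    end

    cnt += 1
  end
end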

sorted = prediction.sort_by {|_key, value| value}.reverse

puts "\nDetected: #{games[sorted.first[0].to_i]}"

puts "\nPrediction results:"

sorted.each do |k, v|
  puts "#{games[k.to_i]}: #{v}"
end
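
The playlist handling in the script is plain string work: an HLS media playlist is a text file in which lines starting with `#` are tags and the remaining non-empty lines are segment URIs. Here is a minimal standalone sketch of that step. The sample playlist, the `segment_urls` helper, and the placeholder base URL are all assumptions for illustration, not real Twitch responses.

```ruby
# Made-up media playlist; real Twitch playlists will differ.
SAMPLE_PLAYLIST = <<~M3U8
  #EXTM3U
  #EXT-X-TARGETDURATION:4
  #EXTINF:4.000,
  index-0000000001.ts
  #EXTINF:4.000,
  index-0000000002.ts
M3U8

# Returns absolute segment URLs: every non-empty line that is not a tag,
# appended to the base URL.
def segment_urls(playlist_text, base)
  playlist_text
    .each_line
    .map(&:strip)
    .reject { |line| line.empty? || line.start_with?("#") }
    .map { |segment| "#{base}#{segment}" }
end

urls = segment_urls(SAMPLE_PLAYLIST, "http://example.invalid/hls/")
puts urls
```

Unlike the `x[0] != "#"` check in the script, this version also skips blank lines, which would otherwise be treated as segment names.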

To try this on any stream, pass a Twitch URL on the command line. Here is the output for a CS:GO stream:

$ ruby stream.rb
Predicting frame @ 536037.03
Predicting frame @ 536127.03
Predicting frame @ 536216.985
Predicting frame @ 536306.985
Predicting frame @ 536397.03
Predicting frame @ 536487.03
Predicting frame @ 536576.265
Predicting frame @ 536667.03
Predicting frame @ 536756.265
Predicting frame @ 536846.985
Predicting frame @ 536936.265

Detected: csgo

Prediction results:
csgo: 8
minecraft: 2
hearthstone: 1

For a Hearthstone stream:

$ ruby stream.rb
Predicting frame @ 114013.44
Predicting frame @ 114103.44
Predicting frame @ 114193.44
Predicting frame @ 114283.44
Predicting frame @ 114373.44
Predicting frame @ 114463.44

Detected: hearthstone

Prediction results:
hearthstone: 6

Of course, just as in Part 1 and Part 2, results can vary, especially if the streamer's feed isn't showing the game or is cluttered with overlays and other content. The advantage of working with a video stream is that we get many more predictions, so a couple of false positives in the mix won't throw off the entire result.
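
To make that last point concrete, here is a small sketch of the tallying step on its own, fed with the same counts as the CS:GO run above (8 csgo, 2 minecraft, 1 hearthstone). The `majority_vote` helper is a hypothetical name used for illustration; the script does the equivalent inline with a hash and `sort_by`.

```ruby
# Tally per-frame predictions and return [winner, counts].
# `majority_vote` is an illustrative helper, not part of the original script.
def majority_vote(labels)
  counts = Hash.new(0)
  labels.each { |label| counts[label] += 1 }
  winner, _votes = counts.max_by { |_label, votes| votes }
  [winner, counts]
end

# Same counts as the CS:GO run: 8 csgo, 2 minecraft, 1 hearthstone.
# The order of the labels here is arbitrary.
labels = ["csgo"] * 8 + ["minecraft"] * 2 + ["hearthstone"]

winner, counts = majority_vote(labels)
puts "Detected: #{winner}"
counts.sort_by { |_label, votes| -votes }.each do |label, votes|
  puts "#{label}: #{votes}"
end
```

Three of the eleven sampled frames were misclassified, yet the vote still lands on csgo. With only a handful of frames, or with a tie between two games, the result would be much less trustworthy.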

The entire codebase, along with the dataset for this experiment, can be found in the GitHub repo.