YouTube was founded in February 2005 with a tiny initial team: 2 developers, 2 architects, 2 network engineers, and 1 DBA. From there it grew incredibly fast into the site we know today.
It wasn’t just user popularity that grew; the architecture and underlying technology went through serious overhauls as well.
YouTube is the world’s 4th most popular website and the no. 1 site for videos worldwide. It serves around 1 billion views and several terabytes of data every day, which averages out to roughly 11,574 views per second.
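That per-second figure is simply the daily total averaged over the 86,400 seconds in a day; a quick back-of-envelope check:

```python
# Rough sanity check: ~1 billion views spread evenly over one day.
views_per_day = 1_000_000_000
seconds_per_day = 24 * 60 * 60                  # 86,400 seconds
print(round(views_per_day / seconds_per_day))   # ~11,574 views per second
```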
How can YouTube handle such a large amount of video traffic without a noticeable performance lag? Some of the answers come from High Scalability, which documents industry best practices.
It’s no surprise that it is all powered by open source technologies. Here is a glimpse at the platform architecture they deploy:
1. Web Components:
For handling changes, they employ agents that watch for changes, pre-calculate results, and push them out to all systems. This machinery has become highly complex and smart, especially after the acquisition by Google.
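To make the idea concrete, here is a minimal sketch of a "watch, pre-calculate, push" loop. Every name in it (fetch_config, precalculate, the dict-based "systems") is hypothetical; YouTube's actual agents are not public.

```python
import time

def fetch_config():
    """Stand-in for reading the current state (e.g. a config file or DB row)."""
    return {"popular_video_ids": [101, 102, 103]}

def precalculate(config):
    """Derive ahead of time whatever downstream systems will need."""
    return {vid: "/cache/videos/%d.flv" % vid for vid in config["popular_video_ids"]}

def watch(systems, interval=5.0):
    """Poll for changes; on a change, pre-calculate and fan out to every system."""
    last = None
    while True:
        config = fetch_config()
        if config != last:
            derived = precalculate(config)
            for system in systems:      # each 'system' here is just a dict
                system.update(derived)
            last = config
        time.sleep(interval)
```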
2. Serving Videos
Popular content is moved to a CDN (content delivery network). CDNs help a lot by replicating content across multiple geographic areas, so there is a higher probability that the content sits closer to the user, with fewer hops, and is therefore served faster. Less popular content (say 1-40 views per day) is served from YouTube servers in various local sites instead. This approach has one drawback, the long-tail effect: when lots of these rarely watched videos are being played, caching does little good. To tackle this, YouTube tunes the RAID controllers and memory on each machine to optimize random disk access and thereby sustain a higher rate of multiple-file access.
Note: Most of the CDNs are now dedicated and internal to YouTube.
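As a rough illustration of the popular/long-tail split described above, here is a hedged sketch of how a request router might choose between a CDN edge and a local colo server. The threshold, hostnames, and URL layout are invented for the example.

```python
# Illustrative only: hot videos go to a CDN edge, cold (long-tail) videos are
# served straight from YouTube's own local machines.
POPULARITY_THRESHOLD = 40   # views/day; below this a video counts as long tail

def pick_video_url(video_id, views_per_day):
    if views_per_day > POPULARITY_THRESHOLD:
        # Popular content is replicated to CDN edges close to the viewer.
        return "http://cdn.example.com/videos/%s.flv" % video_id
    # Long-tail content hits local disks, which is why the RAID controller and
    # memory tuning matters: these reads are mostly random access.
    return "http://colo3.example.com/videos/%s.flv" % video_id
```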
3. Serving Thumbnails
Storing lots of small files in a file system is still not a good idea. A high number of requests per second can wreak havoc on a hard disk, and thumbnail pages are request-heavy: a single page can show up to 50 thumbnails. Apache performed badly here because of poor inode caching. Putting Squid (a caching reverse proxy) in front of Apache is one solution, and it worked for YouTube for a while, but as load increased performance eventually dropped from about 300 requests/second to 20. They then tried lighttpd, but in single-threaded mode it stalled, and multi-process mode caused trouble as well because each process kept its own separate cache.
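The underlying idea is a single shared cache in front of the disk-bound web server, instead of one cache per process, so hot thumbnails are answered from memory and most requests never touch the inode-heavy file system. A minimal sketch of that idea (the path and cache size are made up; this is not YouTube's actual proxy):

```python
from functools import lru_cache

def fetch_from_backend(thumb_id):
    """Stand-in for asking the Apache/lighttpd backend for the thumbnail bytes."""
    with open("/var/thumbs/%s.jpg" % thumb_id, "rb") as f:
        return f.read()

@lru_cache(maxsize=100_000)   # one shared cache, unlike per-process caches
def get_thumbnail(thumb_id):
    return fetch_from_backend(thumb_id)
```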
4. Databases
The strategy employed splits the database into shards, with users assigned to different shards. As YouTube’s adoption of Google’s technologies accelerated, Google’s advanced caching mechanisms on top of the MySQL clusters came into the picture, which is part of what makes YouTube so powerful.
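A hedged sketch of user-based sharding, with invented shard names, just to show the routing idea: each user is mapped deterministically to one shard, so all of that user's data lives together.

```python
NUM_SHARDS = 4
SHARD_DSNS = ["mysql://db%d.example.com/youtube" % i for i in range(NUM_SHARDS)]

def shard_for_user(user_id):
    """The same user always maps to the same shard."""
    return SHARD_DSNS[user_id % NUM_SHARDS]

# Example: shard_for_user(42) -> 'mysql://db2.example.com/youtube'
```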