In this project for CS 498 Data Visualization, I analyzed the 1000 highest-grossing movies of all time and built a narrative visualization showing which major studio parents have generally had the most success at the box office. The major studio parents are NBCUniversal, ViacomCBS, WarnerMedia, Walt Disney Studios, and Sony Pictures. Mini-major studios such as Lionsgate, The Amblin Group, STX Films, and MGM were grouped into a single category, as they are smaller studios looking to compete with the majors. Any film that did not fit into either a major or the mini-major category was placed in Other. The narrative visualization is structured as an interactive slideshow with four scenes: a bar chart of films that have grossed over a billion dollars, a bar chart comparing how the studio groupings have performed against each other, a donut chart showing the best-performing studio and how much each of its child studios contributes to its overall success, and a circle packing chart showing the best-performing directors for that studio.
The data comes from SaiVamshiAtukuri's dataset "Collections of top grossing movies" on Kaggle: https://www.kaggle.com/saivamshi/collections-of-top-grossing-movies/version/2/. After collecting the data, I ran it through a Python script that joins the separate CSV files into a single file, movies.csv. This file contains 1000 records, one per movie, with the following columns: Rank, Movie_ID, Movie_Name, Director, Year, US_Distributor, Lifetime_Gross, Budget, MPAA, Running_Time, and Genre.
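The grouping into major parents, mini-majors, and Other could be expressed as a simple lookup on the US_Distributor column. The following is a minimal sketch, not the project's actual script: the distributor labels in PARENT_BY_DISTRIBUTOR, the "Mini-major" category label, and the function name studio_group are assumptions made for illustration, and the real strings in movies.csv may differ.

```python
# Hypothetical lookup from a US_Distributor label to its studio parent.
# The exact distributor strings used in movies.csv are assumed here.
PARENT_BY_DISTRIBUTOR = {
    "Universal Pictures": "NBCUniversal",
    "Paramount Pictures": "ViacomCBS",
    "Warner Bros.": "WarnerMedia",
    "Walt Disney Studios Motion Pictures": "Walt Disney Studios",
    "Sony Pictures Releasing": "Sony Pictures",
}

# Mini-major studios named in the write-up; the labels are assumed spellings.
MINI_MAJORS = {"Lionsgate", "The Amblin Group", "STX Films", "MGM"}


def studio_group(distributor: str) -> str:
    """Return the studio grouping for one distributor label."""
    name = distributor.strip()
    if name in PARENT_BY_DISTRIBUTOR:
        return PARENT_BY_DISTRIBUTOR[name]
    if name in MINI_MAJORS:
        return "Mini-major"
    # Anything that is neither a major nor a mini-major falls into Other.
    return "Other"
```

In practice the lookup table would also need an entry for every child studio of each parent, since the donut chart scene breaks a parent's total down by child studio.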
The first scene of the interactive slideshow is a bar chart of the films among the top 1000 that have reached at least one billion dollars in gross revenue. This threshold was chosen because modern movie studios hope to produce films that reach over a billion dollars in total Lifetime Gross, as it indicates both domestic and international success. International revenue has risen to account for about double the share of domestic revenue, and the gap continues to widen (https://stephenfollows.com/important-international-box-office-hollywood/). The threshold also reduces the data to a smaller set that is easier to analyze (46 movies rather than 1000). The x-axis shows each movie and the y-axis shows that movie's Lifetime Gross. The legend shows the color used for each movie grouping: the major studio groupings (NBCUniversal, ViacomCBS, WarnerMedia, Walt Disney Studios, and Sony Pictures), followed by films from mini-major studios and Other films. Movies can be highlighted through mouse events, and a tooltip shows data pertaining to the film. The interactive chart shows that more of the movies toward the higher end were produced by studios owned by Walt Disney Studios.
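The billion-dollar cut used for this scene amounts to a filter and a sort on Lifetime_Gross. The sketch below is an illustration only: it assumes Lifetime_Gross is stored as a plain integer string with no currency symbol or separators, the helper name billion_dollar_films is hypothetical, and the sample rows are made-up placeholders rather than values from the dataset.

```python
import csv
import io

BILLION = 1_000_000_000


def billion_dollar_films(rows):
    """Return rows with Lifetime_Gross of at least one billion, highest first."""
    hits = [r for r in rows if int(r["Lifetime_Gross"]) >= BILLION]
    return sorted(hits, key=lambda r: int(r["Lifetime_Gross"]), reverse=True)


# Made-up placeholder records, not values from movies.csv.
SAMPLE_CSV = """Movie_Name,Lifetime_Gross
Film A,1500000000
Film B,999999999
Film C,1000000000
"""

sample_rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
top = billion_dollar_films(sample_rows)
print([r["Movie_Name"] for r in top])  # ['Film A', 'Film C']
```

Reading the real file would replace the in-memory sample with csv.DictReader over an open handle to movies.csv; the filter itself stays the same.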