Author: Letícia Zorzi Rama
The goal of this project is to analyze a film database and predict IMDb ratings, helping PProductions decide which type of film to develop next.
This project is organized into two main parts, implemented in separate Jupyter notebooks:
- Project Setup, pt. 1 & Exploratory Data Analysis (EDA)
  - Notebook: `PProductions_EDA.ipynb`
- Project Setup, pt. 2 & Prediction
  - Notebook: `PProductions_Prediction.ipynb`
A separate notebook, `Report.ipynb`, presents the insights and answers to the questions raised in the project.
The project uses an enriched version of the IMDb dataset:
- IMDb Dataset
  - Initial dataset: `desafio_indicium_imdb.csv`, containing general information about films.
- Data Augmentation
  - The IMDb dataset is enriched with budget and revenue features obtained from The Movie Database (TMDb).
  - The resulting dataset is named `imdb_tmdb.csv` and serves as the main dataset for both exploratory analysis and prediction.
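
The enrichment step described above can be pictured as a left join of TMDb budget and revenue onto the IMDb rows. The snippet below is only a minimal sketch using the standard library, not the notebooks' actual code: the column names (`Series_Title`, `Released_Year`, `title`, `year`), the join key, and the sample values are all assumptions made for illustration.

```python
import csv
import io

# Made-up sample data; column names and values are illustrative assumptions,
# not taken from the real desafio_indicium_imdb.csv or TMDb.
IMDB_CSV = """Series_Title,Released_Year,IMDB_Rating
Film A,1994,8.1
Film B,1972,7.4
Film C,2008,6.9
"""

TMDB_CSV = """title,year,budget,revenue
Film A,1994,100,250
Film B,1972,60,90
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def enrich(imdb_rows, tmdb_rows):
    """Left-join TMDb budget and revenue onto IMDb rows by (title, year)."""
    lookup = {(r["title"], r["year"]): r for r in tmdb_rows}
    enriched = []
    for row in imdb_rows:
        match = lookup.get((row["Series_Title"], row["Released_Year"]), {})
        enriched.append({
            **row,
            # Films without a TMDb match keep empty budget/revenue fields.
            "budget": match.get("budget", ""),
            "revenue": match.get("revenue", ""),
        })
    return enriched


rows = enrich(read_rows(IMDB_CSV), read_rows(TMDB_CSV))
for row in rows:
    print(row)
```

A left join keeps every IMDb film even when TMDb has no matching entry, so missing budget or revenue values remain visible and can be handled explicitly during the exploratory analysis.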
├── PProductions_EDA.ipynb # Exploratory data analysis
├── PProductions_Prediction.ipynb # IMDb rating prediction
├── Report.ipynb # Insights and answers to the questions raised in the project
├── desafio_indicium_imdb.csv # Initial IMDb dataset
├── imdb_tmdb.csv # Enriched dataset used in the project
├── model.pkl # Trained predictive model
├── requirements.txt # Dependency versions
└── README.md # Project documentation
Thank you for taking the time to explore this project. I hope you enjoy reviewing the analysis and insights as much as I enjoyed developing them!
💬 I’d be glad to connect and talk about projects like this — feel free to reach out!