Introduction
Lights, camera, sentiment analysis! In this captivating project, we’re diving into the world of movies and emotions, exploring how machine learning techniques can help us decipher the sentiment hidden within movie reviews. So grab your popcorn, and let’s embark on a journey through data, analysis, and a touch of AI magic.
Loading the Movie Review Dataset
Our cinematic journey begins by obtaining the movie review dataset. Using the Kaggle API, we effortlessly download the dataset and prepare for our analysis.
! pip install -q kaggle
from google.colab import files
# Upload kaggle.json API key
files.upload()
! mkdir -p ~/.kaggle
! cp kaggle.json ~/.kaggle/
! chmod 600 ~/.kaggle/kaggle.json
# List available datasets
! kaggle datasets list
# Download the IMDb movie reviews dataset
! kaggle datasets download -d lakshmi25npathi/imdb-dataset-of-50k-movie-reviews
Exploratory Data Analysis (EDA)
Before the opening credits roll, we dive into exploratory data analysis, gaining insights into the dataset’s structure and sentiments. Let’s take a closer look at the components of our dataset and set the stage for our analysis.
# Import necessary libraries
import pandas as pd
# Load the dataset (pandas can read a zip archive directly when it holds a single CSV)
reviews_df = pd.read_csv('/content/imdb-dataset-of-50k-movie-reviews.zip')
# Display the first few rows of the dataset
reviews_df.head()
# Count the reviews in each sentiment class
from collections import Counter
Counter(reviews_df["sentiment"])
 | review | sentiment |
---|---|---|
0 | One of the other reviewers has mentioned that … | positive |
1 | A wonderful little production The… | positive |
2 | I thought this was a wonderful way to spend ti… | positive |
3 | Basically there’s a family where a little boy … | negative |
4 | Petter Mattei’s “Love in the Time of Money” is… | positive |
Counter({'positive': 25000, 'negative': 25000})
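The sample review printed later in this post still contains raw `<br />` tags, so a light cleaning pass before vectorizing may be worthwhile. The following is a minimal sketch, not part of the original pipeline; `clean_review` and `BR_TAG` are hypothetical names introduced here for illustration.

```python
import re

# Matches <br>, <br/> and <br /> in any letter case.
BR_TAG = re.compile(r"<br\s*/?>", flags=re.IGNORECASE)


def clean_review(text: str) -> str:
    """Replace HTML line breaks with spaces and collapse repeated whitespace."""
    without_breaks = BR_TAG.sub(" ", text)
    return re.sub(r"\s+", " ", without_breaks).strip()


sample = "Her name is Joy.<br /><br />One thing that I determined"
print(clean_review(sample))
```

If this helps, it could be applied to the whole column with something like `reviews_df["review"].map(clean_review)` before the train/test split.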
Training and Testing our Sentiment Classifier
The plot thickens as we enter the realm of model training. We split the reviews into stratified training and test sets (80/20), turn the text into TF-IDF features, and train a RandomForestClassifier to predict the sentiment of each review.
# Import necessary libraries
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from joblib import dump
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
reviews_df["review"].values, reviews_df["sentiment"].values,
stratify=reviews_df["sentiment"].values, test_size=0.2, random_state=42
)
# Initialize TfidfVectorizer
tfidf_vectorizer = TfidfVectorizer(max_features=100_000, stop_words="english")
# Fit and transform on training data
X_train_vect = tfidf_vectorizer.fit_transform(X_train)
# Transform on testing data
X_test_vect = tfidf_vectorizer.transform(X_test)
# Initialize RandomForestClassifier
clf = RandomForestClassifier()
# Fit and predict using the classifier
clf.fit(X_train_vect, y_train)
y_pred = clf.predict(X_test_vect)
# Generate and print the classification report
class_report = classification_report(y_test, y_pred)
print(class_report)
 | precision | recall | f1-score | support |
---|---|---|---|---|
negative | 0.84 | 0.86 | 0.85 | 5000 |
positive | 0.85 | 0.84 | 0.85 | 5000 |
accuracy | | | 0.85 | 10000 |
macro avg | 0.85 | 0.85 | 0.85 | 10000 |
weighted avg | 0.85 | 0.85 | 0.85 | 10000 |
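To make the TF-IDF step less of a black box, here is a rough pure-Python sketch of the weighting that `TfidfVectorizer` applies with its default settings: raw term counts, a smoothed inverse document frequency, and L2 normalisation. The `tfidf` function is a hypothetical helper for illustration only. It tokenises by whitespace, whereas scikit-learn uses its own token pattern and, in the code above, also removes English stop words, so the numbers will not match the real vectorizer exactly.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, L2-normalised."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)

    # Document frequency: how many documents contain each term.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))

    # Smoothed IDF, as in scikit-learn's default (smooth_idf=True).
    idf = {
        term: math.log((1 + n_docs) / (1 + count)) + 1
        for term, count in doc_freq.items()
    }

    vectors = []
    for tokens in tokenised:
        term_counts = Counter(tokens)
        weights = {term: count * idf[term] for term, count in term_counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors


vectors = tfidf(["great movie", "terrible movie"])
print(vectors[0])
```

In this toy corpus "movie" appears in both documents, so it receives a lower weight than "great" or "terrible", which each appear in only one. That is the intuition behind using TF-IDF for sentiment: words shared by every review carry little signal.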
Peering into the Mind of the Model: Interpretation of Predictions
Ever wonder how our model arrives at its predictions? We employ the LIME library to shed light on the inner workings of the model, explaining its decisions on individual reviews.
# Import necessary libraries
from lime import lime_text
import random
# Initialize LimeTextExplainer
explainer = lime_text.LimeTextExplainer(class_names=clf.classes_)
# Define prediction function
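def pred_FN(text):
    # LIME passes a list of perturbed strings; return class probabilities for each
    text_vectorised = tfidf_vectorizer.transform(text)
    return clf.predict_proba(text_vectorised)
# Select a random review for explanation
# (randrange stays within 0..len(X_test)-1; randint's upper bound is inclusive)
idx = random.randrange(len(X_test))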
review = X_test[idx]
# Explain the model's prediction
explanation = explainer.explain_instance(review, classifier_fn=pred_FN, num_features=50)
# Display original text, prediction, and explanation
print("Actual text: ", review)
print("Prediction: ", clf.predict(X_test_vect[idx].reshape(1, -1))[0])
print("Actual: ", y_test[idx])
Actual text: Ken Loach showed the world the down-and-out flip side of Swinging London with "Poor Cow", about London woman Joy (Carol White) hooking up with a thief and having a son with him, only to see the man end up in the slammer. While his friend (Terence Stamp) manages to help her out some, he proves to be little better in what a loser he is. It soon becomes clear to Joy that she's going to have to make a serious decision about where she's going in her life.<br /><br />One thing that I determined - I don't know whether or not this is accurate - was a use of irony in the movie. Her name is Joy, but she experiences no joy in her life. Even if that wasn't intended, it's still a movie that I recommend to everyone. Featuring songs by Donovan (one of which - "Colors" - appeared in another Terence Stamp movie: "The Limey" (which, incidentally, came out in 1999, when I was as old as my parents were when "Poor Cow" came out)).
Prediction: negative
Actual: positive
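For intuition about why a review like this one gets misclassified, it can help to see which words move the predicted probability. The sketch below is a deliberately simplified leave-one-out approach and is not what LIME does internally: LIME samples many random perturbations and fits a local linear model, while this version removes one word at a time. `word_importance` and `toy_predict_proba` are hypothetical names. The toy function only stands in for `pred_FN` so the example runs on its own, and it assumes the same column order, `[P(negative), P(positive)]`.

```python
def word_importance(text, predict_proba, class_index):
    """Score each word by how much removing it lowers the class probability."""
    words = text.split()
    baseline = predict_proba([text])[0][class_index]
    scores = []
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        prob = predict_proba([reduced])[0][class_index]
        # Positive score: the word pushed the prediction towards this class.
        scores.append((word, baseline - prob))
    return sorted(scores, key=lambda pair: abs(pair[1]), reverse=True)


def toy_predict_proba(texts):
    """Hypothetical stand-in for pred_FN: [P(negative), P(positive)] per text."""
    rows = []
    for text in texts:
        tokens = text.lower().split()
        score = tokens.count("wonderful") - tokens.count("boring")
        p_positive = min(max(0.5 + 0.25 * score, 0.0), 1.0)
        rows.append([1.0 - p_positive, p_positive])
    return rows


ranked = word_importance("a wonderful but boring film", toy_predict_proba, class_index=1)
for word, score in ranked:
    print(f"{word:>10}  {score:+.2f}")
```

With the real model, `pred_FN` could be passed in place of `toy_predict_proba`. Note that this makes one prediction call per word, which can be slow on long reviews.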
Saving the Model for Future Movie Review Adventures
Our model is more than just a one-hit wonder. We save the trained model and vectorizer for future cinematic analysis, ensuring our AI companion is ready for any review that comes its way.
# Save the trained model and vectorizer
dump(tfidf_vectorizer, "vectorizer.joblib")
dump(clf, "classifier.joblib")
Conclusion
And that’s a wrap! Our sentiment analysis project has taken us from loading data to crafting an interactive user interface. By leveraging machine learning techniques, we’ve unveiled the sentiments behind movie reviews, offering a glimpse into the emotional world of cinema.
FAQs
Q1: How accurate is the sentiment analysis model? A: On the held-out test set of 10,000 reviews, the model reaches about 85% accuracy, with precision and recall between 0.84 and 0.86 for both classes. Results will vary with the data and features used, and we continue to fine-tune the model to improve its performance.
Q2: Can I use this model for other types of text analysis? A: While this model is tailored for movie review sentiment analysis, it can be adapted for other text classification tasks with appropriate adjustments.
Q3: How frequently will the model be updated? A: We are committed to keeping the model up-to-date and enhancing its capabilities as new data and techniques become available.
Q4: Can I contribute to this project? A: We appreciate your enthusiasm! Please reach out to us at bhirimehdi28@gmail.com to discuss potential collaborations.
Q5: What’s the significance of the user interface? A: The user interface adds an interactive element to the project, allowing users to experience sentiment analysis firsthand and gain insights into movie reviews.
And there you have it! Our journey through movie review sentiment analysis has reached its climax. We hope you’ve enjoyed this rollercoaster ride through data, analysis, and AI-powered insights. Until next time, happy movie watching!