Here are five jokes for you:
-
Why did the scarecrow win an award?
Because he was outstanding in his field!
-
Why don’t skeletons fight each other?
They don’t have the guts!
-
Why did the math book look sad?
It had too many problems.
-
Why don’t eggs tell jokes?
Because they might crack up!
-
What did one ocean say to the other ocean?
Nothing, they just waved.
Now, let's check how many of these jokes are unique. I'll analyze them for similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
# List of jokes
jokes = [
    "Why did the scarecrow win an award? Because he was outstanding in his field!",
    "Why don’t skeletons fight each other? They don’t have the guts!",
    "Why did the math book look sad? It had too many problems.",
    "Why don’t eggs tell jokes? Because they might crack up!",
    "What did one ocean say to the other ocean? Nothing, they just waved.",
]
# Vectorize the jokes (fit_transform returns the TF-IDF matrix, not the vectorizer)
tfidf_matrix = TfidfVectorizer().fit_transform(jokes)
similarity_matrix = cosine_similarity(tfidf_matrix)
# Count unique jokes (jokes with low similarity to others)
unique_jokes = set()
threshold = 0.5  # Define similarity threshold
for i in range(len(jokes)):
    is_unique = True
    for j in range(len(jokes)):
        if i != j and similarity_matrix[i, j] > threshold:
            is_unique = False
            break
    if is_unique:
        unique_jokes.add(jokes[i])
# Count unique jokes
num_unique_jokes = len(unique_jokes)
print(num_unique_jokes)
All 5 jokes are unique! None of them are too similar to each other. 🎉
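If you're curious what the similarity check above is doing under the hood, here is a rough, dependency-free sketch. It assumes scikit-learn's default settings as I understand them (lowercasing, tokens of two or more word characters, smoothed IDF, L2 normalisation); the helper names `tfidf_vectors`, `cosine`, and `count_unique` are my own and are not part of any library. Treat it as an illustration of the idea, not a drop-in replacement.

```python
import math
import re
from collections import Counter

# Assumed to mirror TfidfVectorizer's default token pattern: 2+ word characters.
TOKEN = re.compile(r"\b\w\w+\b")


def tfidf_vectors(docs):
    """Return one L2-normalised TF-IDF vector (as a dict) per document."""
    counts_per_doc = [Counter(TOKEN.findall(d.lower())) for d in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for counts in counts_per_doc:
        doc_freq.update(counts.keys())

    vectors = []
    for counts in counts_per_doc:
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
        vec = {
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, tf in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors


def cosine(a, b):
    """Cosine similarity of two already-normalised sparse vectors."""
    return sum(w * b.get(term, 0.0) for term, w in a.items())


def count_unique(docs, threshold=0.5):
    """Count documents whose similarity to every other document is <= threshold."""
    vecs = tfidf_vectors(docs)
    return sum(
        all(cosine(vecs[i], vecs[j]) <= threshold for j in range(len(vecs)) if j != i)
        for i in range(len(vecs))
    )


# Toy example: two identical texts and one unrelated text.
docs = ["apple banana cherry", "apple banana cherry", "dog elephant fox"]
print(count_unique(docs))  # → 1
```

One thing the toy example makes visible: with this rule, a pair of duplicates counts as zero unique items, not one, because each copy is too similar to the other. If you would prefer duplicates to count once, you could instead keep the first joke of each similar group.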