5

Background info

As explained here, StackOverflow questions can be closed due to several reasons, namely Duplicate of..., Off-topic because..., Needs details or clarity, Needs more focus and Opinion-based.

Queries are executed on the public StackOverflow Bigquery on Google Cloud Platform. This Bigquery contains, amongst other tables, posts_questions and votes, the first containing all questions and the second contains cast votes on these questions.

posts_questions schema:

id title body accepted_answer_id answer_count comment_count ...

vote Schema:

id creation_date post_id vote_type_id

Question

There exist 16 different vote_type_ids and according to this post on Meta, vote_type_id 6 corresponds to a close vote. After three close votes were cast by users, a question appears as closed on StackOverflow. The following query, therefore, returns the id and URL of 10 closed questions.

SELECT q.id, CONCAT('https://stackoverflow.com/questions/', CAST(q.id as STRING)) as url,
FROM `bigquery-public-data.stackoverflow.posts_questions` AS q
JOIN `bigquery-public-data.stackoverflow.votes` AS v
  ON q.id = v.post_id
WHERE v.vote_type_id = 6
GROUP BY q.id
HAVING COUNT(*) >= 3
LIMIT 10

My question is, is it possible to query closed questions based on the close vote reason? Like, query 250 questions that were closed due to being duplicates?

mrivanlima
  • 649
  • 4
  • 9
Manuel Fuchs
  • 320
  • 3
  • 9

1 Answers1

2

Like, [howto] query questions that where closed due to being duplicates?

You should use PostHistory table

PostHistoryTypeId

  • 10 = Post Closed - post voted to be closed

Comment: This field will contain the comment made by the user who edited a post. If PostHistoryTypeId = 10, this field contains the CloseReasonId of the close reason

  • 1 = Exact Duplicate
  • 101 = Duplicate

So, finally - the query is

SELECT q.id, CONCAT('https://stackoverflow.com/questions/', CAST(q.id as STRING)) as url,
FROM `bigquery-public-data.stackoverflow.posts_questions` AS q
JOIN `bigquery-public-data.stackoverflow.post_history` AS h
  ON q.id = h.post_id
WHERE h.post_history_type_id = 10
AND h.comment in ('1', '101')
GROUP BY q.id   

My question is, is it possible to query closed questions based on the close vote reason?

SELECT CASE 
    WHEN comment IN ('1', '101') THEN 'Duplicate'
    WHEN comment = '102' THEN 'Off-topic'
    WHEN comment = '3' THEN 'Subjective and argumentative'
    WHEN comment = '4' THEN 'Not a real question'
    WHEN comment = '7' THEN 'Too localized'
    WHEN comment = '10' THEN 'General reference'
    WHEN comment = '20' THEN 'Noise or pointless (Meta sites only)'
    WHEN comment = '103' THEN '''Unclear what you're asking'''
    WHEN comment = '104' THEN 'Too broad'
    WHEN comment = '105' THEN 'Primarily opinion-based'
    ELSE 'Unknown'
  END close_reason,
  COUNT(DISTINCT q.id) cloased_posts
FROM `bigquery-public-data.stackoverflow.posts_questions` AS q
JOIN `bigquery-public-data.stackoverflow.post_history` AS h
  ON q.id = h.post_id
WHERE h.post_history_type_id = 10
GROUP BY close_reason
ORDER BY cloased_posts DESC    

with output

enter image description here

Mikhail Berlyant
  • 117,385
  • 6
  • 77
  • 139