Get git blame for a full GitHub repository via the API

I’d like to get the output of git blame <file> for all files in a repository recursively.

I want to do this without cloning the repository first, by using GitHub's GraphQL API (v4).

Is this possible?

I’ve managed to get a list of files via this query:

query {
  repository(owner: "some owner", name: "some repository") {
    object(expression: "HEAD:") {
      ... on Tree {
        entries {
          name
          type
          mode
          object {
            ... on Blob {
              byteSize
              text
              isBinary
            }
          }
        }
      }
    }
  }
}

as well as a single file’s git blame via this query:

query {
  repositoryOwner(login: "some owner") {
    repository(name: "some repo") {
      object(expression: "some branch") {
        ... on Commit {
          blame(path: "some/file/path.js") {
            ranges {
              startingLine
              endingLine
              age
              commit {
                oid
                author {
                  name
                }
              }
            }
          }
        }
      }
    }
  }
}

Is it possible to combine these queries into one?

If not, it probably makes sense to clone the repo first and run git blame recursively locally, right?

1 Answer

I managed to do it like this in my own project (this fetches the blame for a single file):

https://github.com/ricardobranco777/bugme/blob/master/gitblame.py

import requests

access_token = "ghp_xxx"  # placeholder: a GitHub personal access token

query = """
query($owner: String!, $repositoryName: String!, $branchName: String!, $filePath: String!) {
  repositoryOwner(login: $owner) {
    repository(name: $repositoryName) {
      object(expression: $branchName) {
        ... on Commit {
          blame(path: $filePath) {
            ranges {
              startingLine
              endingLine
              commit {
                # id
                # commitUrl
                # commitResourcePath
                committedDate
                # message
                # messageBody
                # url
                # authoredDate
                oid
                author {
                  name
                  email
                }
              }
            }
          }
        }
      }
    }
  }
}
"""

# Define query variables
variables = {
    "owner": "os-autoinst",  # Replace with your desired repository owner
    "repositoryName": "os-autoinst-distri-opensuse",  # Replace with your desired repository name
    "branchName": "master",  # Replace with your desired branch name
    "filePath": "README.md",  # Replace with your desired file path
}

# Set up the headers with your access token
headers = {
    "Authorization": f"Bearer {access_token}",
}

# Define the API endpoint
url = "https://api.github.com/graphql"

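# Make the GraphQL request (the timeout keeps the call from hanging indefinitely)
response = requests.post(
    url,
    headers=headers,
    json={"query": query, "variables": variables},
    timeout=30,
)
# Fail early on HTTP-level errors such as a rejected token
response.raise_for_status()

# Parse the JSON response; GraphQL-level problems are reported under an "errors" key
data = response.json()
from pprint import pprint
pprint(data)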
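
To cover every file rather than one, a possible extension is to list the paths first and then request several blames per query using GraphQL aliases. The following is only a sketch under assumptions, not what the linked project does: it assumes the file list comes from the REST endpoint GET /repos/{owner}/{repo}/git/trees/{ref}?recursive=1 (whose response has a "tree" array of entries with "path" and "type"), and the helper names blob_paths, chunked and build_blame_query are hypothetical. A suitable batch size is also a guess; large repositories may run into API limits.

```python
def blob_paths(tree_response):
    """Return file paths from a REST 'git/trees?recursive=1' response body.

    Entries of type "tree" are directories and are skipped.
    """
    return [
        entry["path"]
        for entry in tree_response.get("tree", [])
        if entry.get("type") == "blob"
    ]


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_blame_query(paths):
    """Build one query that blames several paths, one aliased field per path.

    Returns (query, variables). The caller still has to add the
    "owner", "name" and "ref" variables before sending the request.
    """
    # One $pN variable per path avoids escaping paths inside the query text
    var_defs = "".join(f", $p{i}: String!" for i in range(len(paths)))
    # Aliases (f0, f1, ...) let the same field appear more than once
    fields = "\n".join(
        f"        f{i}: blame(path: $p{i}) {{\n"
        "          ranges { startingLine endingLine commit { oid author { name email } } }\n"
        "        }"
        for i in range(len(paths))
    )
    query = (
        f"query($owner: String!, $name: String!, $ref: String!{var_defs}) {{\n"
        "  repository(owner: $owner, name: $name) {\n"
        "    object(expression: $ref) {\n"
        "      ... on Commit {\n"
        f"{fields}\n"
        "      }\n"
        "    }\n"
        "  }\n"
        "}"
    )
    variables = {f"p{i}": path for i, path in enumerate(paths)}
    return query, variables
```

For each batch from chunked(blob_paths(tree), n), merge {"owner": ..., "name": ..., "ref": ...} into the returned variables and post the query to https://api.github.com/graphql exactly as in the single-file request. The result for each path then appears under its alias (f0, f1, ...) in the same order as the batch.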


