Scraping GraphQL Cloudflare protected site


I am working with the following website: https://sportsbet.io. It makes XHR (POST) requests to https://sportsbet.io/graphql, and the data is retrieved properly when I navigate the page in a browser.

However, I want to replicate this behaviour in either Postman or Python (httpx), but both return a 403 Forbidden. I wrote the following code:

import httpx

cookies = {
    'cf_clearance': '7w3TljqDiZYb_jzXp0E5Hdrb5xdMD8zpObv2TSZyjIQ-1692539974-0-1-248f99eb.e5bf9145.738ec60b-0.2.1692539974',
    'GW_CLIENT_ID': 'f77c94f4f182008acb37f79e7e47fc27108c3cec483b0f978e239eb78d9b817f',
    'refAff': 'affid=2419&cxid=2419_787089&source=834dde61122',
    'experiments': '%7B%22ydugo4GHROOp8qOy9fuAXg%22%3A3%7D',
    'tryMetamaskHide': 'true',
    'MKTSRC': '{"t":1692542747808,"d":{"src":"3.214.196.62","mdm":"referral","cmp":"","kwd":"","cnt":"","glc":"","msc":""}}',
    '__cf_bm': 'AIV0CL8R8sVDc8b9FPyvvAFgTv5Z3WfIEajp8uWmX80-1692543551-0-AY8l9ARjw2Ke5mUcwvHVZPlNFgw+tzCYqphk2suvRirGlTjB04cNNNO4VY7fkHhaNvumalzEH2G41p3pLYo+B+c=',
    'userPreferenceId': 'U3BvcnRzYmV0UHJlZmVyZW5jZXNVc2VyUHJlZmVyZW5jZTo2NGUyMWM0NGFjOTI4YTVhODcwYjg5M2Q=',
}

headers = {
    'authority': 'sportsbet.io',
    'accept': '*/*',
    'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
    'authorization': '',
    'content-type': 'application/json',
    # 'cookie': 'cf_clearance=7w3TljqDiZYb_jzXp0E5Hdrb5xdMD8zpObv2TSZyjIQ-1692539974-0-1-248f99eb.e5bf9145.738ec60b-0.2.1692539974; GW_CLIENT_ID=f77c94f4f182008acb37f79e7e47fc27108c3cec483b0f978e239eb78d9b817f; refAff=affid=2419&cxid=2419_787089&source=834dde61122; experiments=%7B%22ydugo4GHROOp8qOy9fuAXg%22%3A3%7D; tryMetamaskHide=true; MKTSRC={"t":1692542747808,"d":{"src":"3.214.196.62","mdm":"referral","cmp":"","kwd":"","cnt":"","glc":"","msc":""}}; __cf_bm=AIV0CL8R8sVDc8b9FPyvvAFgTv5Z3WfIEajp8uWmX80-1692543551-0-AY8l9ARjw2Ke5mUcwvHVZPlNFgw+tzCYqphk2suvRirGlTjB04cNNNO4VY7fkHhaNvumalzEH2G41p3pLYo+B+c=; userPreferenceId=U3BvcnRzYmV0UHJlZmVyZW5jZXNVc2VyUHJlZmVyZW5jZTo2NGUyMWM0NGFjOTI4YTVhODcwYjg5M2Q=',
    'origin': 'https://sportsbet.io',
    'referer': 'https://sportsbet.io/sports',
    'sec-ch-ua': '"Chromium";v="116", "Not)A;Brand";v="24", "Google Chrome";v="116"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"macOS"',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36',
}

json_data = {
    'operationName': 'AutomaticFeaturedEventsQuery',
    'variables': {
        'language': 'en',
        'site': 'sportsbet',
        'cricketIncluded': True,
    },
    'query': 'query AutomaticFeaturedEventsQuery($language: String!, $cricketIncluded: Boolean!) {\n  sportsbetNewGraphql {\n    id\n    region {\n      id\n      automaticFeaturedEvents {\n        ...DesktopEuropeanEventFragment\n        __typename\n      }\n      __typename\n    }\n    __typename\n  }\n}\n\nfragment DesktopEuropeanEventFragment on SportsbetNewGraphqlEvent {\n  id\n  __typename\n  asian {\n    id\n    __typename\n    ftMatchWinner {\n      ...EventListMarketQueryFragment\n      __typename\n    }\n    ftTotal {\n      ...EventListMarketQueryFragment\n      __typename\n    }\n    ftHandicap {\n      ...EventListMarketQueryFragment\n      __typename\n    }\n  }\n  ...EventListInformationQueryFragment\n}\n\nfragment EventListMarketQueryFragment on SportsbetNewGraphqlMarket {\n  id\n  __typename\n  enName: name(language: "en")\n  name(language: "en")\n  status\n  specifiers\n  selections {\n    id\n    __typename\n    enName: name(language: "en")\n    name(language: $language)\n    active\n    odds\n    providerProductId\n    competitorType\n  }\n  market_type {\n    id\n    __typename\n    name\n    description\n    translation_key\n    type\n    settings {\n      id\n      betBoostMultiplier\n      __typename\n    }\n  }\n}\n\nfragment EventListInformationQueryFragment on SportsbetNewGraphqlEvent {\n  id\n  __typename\n  type\n  status\n  start_time\n  market_count\n  live_odds\n  slug\n  name(language: $language)\n  enName: name(language: "en")\n  maxBetAvailable\n  videoStream {\n    id\n    __typename\n    streamAvailable\n  }\n  sport {\n    id\n    __typename\n    slug\n    name(language: $language)\n    betBoostMultiplier\n    iconCode\n  }\n  league {\n    id\n    __typename\n    slug\n    name(language: $language)\n    betBoostMultiplier\n  }\n  tournament {\n    id\n    __typename\n    slug\n    name(language: $language)\n    betBoostMultiplier\n  }\n  competitors {\n    id\n    __typename\n    name(language: $language)\n    type\n    betradarId\n  }\n  information {\n    id\n    __typename\n    match_time\n    provider_prefix\n    period_scores {\n      id\n      __typename\n      home_score\n      away_score\n    }\n    match_status_translations(language: $language)\n    home_score\n    away_score\n    home_gamescore\n    away_gamescore\n    provider_product_id\n  }\n  premiumCricketScoringData @include(if: $cricketIncluded) {\n    ...CricketStatsFragment\n    __typename\n  }\n  isSportcastFixtureActive\n  sportcastFixtureId\n}\n\nfragment CricketStatsFragment on SportsbetNewGraphqlPremiumCricketScore {\n  id\n  matchTitle\n  matchCommentary\n  battingTeam {\n    id\n    teamWickets\n    teamRuns\n    teamOvers\n    teamName\n    sixes\n    fours\n    extras\n    competitorId\n    __typename\n  }\n  previousInnings {\n    id\n    wickets\n    teamName\n    summary\n    runs\n    oversAvailable\n    overs\n    inningsNumber\n    conclusion\n    competitorId\n    __typename\n  }\n  overs {\n    id\n    runs\n    overNumber\n    isCurrentOver\n    balls\n    __typename\n  }\n  batsmen {\n    sixes\n    runs\n    onStrike\n    fours\n    batsmanName\n    balls\n    active\n    __typename\n  }\n  __typename\n}\n',
}

response = httpx.post('https://sportsbet.io/graphql', cookies=cookies, headers=headers, json=json_data)

The body of the response is HTML rather than JSON and contains, for example, <title>Just a moment...</title>. Furthermore, the __cf_bm value is rotated by the website every 30 minutes.
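For reference, this is roughly how the challenge page can be told apart from a real GraphQL response. It is only a heuristic sketch: the function name looks_like_cloudflare_challenge is made up for illustration, the title check is based on the HTML I see in my own response, and the cf-mitigated header check assumes Cloudflare sets that header on challenge responses as its documentation describes. The 503 status is also an assumption, since I have only observed 403.

```python
def looks_like_cloudflare_challenge(status_code, headers, body):
    """Heuristic check: does this response look like a Cloudflare challenge page?

    `headers` can be any mapping with .items(), e.g. a dict or httpx.Headers.
    """
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {str(k).lower(): str(v).lower() for k, v in headers.items()}

    # Assumption: Cloudflare marks challenge responses with `cf-mitigated: challenge`.
    if lowered.get("cf-mitigated") == "challenge":
        return True

    # Fallback: the interstitial title observed in the 403 response body.
    return status_code in (403, 503) and "<title>Just a moment...</title>" in body
```

With the request above it would be called as looks_like_cloudflare_challenge(response.status_code, response.headers, response.text), and response.json() would only be attempted when it returns False.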

Can anyone explain why I receive a 403 Forbidden when I replicate the exact same request?


  • That is the point of CloudFlare.

    – luk2302

    31 mins ago

  • @luk2302 Google reCAPTCHA and other services have the same purpose and are bypassable, hence the question.

    – Sander Bakker

    28 mins ago

