I’m a backend (.NET) guy trying to do some sneaky web scraping from a site built on Next.js, or React, or both together? (I really don’t understand much about all this JS-zoo diversity, and my head starts to hurt when I try.) *cough* Anyway, I ran into an endpoint that returns data in a format like this:
2:I[50124,["9218","static/chunks/2d45e2a3-5c057d8e0f33767e.js","8969","static/chunks/8969-680493329abfb0c4.js","202","static/chunks/202-de118e3812c53ed3.js"],"Page"]
0:["F8KN2c1tQ8aMh140dU9Q6",[["children",["lang","en","d"],"children","__PAGE__?{"some_json":"here"}",["__PAGE__",{},[["$L1",["$","$L2",null]],null],null],[null,"$L6"]]]]
1:null
5:{"AND HERE'S THE MAIN JSON OBJECT": "THAT EXPECTED TO BE RETURNED"}
Some nodes in the “main JSON object” have a structure like { "name": "$42" }, and if there’s a line above with a key like 42:I ..etc, the browser will understand it and put the pieces together, but my scraper will not.
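To make the "putting together" concrete, here is a rough Python sketch of the manual approach I'd like to avoid. It is based only on what I can see in the sample above, so the assumptions are mine: every row is one line shaped like `<id>:<optional tag letters><JSON>`, and a string that is exactly `$<id>` points at the row with that id. The names `parse_rows` and `resolve` are made up for illustration, and anything else starting with `$` (such as `$L1`) is left alone because I don't know what it means.

```python
import json
import re

# Assumed row shape, inferred from the sample: "<id>:<TAG?><json>".
# The optional uppercase tag covers rows like 2:I[...].
ROW = re.compile(r"^([0-9a-fA-F]+):([A-Z]*)(.*)$")
# Only a bare "$<id>" is treated as a reference; "$L1" etc. are not touched.
REF = re.compile(r"^\$([0-9a-fA-F]+)$")


def parse_rows(text):
    """Map row id -> decoded JSON value (the tag, if any, is dropped)."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue
        row_id, _tag, payload = m.groups()
        try:
            rows[row_id] = json.loads(payload)
        except json.JSONDecodeError:
            # Rows whose payload is not plain JSON are skipped in this sketch.
            continue
    return rows


def resolve(value, rows, seen=()):
    """Recursively replace "$<id>" strings with the row they point at."""
    if isinstance(value, dict):
        return {k: resolve(v, rows, seen) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v, rows, seen) for v in value]
    if isinstance(value, str):
        m = REF.match(value)
        # "seen" guards against a row that refers back to itself.
        if m and m.group(1) in rows and m.group(1) not in seen:
            return resolve(rows[m.group(1)], rows, seen + (m.group(1),))
    return value
```

With this, `resolve(parse_rows(body)["5"], parse_rows(body))` would give the main object with the `$42`-style strings swapped for the rows they name. It will break on any row that spans several lines or is not JSON after the tag, which is exactly why I'd prefer something less hand-rolled.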
The response header reports Content-Type: text/x-component. Also, the request needs an “RSC” header set to “1”; otherwise the response contains the whole HTML page instead of what’s above.
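For anyone who wants to reproduce it, this is roughly how the request can be built, sketched in Python with only the standard library. The helper names `build_rsc_request` and `is_component_payload` are hypothetical, and the only facts baked in are the two I observed: the `RSC: 1` request header and the `text/x-component` content type. The site may well require other headers that I haven't noticed.

```python
from urllib.request import Request


def build_rsc_request(url):
    """Build (but do not send) a GET request carrying the RSC header."""
    req = Request(url)
    # urllib stores the key as "Rsc"; HTTP header names are case-insensitive.
    req.add_header("RSC", "1")
    return req


def is_component_payload(content_type):
    """True if a Content-Type value is text/x-component, ignoring parameters."""
    return content_type.split(";")[0].strip().lower() == "text/x-component"
```

Passing the result to `urllib.request.urlopen` and checking `is_component_payload(resp.headers.get("Content-Type", ""))` tells you whether you got the row format or the full HTML page.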
Where can I read about this? What is it even called? Is there any way to handle it, or to force the API to return regular JSON? I don’t like the idea of having to stitch these strings into a normal JSON object manually…