My goal is to pull X posts with their respective comments from Reddit and join them so that each post contains an array of its comments.
Here’s what I’ve done.
Process:
- I’m pulling the posts from Reddit (with post_name being the main identifier)
- I’m pulling the comments for these posts from Reddit (with parent_id and link_id, which in this case provide the connection to the post).
- I’m running both through a merge step, with "combine by matching fields" as the operation. Input 1 is the posts (identified by post_name). Input 2 is the comments (identified by parent_id).
- This generates merged output, but the result is not exactly how I want it.
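To make the steps above concrete, here is a minimal Python sketch of what a "combine by matching fields" merge appears to do. It is only an illustration under assumptions, not the merge step's actual implementation: the function name merge_by_matching_fields is hypothetical, and the comment's parent reference is assumed to sit in a direct_parent field, as in the output in this post.

```python
# Sample data trimmed from this post; field names follow the merged output.
posts = [{"post_name": "t3_1hh8d2j", "post_title": "Test Post"}]
comments = [
    {"comment_id": "m2p4kmm", "direct_parent": "t3_1hh8d2j"},
    {"comment_id": "m2uv29u", "direct_parent": "t3_1hh8d2j"},
]


def merge_by_matching_fields(posts, comments,
                             post_key="post_name",
                             comment_key="direct_parent"):
    """Hypothetical helper: emit one flat record per (post, comment) match."""
    merged = []
    for post in posts:
        for comment in comments:
            if comment[comment_key] == post[post_key]:
                # The post's fields are copied into every matching row,
                # which is why the post repeats once per comment.
                merged.append({**post, **comment})
    return merged


flat_rows = merge_by_matching_fields(posts, comments)
```

A merge like this behaves like an inner join: it links records correctly but produces one flat row per match, so it cannot produce a nested comments array on its own.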
Result:
- While the posts and comments are linked up correctly, the JSON output repeats the post for each linked comment. In other words, like this:
[
  {
    "post_id": "t3_1hh8d2j",
    "post_name": "t3_1hh8d2j",
    "post_title": "Test Post",
    "post_content": "Eliminating unnecessary manual tasks has always been the goal of automation. While Robotic Process Automation (RPA) has been a dominant method for decades, it falls short when dealing with complex, unstructured processes. Artificial Intelligence, especially Generative AI, offers a more versatile approach by enabling automation to handle tasks that require judgment, reasoning, and adaptability.",
    "post_likes": 1,
    "post_comments": 4,
    "post_subreddit": "Cadence_Dev",
    "comment_id": "m2p4kmm",
    "comment_name": "t1_m2p4kmm",
    "comment_content": "This is a test 1st level comment (number 1)",
    "direct_parent": "t3_1hh8d2j"
  },
  {
    "post_id": "t3_1hh8d2j",
    "post_name": "t3_1hh8d2j",
    "post_title": "Test Post",
    "post_content": "Eliminating unnecessary manual tasks has always been the goal of automation. While Robotic Process Automation (RPA) has been a dominant method for decades, it falls short when dealing with complex, unstructured processes. Artificial Intelligence, especially Generative AI, offers a more versatile approach by enabling automation to handle tasks that require judgment, reasoning, and adaptability.",
    "post_likes": 1,
    "post_comments": 4,
    "post_subreddit": "Cadence_Dev",
    "comment_id": "m2uv29u",
    "comment_name": "t1_m2uv29u",
    "comment_content": "This is a test 1st level comment (number 2)",
    "direct_parent": "t3_1hh8d2j"
  },
  {
    "post_id": "t3_1hh8d2j",
    "post_name": "t3_1hh8d2j",
    "post_title": "Test Post",
    "post_content": "Eliminating unnecessary manual tasks has always been the goal of automation. While Robotic Process Automation (RPA) has been a dominant method for decades, it falls short when dealing with complex, unstructured processes. Artificial Intelligence, especially Generative AI, offers a more versatile approach by enabling automation to handle tasks that require judgment, reasoning, and adaptability.",
    "post_likes": 1,
    "post_comments": 4,
    "post_subreddit": "Cadence_Dev",
    "comment_id": "m2uz01t",
    "comment_name": "t1_m2uz01t",
    "comment_content": "This is a test 1st level (number 3)",
    "direct_parent": "t3_1hh8d2j"
  }
]
Clearly, what I want to accomplish is a structure like:
Post 1
Comments:
- comment 1
- comment 2
Post 2
Comments:
- comment 3
- comment 4
- etc
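One way to get from the flat merged rows to this nested shape is to group the rows by post after the merge. The following is a minimal Python sketch under assumptions, not a definitive solution: nest_comments is a hypothetical helper, it assumes every post-level field starts with post_ (as in the output above), and the second post in the sample data (t3_example) is invented purely for illustration.

```python
import json

# Flat rows shaped like the merged output above (trimmed to a few fields).
# The "t3_example" post is made up so the sample contains two posts.
flat_rows = [
    {"post_name": "t3_1hh8d2j", "post_title": "Test Post",
     "comment_id": "m2p4kmm", "direct_parent": "t3_1hh8d2j"},
    {"post_name": "t3_1hh8d2j", "post_title": "Test Post",
     "comment_id": "m2uv29u", "direct_parent": "t3_1hh8d2j"},
    {"post_name": "t3_example", "post_title": "Another Post",
     "comment_id": "c3", "direct_parent": "t3_example"},
]


def nest_comments(rows, post_key="post_name"):
    """Hypothetical helper: group flat rows into one record per post."""
    posts = {}
    for row in rows:
        # Assumption: fields prefixed with "post_" belong to the post,
        # everything else belongs to the comment.
        post_fields = {k: v for k, v in row.items() if k.startswith("post_")}
        comment_fields = {k: v for k, v in row.items()
                          if not k.startswith("post_")}
        # Create the post record on first sight, then keep appending to it.
        post = posts.setdefault(row[post_key], {**post_fields, "comments": []})
        post["comments"].append(comment_fields)
    return list(posts.values())


nested = nest_comments(flat_rows)
print(json.dumps(nested, indent=2))
```

In a workflow tool, the same grouping logic could run in a code step placed after the merge. Note that this only nests comments that were matched to a post in the first place. Replies to other comments carry a t1_ parent on Reddit rather than the post's t3_ name, so matching on link_id instead may be needed to capture them.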
My flow looks like this at the moment. ChatGPT is useless. Does anyone have the right answer for this?
Thanks!