Google Code Archive #33

@PhungVanDuy

Title

Dataset URL - here

Does the dataset exist in a scraped format?
No

Description

The Google Code Archive contains the data found on the Google Code Project Hosting Service, which was turned down in early 2016.

This archive contains over 1.4 million projects, 1.5 million downloads, and 12.6 million issues. You can learn more about the data served from Google Cloud Storage here.

Google Code also offered open-source project hosting on domains other than code.google.com.

Procedure

Tests

Include a dummy_dataset.parquet file to test your code against. This dummy_dataset should include the columns for the data and metadata associated with the dataset, which will then be converted into the final format for language model consumption, along with an example row or rows that you can verify your code correctly collects. In addition to this file, include the unit test that evaluates your code against this dummy_dataset.

Give an example of the columns and data:

col1 col2 ....
row1 row1 ....
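
As a rough sketch of what the dummy dataset and its unit test could look like: the column names below (`project_name`, `file_path`, `content`, `license`) are hypothetical, since this issue does not specify a schema, and the helpers `make_dummy_rows`, `validate_rows`, and `write_dummy_parquet` are illustrative names rather than anything from an existing codebase. Writing the Parquet file assumes pandas with a Parquet engine such as pyarrow is installed.

```python
import unittest

# Hypothetical schema: the issue does not name the columns.
EXPECTED_COLUMNS = ["project_name", "file_path", "content", "license"]


def make_dummy_rows():
    """Return a single made-up example row (not real archive data)."""
    return [
        {
            "project_name": "example-project",
            "file_path": "src/main.py",
            "content": "print('hello')\n",
            "license": "apache-2.0",
        }
    ]


def validate_rows(rows, expected_columns=EXPECTED_COLUMNS):
    """Return a list of human-readable problems; empty means the rows pass."""
    problems = []
    for i, row in enumerate(rows):
        missing = [c for c in expected_columns if c not in row]
        if missing:
            problems.append(f"row {i}: missing columns {missing}")
        empty = [c for c in expected_columns if c in row and row[c] in ("", None)]
        if empty:
            problems.append(f"row {i}: empty values in {empty}")
    return problems


def write_dummy_parquet(rows, path="dummy_dataset.parquet"):
    """Write the rows to Parquet. Needs pandas plus pyarrow (or fastparquet)."""
    import pandas as pd  # imported lazily so the checks above run without it

    pd.DataFrame(rows, columns=EXPECTED_COLUMNS).to_parquet(path, index=False)


class DummyDatasetTest(unittest.TestCase):
    def test_dummy_rows_match_schema(self):
        self.assertEqual(validate_rows(make_dummy_rows()), [])

    def test_missing_column_is_reported(self):
        self.assertEqual(
            validate_rows([{"project_name": "x"}]),
            ["row 0: missing columns ['file_path', 'content', 'license']"],
        )
```

Saved as a test module, this can be run with `python -m unittest`. Calling `write_dummy_parquet(make_dummy_rows())` once would produce the `dummy_dataset.parquet` file to commit alongside the test.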

Labels: dataset-request (Request for addition of new dataset)

Status: Todo
