• March 5, 2022

How do I back up GitHub, and why do I need to?

You’ve just launched a brand-new application, using GitHub to host your Git repository. All your hard work, planning, time, and effort have finally paid off. The first month has been incredible, and you have gained many new customers. Everything is going as planned…

Then, one Monday morning, a member of your team informs you that your GitHub data has been deleted somehow. Completely disappeared.

In all the excitement of launching your revolutionary new app, you didn’t back up your critical development assets. Now, the code that you have spent thousands of hours (and a lot of money) working on is gone.

But surely GitHub itself is a backup, so it backs up my data automatically?

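Kind of… GitHub hosts your repository, commit history included, but it holds one live copy rather than a series of independent backups. If a repository is deleted, or its history is overwritten by a force push, the hosted copy changes with it. It’s a bit like a shared document with no separate archive copy.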

Data loss can happen at any time, and there are several potential causes. Some of the most common include:

  • Human error – Users make mistakes and accidental deletion of repositories is the most common reason for data loss.
  • Nefarious users – Disgruntled employees who still have access to your system could wreak havoc and maliciously delete your data.
  • Platform issues – No cloud app can guarantee 100% uptime, and we’ve even seen cloud providers accidentally lose user data.
  • Outages – In 2017 and again in 2020 there were major outages of the GitHub service, leaving many companies without access to their code and unable to work.
  • Ransomware and viruses – Unfortunately, this risk is becoming more common and is now seen on cloud apps, including GitHub, not just desktop computers.

Why is it important to back up your GitHub data?

Losing your data could be catastrophic for your business. Think about your company: how long would you be able to work without access to your GitHub data? What have you got to lose? What would be the financial fallout of your company having no access? Could you afford it? Or would you be better off preventing such situations and investing in a reliable third-party backup (with us)?

So, how do you back up GitHub?

Essentially, there are two methods for backing up your GitHub repositories. You can do it yourself (in-house) or work with a backup provider (third-party) to do it for you. We often say to our customers, your data is always your responsibility – you need to make sure it’s properly protected – and it is with us!

In-house backup

Taking things in-house gives you complete control over backing up your GitHub repositories. You can decide how the system is managed, how it integrates with other parts of your business, what is backed up, and how often you back up.

There are two popular open-source scripts you can install to back up your own data: Python GitHub Backup script and Amazon S3 Backup GitHub Action.

These open-source scripts are easy to implement, and the code is stable, widely used, and free. Once installed, they let you choose which repositories you want to back up and specify which resources within those repositories to include. You can then choose where to send the copy of the data.

However, there are limitations to these free scripts. To back up, you must run them manually or set up the scheduling yourself, because the scripts do not run automatically at a regular set time. But one of the biggest concerns is that there is no option to encrypt your backups. This lack of security could leave you open to attackers who could steal and exploit your code. If you are subject to regulations such as HIPAA or GDPR, you may need to show you have a separate backup of your data. Assuming that GitHub has a backup isn’t good enough – auditors will ask you to prove it.
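As a rough illustration of what an in-house backup boils down to, here is a minimal Python sketch built on `git clone --mirror`, which copies every branch and tag of a repository. It is not the Python GitHub Backup script or the Amazon S3 Backup GitHub Action named above; the function names and the timestamped folder layout are assumptions made for this example. It also covers Git data only, not GitHub metadata such as issues or pull request discussions, and it does not encrypt anything.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def mirror_clone_command(repo_url: str, backup_root: Path, now: datetime) -> list[str]:
    """Build a `git clone --mirror` command with a timestamped destination."""
    # Repository name from the last URL segment, e.g. "example-app.git" -> "example-app".
    name = repo_url.rstrip("/").rsplit("/", 1)[-1]
    if name.endswith(".git"):
        name = name[: -len(".git")]
    # The trailing "Z" assumes `now` is expressed in UTC.
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    destination = backup_root / f"{name}-{stamp}.git"
    return ["git", "clone", "--mirror", repo_url, str(destination)]


def back_up(repo_urls: list[str], backup_root: Path) -> None:
    """Mirror-clone each repository; raises if any git command fails."""
    backup_root.mkdir(parents=True, exist_ok=True)
    now = datetime.now(timezone.utc)
    for url in repo_urls:
        subprocess.run(mirror_clone_command(url, backup_root, now), check=True)
```

Calling `back_up([...], Path("github-backups"))` with your own repository URLs from a scheduler such as cron would turn this into a regular backup, but the scheduling, storage, encryption, and monitoring are all still yours to build and maintain.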

The other thing you will need to consider when managing your own backups is manpower. Unless you are willing to take on the task of backing up yourself, you will require a dedicated individual or team to complete the backup and manage it. In reality, a backup script is set up once, tested for a week or so, then left. Unfortunately, these types of scripts are rarely tested to see if they restore. If you are running a manual GitHub backup script, when was the last time you tested that it actually restores successfully?
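A restore test does not have to be elaborate. The sketch below, which assumes your backup is a mirror clone on local disk, runs `git fsck` to check the stored objects and then clones from the backup into a scratch folder, the same step you would take after an incident. The function names are hypothetical, and the `run` parameter exists only so the check can be exercised without touching a real repository.

```python
import subprocess
import tempfile
from pathlib import Path


def restore_check_commands(backup_path: Path, restore_dir: Path) -> list[list[str]]:
    """Commands that check a mirror backup and restore it into restore_dir."""
    return [
        # Verify the integrity of the objects stored in the backup.
        ["git", "--git-dir", str(backup_path), "fsck", "--full"],
        # Restore by cloning from the backup into a fresh directory.
        ["git", "clone", str(backup_path), str(restore_dir)],
    ]


def restore_succeeds(backup_path: Path, run=subprocess.run) -> bool:
    """Return True only if every check command exits with status 0."""
    with tempfile.TemporaryDirectory() as scratch:
        restore_dir = Path(scratch) / "restored"
        for command in restore_check_commands(backup_path, restore_dir):
            if run(command, capture_output=True).returncode != 0:
                return False
    return True
```

A check like this only proves the Git data can be cloned back. It says nothing about whether the backup is recent, or whether anything outside the repository itself was captured.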

Third-party backup

If you’d prefer to work with a third-party organisation that will take care of your backup, then look no further than BackupLABS. Our backup-as-a-service platform automatically backs up your GitHub repositories and metadata (including milestones, projects, pull requests, wiki, and releases) to ensure you never lose access to your critical code.

Installing and configuring our solution is easy thanks to GitHub Marketplace integration, and our free trial allows you to try before you buy.

Once installed, we will automatically back up your repositories each day. Backups are encrypted during transfer and at rest and are stored securely on our servers with 256-bit AES encryption, with the option to save them in your own Amazon S3 bucket if you wish.

When you need to use your repository backup, you can either restore it directly into GitHub or restore the repository from BackupLABS onto your local machine. This ensures that in the event of an outage, you can access your data whenever you need to. We also provide you with an audit log so you can keep track of your backups for compliance and security purposes. We know that recovery is an essential part of backing up your data, so our solution enables you to recover your repository at any point in the past 30 days, and more if required.
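To show what recovering to a point in time means in practice, here is a small Python sketch that picks which backup to restore. It is an illustration only, not a description of how BackupLABS works internally: the function name is hypothetical, and the 30-day window is simply taken from the figure above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed retention window, based on the 30 days mentioned above.
RETENTION = timedelta(days=30)


def pick_restore_point(
    snapshots: list[datetime], wanted: datetime, now: datetime
) -> Optional[datetime]:
    """Return the newest snapshot taken at or before `wanted` that is still retained."""
    oldest_kept = now - RETENTION
    candidates = [s for s in snapshots if oldest_kept <= s <= wanted]
    # None means no retained snapshot is old enough to satisfy the request.
    return max(candidates, default=None)
```

With daily backups, asking for the state of a repository just before a bad change returns the last backup taken before that moment, which is why backup frequency directly limits how much work you can lose.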

The benefits of backing up GitHub

  • Peace of mind. If data is compromised, backups are extremely helpful in providing a previously good state to compare with the current state. If you only have your infected codebase, it can be challenging to uncover all the corrupted files or backdoors left open for future attacks.
  • Saves money. Ransomware attacks take control of codebases or infrastructure and demand money to remove the encryption they have added. With a backup of your repository, you could restore it and continue working with minimal downtime. It would also save your business £££, because you would not need the attacker to unlock anything and so would not need to pay the ransom.
  • Saves time. GitHub has a 90-day recovery period for deleted repositories, but there are limitations to this feature. Having a backup and restore plan with a third party, with frequent automatic backups in place, keeps the negative impact on your business to a minimum and gets your GitHub repositories back quickly. It’s the difference between a work-stopping outage that costs you days, or even forces you to close your business, and a simple alert and switchover.
  • Maintaining compliance. Our backup solution ensures that personal data is handled in line with GDPR. If you’re subject to SOC 2 compliance standards, you’ll choose from five Trust Services Criteria categories: Security (or Common Criteria), Availability, Confidentiality, Processing Integrity, and Privacy. SOC 2 auditors will check that backups of certain database components and applications are performed daily to support recovery in the event of a service failure. SOC 2 compliance also demonstrates that you take security seriously and have invested in processes and systems that will protect sensitive data and information, which is a great competitive advantage too.

If you need a reliable and robust backup partner, head to our GitHub Backup page to get started. You’ll be up and running within 5 minutes!

Let us take care of your data while you take care of the rest of your business.