How to use
The interface of the Cosmograph web application is quite minimal. When you open it, you'll find hints and data examples right away. Let's go over the basics here, so you don't get lost the first time you run it.
Choosing mode
Cosmograph supports two modes of data visualization: Graph and Embedding.
Graph mode supports the visualization of data as a network graph with nodes and links. It includes simulation capabilities to dynamically position nodes based on the link structure.
Embedding mode is designed for exploring machine learning datasets where the nodes already have predefined coordinates. It does not support links between nodes and does not have simulation capabilities. This mode allows you to quickly visualize and analyze the structure of an embedding dataset.
The main difference is that Embedding mode expects the data to already contain coordinates for node positioning, while Graph mode can generate coordinates by simulating node positions based on the provided graph topology.
Loading data
Cosmograph requires data to be in .csv, .tsv, or .ssv file formats. It automatically detects numeric and color columns, regardless of their names. Time series data is parsed only from columns named time or date.
When using CSV-like formats, be aware that inconsistent spacing in node ids can create duplicate nodes. For example, "node1", " node1" and "node1 " would be treated as different nodes. To avoid this, maintain consistent spacing in the columns containing node ids.
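One way to guard against stray spaces is to strip every cell before loading the file. The following is a minimal sketch using Python's standard csv module; the helper name strip_cells and the sample text are illustrative and not part of Cosmograph.

```python
import csv
import io


def strip_cells(text: str, delimiter: str = ";") -> str:
    """Return delimited text with leading/trailing whitespace removed from every cell."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for row in reader:
        writer.writerow([cell.strip() for cell in row])
    return out.getvalue()


# Sample edge list with inconsistent spacing around node ids (made-up data).
raw = "source;target\nnode1; node2\n node1;node3 \n"
cleaned = strip_cells(raw)
print(cleaned)
```

After cleaning, every occurrence of node1 is spelled identically, so it should be read as a single node.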
Graph mode
Graph mode requires a file that has at least two columns, one for source nodes and another for target nodes. Such a graph representation is usually called an edge list, since each row represents a connection between two nodes. Create an edge list by adding node ids in the source and target columns of each row to establish links between nodes in a graph. For example:
source;target
node1;node2
node1;node3
That's it! This is all you need to draw your graph. But let's say you're working with transactions, and node1 has sent something to node2. Let's add a few extra columns to your data file, one for the time of the transaction and another for its value:
time;source;target;value
2/4/2022;node1;node2;2
2/5/2022;node1;node3;10
Cosmograph will automatically detect the time column and display a timeline at the bottom. Any numeric column, such as value, becomes available as an option for link color and thickness, using scales pre-calculated from all values in that column. You can also add a column containing color values that will be applied directly to the corresponding links. Colors can be specified in any popular color format, such as RGB or Hex.
Columns specified as source or target are only parsed as node ids. This means that if these columns contain numeric or color data, it will not be applied to the graph.
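If your transactions live in code rather than in a spreadsheet, an edge list like the one above can be generated programmatically. This is a minimal sketch with Python's csv module; the transactions list and the to_edge_list helper are hypothetical names used only for illustration.

```python
import csv
import io

# Hypothetical transactions: (time, source, target, value).
transactions = [
    ("2/4/2022", "node1", "node2", 2),
    ("2/5/2022", "node1", "node3", 10),
]


def to_edge_list(rows, delimiter=";"):
    """Serialize transactions as delimited text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["time", "source", "target", "value"])
    writer.writerows(rows)
    return out.getvalue()


edge_list_text = to_edge_list(transactions)
print(edge_list_text)
```

Save the resulting text to a file in one of the supported formats and select it as the data file.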
Metadata file
The edge list data file only contains information about the connections in your graph. What if you also have information about the nodes and want to use it in your visualization? You can do that by providing a metadata file where each row corresponds to a specific node by its id:
id;color;size
node1;red;10
node2;green;20
node3;blue;30
If you do this, you can use this information to set the color and size of the nodes in the user interface.
Metadata supports parsing plain text columns and will generate an ordinal color scale palette, which is useful if you have data that categorizes nodes. Let's add a type column for that, along with a time column for the nodes:
id;color;size;type;time
node1;red;10;bird;2/4/2022
node2;green;20;plant;2/5/2022
node3;blue;30;bird;2/6/2022
After selecting your data and optional metadata files on the Cosmograph launch screen, click the Launch button and watch your graph render in real time.
If both the data and metadata have time columns, you can toggle the use nodes time data switch in General > Timeline. This lets you decide which data the Cosmograph timeline should display and operate on.
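Before launching, it can help to confirm that the ids in the metadata file line up with the ids in the edge list. The sketch below only reports mismatches; how Cosmograph treats nodes that lack a metadata row is not covered here, and the helper names and sample texts are illustrative.

```python
import csv
import io

# Made-up sample files; node3 deliberately has no metadata row.
edges_text = "source;target\nnode1;node2\nnode1;node3\n"
meta_text = "id;color;size\nnode1;red;10\nnode2;green;20\n"


def node_ids_from_edges(text, delimiter=";"):
    """Collect every id that appears in the source or target column."""
    ids = set()
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        ids.add(row["source"])
        ids.add(row["target"])
    return ids


def node_ids_from_metadata(text, delimiter=";"):
    """Collect every id listed in the metadata id column."""
    return {row["id"] for row in csv.DictReader(io.StringIO(text), delimiter=delimiter)}


edge_ids = node_ids_from_edges(edges_text)
meta_ids = node_ids_from_metadata(meta_text)
missing_metadata = sorted(edge_ids - meta_ids)  # nodes in the graph with no metadata row
unused_metadata = sorted(meta_ids - edge_ids)   # metadata rows for nodes not in the graph
print(missing_metadata, unused_metadata)
```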
Embedding mode
To run Cosmograph in embedding mode, all you need is a file containing the nodes with their x and y coordinates.
Here is an example of what the file should contain:
id;x;y
node1;0.2;0.5
node2;0.7;0.3
node3;0.4;0.8
Embedding data can include additional columns with colors, text, numeric values and timestamps for nodes, just like a metadata file in Graph mode.
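An embedding file with one extra column can be produced as follows. This is a sketch under the assumption that your 2D coordinates are already computed by whatever dimensionality reduction you use; the rows list, the type column, and the to_embedding_text helper are hypothetical.

```python
import csv
import io

# Hypothetical embedding rows: coordinates plus one categorical column.
rows = [
    {"id": "node1", "x": 0.2, "y": 0.5, "type": "bird"},
    {"id": "node2", "x": 0.7, "y": 0.3, "type": "plant"},
    {"id": "node3", "x": 0.4, "y": 0.8, "type": "bird"},
]


def to_embedding_text(rows, delimiter=";"):
    """Serialize embedding rows with id, x, y and an extra type column."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["id", "x", "y", "type"],
        delimiter=delimiter,
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


embedding_text = to_embedding_text(rows)
print(embedding_text)
```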
Technical limitations
The technique we've developed to make the layout algorithm work on the GPU and be blazingly fast does, of course, have limitations. One of them is the force layout simulation space size limit. Our algorithm runs on a square grid. It's like a giant chessboard, and if you have multiple nodes trying to fit into a square, there will be computational artifacts that make the layout more noisy. You can choose your space size before you start the simulation. The smaller it is, the faster the visualization works. However, large graphs may not fit into smaller spaces. If your graph has several million nodes, they may not fit at all, even if you choose the largest available space size.
Space size can only be set on the Launcher screen of Cosmograph. This setting is cached, so the next time you open the app, the last value is used automatically. The default size is 4096.
Apple devices
Starting from iOS version 15.4, Apple has stopped supporting the key WebGL EXT_float_blend extension used by our graph layout algorithm. As a result, the forces in the graph behave incorrectly and cause the nodes to gather in the center. This issue also affects older versions of Safari on laptops, with the problem reported in version 14.1. We hope to find a way to solve this problem in the future, but there's not much we can do about it right now.
Controlling the look and feel
Simulation
There are a handful of force layout parameters that can be changed while the simulation is running. They can be found in the Simulation section of the General tab. You'll be able to control gravity, repulsion, friction, bond strength, and other force layout properties. We encourage you to play with the sliders to get a better layout for your graph, and it's also quite fun to see the graph change before your eyes.
Use the Pause/Play and Acceleration buttons in the upper right corner of Cosmograph to control the simulation. Acceleration gives the simulation a boost. If you are working with a graph that has many tangled links, using Acceleration can help untangle them faster.
Visual appearance
You can change the appearance of the graph in the Node Appearance and Link Appearance sections of the General tab. In particular, you can choose which data and metadata columns are used to define the size and color of the nodes and the width and color of the links. There are Legends available for both node and link appearance.
Exploring the graph
Once the force graph simulation has slowed down and reached equilibrium, you can explore the graph by clicking on any node to see its name and other nodes to which it's connected. Alternatively, you can go to the Info tab and search for a specific node by name.
If a node has metadata or data loaded in the Embedding mode, fields from the selected node's data will be displayed in the Info tab.
Timeline
If your graph contains time data, you can select an area on the timeline and press play to see the animation. It comes in handy when you're exploring transactions and want to see which parts of the graph were more or less active during certain time periods.
You can hide the timeline and choose the source of its data in the General > Timeline section.
Histograms
In the Analysis tab of Cosmograph you can check out histograms to see how your numerical data is distributed. These histograms are created based on the numeric columns in both the data and the metadata, or the embedded data if you loaded it in Embedding mode. By looking at these histograms, you can easily see the patterns and select specific value ranges for further analysis. This feature is very helpful when you want to understand the distribution patterns and characteristics of your data.
By default, histograms are limited to the range of values that contains the most data points. This is done using a quantile function, which ignores outliers and focuses the visualization on the most important data. If you want to use the full range of values for a histogram, you can disable the quantiled range switch.
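To illustrate how a quantile range trims outliers, here is a minimal sketch using Python's statistics module. The 1st and 99th percentile cut points are an assumption chosen for the example; the exact quantiles Cosmograph uses are not stated here.

```python
import statistics


def quantile_range(values, lower_pct=1, upper_pct=99):
    """Return (low, high) cut points at the given percentiles (assumed 1% and 99%)."""
    # statistics.quantiles with n=100 returns 99 cut points; index i-1 is the i-th percentile.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


# Made-up data: values 1..99 plus one extreme outlier.
values = list(range(1, 100)) + [10_000]
low, high = quantile_range(values)

# Values inside the quantile range; the outlier falls outside and is dropped.
kept = [v for v in values if low <= v <= high]
print(low, high, len(kept))
```

With the full range, the single outlier would stretch the histogram axis and squeeze all other values into the first bar; the quantile range avoids that.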
Search
Explore your data by searching for specific values within fields. The search is case-insensitive, so you can easily find the value you want. In addition, Cosmograph allows you to select the specific field you want to search, giving you more control and flexibility when exploring your data. To use it, load the metadata, or if you're in Embedding mode, make sure your data has numeric or text columns other than just id. These columns will be accessible in a special menu that appears when you click on the current search field, located to the right of the search input.
You can select all nodes matching your current input by pressing the Enter key in the search input.
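Conceptually, a case-insensitive search over a chosen field looks like the sketch below. It assumes substring matching, which may differ from how Cosmograph matches values; the search_nodes helper and the sample rows are illustrative.

```python
def search_nodes(rows, field, query):
    """Return ids of rows whose `field` contains `query`, ignoring case."""
    needle = query.casefold()
    return [
        row["id"]
        for row in rows
        if needle in str(row.get(field, "")).casefold()
    ]


# Made-up node records with a categorical type field.
rows = [
    {"id": "node1", "type": "bird"},
    {"id": "node2", "type": "Plant"},
    {"id": "node3", "type": "BIRD"},
]
matches = search_nodes(rows, "type", "Bird")
print(matches)
```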
Data table
Cosmograph provides a table view for selected data, displaying the raw rows from data source files. In Graph mode, you can easily toggle between the data and metadata files to see how the links in the data are connected to the selected nodes, or view the raw node records from the metadata. In addition, you can use the search box to filter the data for more focused analysis.
Selection
The Timeline, Analysis histograms, and Rectangular selection tools allow for flexible cross-selection of nodes. Selected nodes can be analyzed in detail, exported, or isolated from the rest of the graph for focused exploration.
Rectangular selection
You can also select a portion of the graph using the Rectangular Selection tool to view counter stats and export the underlying sub-graph data.
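In terms of data, a rectangular selection amounts to keeping the nodes whose coordinates fall inside a bounding box. The following sketch shows the idea on made-up coordinates; select_in_rectangle is a hypothetical helper, and treating the box edges as inclusive is an assumption.

```python
def select_in_rectangle(points, x_min, y_min, x_max, y_max):
    """Return ids of points that lie inside the rectangle (edges inclusive)."""
    return [
        node_id
        for node_id, (x, y) in points.items()
        if x_min <= x <= x_max and y_min <= y <= y_max
    ]


# Made-up node coordinates.
points = {"node1": (0.2, 0.5), "node2": (0.7, 0.3), "node3": (0.4, 0.8)}
selected = select_in_rectangle(points, 0.0, 0.4, 0.5, 1.0)
print(selected)
```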
Isolation
Isolation allows you to focus on a subset of nodes in the graph by hiding all other nodes. This can be useful for analyzing a particular cluster or community in more detail.
To isolate nodes:
- Make a selection using one of the selection tools like Rectangular Selection or through the histograms. Any active selection activates the Isolate active selection button.
- Click the Isolate active selection button on the top toolbar. This will hide all nodes except the currently selected ones.
- The graph will reposition to show only the isolated nodes and links between them. Node and link stats will also update to reflect the isolated set.
- You can click Reset active isolation to undo the action and restore the full graph.
Some key points about isolation:
- Isolated nodes remain interactive. You can still run simulation, select, drag, and zoom into them.
- Links connected to hidden nodes are also hidden. So you only see the links between the isolated nodes.
- Selections and filters still apply on the isolated set. You can combine isolation with other tools.
- Isolated graphs can be exported to share or analyze only a part of the entire network.
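The rule that only links between isolated nodes stay visible can be sketched as a filter over the edge list. This is an illustration of the idea, not Cosmograph's implementation; the isolate helper and sample links are hypothetical.

```python
def isolate(links, selected_ids):
    """Keep only links whose source and target are both in the selection."""
    selected = set(selected_ids)
    return [(s, t) for s, t in links if s in selected and t in selected]


# Made-up edge list as (source, target) pairs.
links = [("node1", "node2"), ("node1", "node3"), ("node2", "node3")]
isolated_links = isolate(links, ["node1", "node3"])
print(isolated_links)
```

Links that touch node2 drop out because node2 is hidden, leaving only the link between node1 and node3.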
Labels
Labels provide additional context and information for your graph or embedding visualization. You can control how labels appear on your graph in Cosmograph. If metadata or embedding data is provided with additional columns besides the id column, you can select one of these columns to display as node labels. The dropdown for this is in General > Node Appearance.
By default, the labels will show the node ids obtained from the source and target columns of the data file or from the id column of the embedded data. Long labels display in full on node hover.
Legends
Legends help you understand the displayed graph elements. They are shown according to the selected appearance settings. If the default appearance settings are used, there is no legend, since the defaults are constant.
Color legend
The color legend can be either bullet or gradient style. Bullet legends work for string (non-numeric and non-color) columns in the data and are interactive: clicking an item selects all nodes with that value. Gradient legends work for numeric columns in the data and, unlike bullet legends, are not interactive.
Size legend
The size legend works for numeric columns in the data and is not interactive.
Performance
If Cosmograph runs slowly on your hardware, here are some tips to optimize the application's performance:
Adjusting canvas resolution
Adjust the General > Performance settings. Disabling the high resolution switch will render graph elements blurrier but may increase FPS slightly.
Disabling labels
Consider disabling labels in General > Node appearance using the show labels switch if performance is crucial. A large number of labels can notably decrease performance.
Managing links
Large numbers of links can slow down performance, especially when dealing with very large graph datasets. To address this, you can turn off links by using the show links option in the General > Link appearance settings. The show link arrows and curved links options are turned off by default; enabling them can slightly decrease FPS.
Space size can have an impact
Try tweaking the space size. The default size works well for most devices and layouts, but for some graphs, increasing the space size up to 8192 or decreasing it can improve performance. Experiment with this setting, as it is sensitive to both hardware and graph layout.
Pausing simulation
If you're not interested in the simulation, you can pause it by clicking the Pause button on the top toolbar. This will stop the simulation and allow you to investigate the graph with higher FPS.
When the number of nodes exceeds 250000, high resolution and links rendering are automatically disabled due to the large size of the graph. However, you can enable them as needed in the General sidebar.
Export and share
Saving subsets
When you select nodes in Cosmograph, you will find export options in the Info tab. The exported data is saved in .csv format.
In Graph mode, you can export the links data by clicking the Records button and export the metadata of the selected nodes by clicking the Metadata button. Additionally, if you want to save the current layout of the selected nodes, you can export the x and y coordinates by enabling the related option.
In Embedding mode, you can only export the selected subset of nodes. Since there are no links or simulation in this mode, there is no need to export the layout as it remains static.
Sharing
Sharing features allow you to generate a URL that captures the current state of your graph visualization. By entering a data or embedding URL, you can create a link that loads that data into Cosmograph with the simulation and appearance settings you have configured. This makes it easy to share customized graph views with others. The generated URL is copied to the clipboard and can be distributed to share your graph exploration in an instant.
To share the entire state of your graph, export the datasets and metadata with layout coordinates and upload them to a cloud storage service. This allows you to create a link that includes not only the simulation and appearance settings, but also the node positions.
QueryString API
The QueryString API allows you to generate URLs that encode the state of the graph visualization. By appending query parameters to the Cosmograph URL, you can preconfigure options such as data source, color scheme, and other visual settings. When shared, these links open Cosmograph with the graph already customized based on the parameters. This makes it easy to share personalized graph views without requiring recipients to manually configure the options themselves.
To learn more about how this works in Cosmograph, read the relevant QueryString API documentation.
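As a rough illustration of how such a link can be assembled, the sketch below URL-encodes a set of query parameters with Python's urllib.parse. The base URL and the parameter names data and meta are placeholders assumed for this example; take the actual names from the QueryString API documentation.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder base URL; substitute the address of the Cosmograph app you use.
BASE_URL = "https://example.com/run/"


def build_share_url(base_url, params):
    """Append URL-encoded query parameters to the base URL."""
    return f"{base_url}?{urlencode(params)}"


# Hypothetical parameter names and file locations, for illustration only.
params = {
    "data": "https://example.com/edges.csv",
    "meta": "https://example.com/nodes.csv",
}
share_url = build_share_url(BASE_URL, params)
print(share_url)

# Round-trip check: decoding the query string recovers the original values.
decoded = {key: values[0] for key, values in parse_qs(urlsplit(share_url).query).items()}
print(decoded == params)
```

Encoding the nested file URLs matters because characters such as : and / inside a parameter value would otherwise be ambiguous in the outer URL.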
If Cosmograph shows the start screen instead of loading the data file from the URL, the likely cause is CORS restrictions. To resolve this, try hosting the data on a service that does not have CORS restrictions (like GitHub Gist) so that Cosmograph can load it. Also, make sure your files have valid content: files that cannot be parsed will also cause Cosmograph to display the start screen.